To develop, implement, and optimize complex Data Warehouse (DWH) and Data Lakehouse solutions on the Databricks platform (including Delta Lake, Unity Catalog, and Spark), ensuring a scalable, high-performance, and governed data foundation for analytics, reporting, and Machine Learning.
Responsibilities
A. Databricks Development and Architecture
- Advanced Design and Implementation: Design and implement robust, scalable, high-performance ETL/ELT data pipelines using PySpark/Scala and Databricks SQL.
- Delta Lake: Implement and optimize the Medallion architecture (Bronze, Silver, Gold) on Delta Lake to ensure data quality, consistency, and historical tracking (see the batch pipeline sketch after this list).
- Lakehouse Platform: Implement the Lakehouse architecture efficiently on Databricks, combining the best practices of traditional DWHs and Data Lakes.
- Performance Optimization: Optimize Databricks clusters, Spark operations, and Delta tables (e.g., Z-ordering, compaction, query tuning) to reduce latency and compute costs.
- Streaming: Design and implement real-time/near-real-time data processing solutions using Spark Structured Streaming and Delta Live Tables (DLT); a streaming sketch follows the batch example below.
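To make the Delta Lake and performance items above concrete, here is a minimal PySpark sketch of a Bronze-to-Silver hop followed by routine table maintenance. It assumes a Databricks notebook (where `spark` is predefined); the table names (`main.sales.bronze_orders`, `main.sales.silver_orders`) and the `customer_id` Z-order key are hypothetical placeholders, not part of this role description.

```python
from pyspark.sql import functions as F

# Bronze -> Silver: cleanse, type-enforce, and deduplicate raw orders.
# All table and column names below are illustrative placeholders.
bronze = spark.read.table("main.sales.bronze_orders")

silver = (
    bronze
    .filter(F.col("order_id").isNotNull())               # basic quality gate
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # enforce types
    .dropDuplicates(["order_id"])                        # keep re-runs idempotent
)

silver.write.format("delta").mode("overwrite").saveAsTable("main.sales.silver_orders")

# Routine Delta maintenance: compact small files and co-locate rows on a
# frequently filtered column to cut scan latency and compute cost.
spark.sql("OPTIMIZE main.sales.silver_orders ZORDER BY (customer_id)")
```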
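For the streaming responsibility, a comparable sketch using Spark Structured Streaming with Auto Loader to ingest files incrementally into a Bronze Delta table. The source path, schema and checkpoint locations, and table name are assumptions for illustration.

```python
# Incremental file ingestion with Auto Loader ("cloudFiles") into Bronze.
# Paths and table names are illustrative placeholders.
stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/sales/_schemas/orders")
    .load("/Volumes/main/sales/landing/orders")
)

(stream.writeStream
    .option("checkpointLocation", "/Volumes/main/sales/_checkpoints/bronze_orders")
    .trigger(availableNow=True)  # incremental batch run; remove for continuous mode
    .toTable("main.sales.bronze_orders"))
```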
B. Governance and Security
- Unity Catalog: Implement and manage Unity Catalog for centralized data governance, fine-grained security (row- and column-level), and data lineage (see the row-filter sketch after this list).
- Data Quality: Define and implement data quality standards and rules (e.g., with DLT expectations or Great Expectations) to maintain data integrity; a DLT sketch follows below.
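As a sketch of fine-grained security in Unity Catalog, the following attaches a SQL UDF to a table as a row filter, so that non-privileged users only see rows matching the predicate. The function, group, table, and column names are hypothetical; `SET ROW FILTER` is the Unity Catalog mechanism for row-level security.

```python
# Row-level security: a SQL UDF decides, per row, who may see it.
# Function, group, table, and column names are illustrative placeholders.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.region_filter(region STRING)
    RETURN IS_ACCOUNT_GROUP_MEMBER('sales_admins') OR region = 'EMEA'
""")

# Every subsequent query on the table is filtered through the UDF.
spark.sql("""
    ALTER TABLE main.sales.silver_orders
    SET ROW FILTER main.governance.region_filter ON (region)
""")
```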
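And for declarative data quality, a minimal Delta Live Tables sketch using the Python `dlt` module's expectations, which drop rows that violate a rule and record the results in the pipeline event log. The dataset names and rules here are assumptions for illustration.

```python
import dlt
from pyspark.sql import functions as F

# A DLT table with declarative quality rules ("expectations").
# Rows failing an expect_or_drop rule are removed and counted in metrics.
@dlt.table(comment="Cleansed orders with enforced quality rules")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
def silver_orders():
    return (
        dlt.read("bronze_orders")  # upstream dataset in the same pipeline
        .withColumn("order_ts", F.to_timestamp("order_ts"))
    )
```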
C. Operations and Collaboration
- Orchestration: Develop and manage complex workflows using Databricks Workflows (Jobs) or external tools (e.g., Azure Data Factory, Airflow) to automate pipelines (see the sketch after this list).
- DevOps/CI/CD: Integrate Databricks pipelines into CI/CD processes using tools such as Git, Databricks Repos, and Databricks Asset Bundles.
- Collaboration: Work closely with Data Scientists, Analysts, and Architects to understand business requirements and deliver optimal technical solutions.
- Mentorship: Provide technical guidance and mentorship to junior developers and promote best practices.
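To illustrate programmatic orchestration with Databricks Workflows, here is a sketch using the official `databricks-sdk` for Python to define a two-task job where the Silver transform only runs after the Bronze ingest succeeds. The job name, notebook paths, cluster ID, and cron expression are all hypothetical placeholders; in practice such definitions often live in Databricks Asset Bundles instead.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up credentials from the environment/config profile

# Two-task workflow with a dependency and a daily schedule.
# Names, paths, cluster ID, and cron expression are illustrative placeholders.
created = w.jobs.create(
    name="nightly-orders-pipeline",
    tasks=[
        jobs.Task(
            task_key="bronze_ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/data/etl/bronze_ingest"),
            existing_cluster_id="0101-123456-abcdefgh",
        ),
        jobs.Task(
            task_key="silver_transform",
            depends_on=[jobs.TaskDependency(task_key="bronze_ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/data/etl/silver_transform"),
            existing_cluster_id="0101-123456-abcdefgh",
        ),
    ],
    schedule=jobs.CronSchedule(quartz_cron_expression="0 0 2 * * ?", timezone_id="UTC"),
)
print(f"Created job {created.job_id}")
```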