Senior Databricks Data Engineer
Open-ended contract
Company description
Organization
Our Mission Statement
Digital and human resources at the center of the sustainable development of our society.
In a world of continuous transformation, accelerated by technological developments and societal challenges, it is necessary to adapt in an ongoing, agile way to meet the challenges of the future.
About Inetum
Inetum is a European leader in digital services. Inetum’s team of 27,000 consultants and specialists strives every day to make a digital impact for businesses, public sector entities and society. Inetum’s solutions aim to contribute to its clients’ performance and innovation as well as the common good.
Present in 19 countries with a dense network of sites, Inetum partners with major software publishers to meet the challenges of digital transformation with proximity and flexibility. Driven by its ambition for growth and scale, Inetum generated sales of 2.4 billion euros in 2024.
For further information, please visit www.inetum.com
Job description
You will develop, implement, and optimize complex Data Warehouse (DWH) and Data Lakehouse solutions on the Databricks platform (including Delta Lake, Unity Catalog, and Spark) to ensure a scalable, high-performance, and governed data foundation for analytics, reporting, and Machine Learning.
Responsibilities
A. Databricks Development and Architecture
- Advanced Design and Implementation: Design and implement robust, scalable, and high-performance ETL/ELT data pipelines using PySpark/Scala and Databricks SQL on the Databricks platform.
- Delta Lake: Implement and optimize the Medallion architecture (Bronze, Silver, Gold) using Delta Lake to ensure data quality, consistency, and historical tracking.
- Lakehouse Platform: Implement the Lakehouse architecture efficiently on Databricks, combining best practices from DWH and Data Lake.
- Performance Optimization: Optimize Databricks clusters, Spark operations, and Delta tables (e.g., Z-ordering, compaction, query tuning) to reduce latency and compute costs.
- Streaming: Design and implement real-time/near-real-time data processing solutions using Spark Structured Streaming and Delta Live Tables (DLT).
B. Governance and Security
- Unity Catalog: Implement and manage Unity Catalog for centralized data governance, fine-grained security (row/column-level security), and data lineage.
- Data Quality: Define and implement data quality standards and rules (e.g., using DLT or Great Expectations) to maintain data integrity.
C. Operations and Collaboration
- Orchestration: Develop and manage complex workflows using Databricks Workflows (Jobs) or external tools (e.g., Azure Data Factory, Airflow) to automate pipelines.
- DevOps/CI/CD: Integrate Databricks pipelines into CI/CD processes using tools like Git, Databricks Repos, and Bundles.
- Collaboration: Work closely with Data Scientists, Analysts, and Architects to understand business requirements and deliver optimal technical solutions.
- Mentorship: Provide technical guidance and mentorship to junior developers and promote best practices.
Job requirements
A. Mandatory Knowledge (Expert Level)
- Databricks Platform: Proven, expert-level experience with the entire Databricks ecosystem (Workspace, Cluster Management, Notebooks, Databricks SQL).
- Apache Spark: In-depth knowledge of Spark architecture (RDD, DataFrames, Spark SQL) and advanced optimization techniques.
- Delta Lake: Expertise in implementing and managing Delta Lake (ACID properties, Time Travel, Merge, Optimize, Vacuum).
- Programming Languages: Advanced/expert-level proficiency in Python (with PySpark) and/or Scala (with Spark).
- SQL: Advanced/expert-level skills in SQL and Data Modeling (Dimensional, 3NF, Data Vault).
- Cloud: Solid experience with a major Cloud platform (AWS, Azure, or GCP), especially with storage services (S3, ADLS Gen2, GCS) and networking.
B. Additional Knowledge (Major Advantage)
- Unity Catalog: Hands-on experience with implementing and managing Unity Catalog.
- Lakeflow: Experience with Delta Live Tables (DLT) and Databricks Workflows.
- ML/AI Concepts: Understanding of basic MLOps concepts and experience with MLflow to facilitate integration with Data Science teams.
- DevOps: Experience with Terraform or equivalent tools for Infrastructure as Code (IaC).
- Certifications: Databricks certifications (e.g., Databricks Certified Data Engineer Professional) are a significant advantage.
C. Education and Experience
- Education: Bachelor’s degree in Computer Science, Engineering, Mathematics, or a relevant technical field.
- Professional Experience: At least 5 years of experience in Data Engineering, including at least 3 years working with Databricks and Spark at scale.
Additional information
Benefits
- Full access to a foreign-language learning platform
- Personalized access to tech learning platforms
- Tailored workshops and trainings to sustain your growth
- Medical insurance
- Meal tickets
- Monthly budget to allocate on a flexible benefits platform
- Access to 7 Card services
- Wellbeing activities and gatherings
Working model: hybrid - 2 days at the office
Country
Romania
Location
Bucharest
Employees can work remotely
Contract type
Open-ended contract