Join our team and put your data engineering skills to work in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!
Key responsibilities
Data pipeline development:
- Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance,
- Ensure efficient ingestion of historical Parquet datasets into Databricks (an illustrative sketch follows this list).
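By way of illustration, here is a minimal PySpark sketch of the extract-and-load path described above. All connection details, table names, and paths are hypothetical placeholders, not a description of our actual environment.

```python
# Minimal sketch: extract one table from on-premises MS SQL Server over JDBC
# and land it as a Delta table in Databricks. Assumes the SQL Server JDBC
# driver is available (bundled on Databricks runtimes).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mssql-to-databricks").getOrCreate()

source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=sales_db")  # placeholder
    .option("dbtable", "dbo.orders")                                           # placeholder
    .option("user", "svc_migration")                                           # placeholder
    .option("password", "<from-secret-scope>")                                 # never hard-code
    .load()
)

# Load: write the extracted table as a managed Delta table.
source_df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")

# Historical Parquet datasets can be appended through the same path.
historical_df = spark.read.parquet("/mnt/landing/orders_history/")  # placeholder path
historical_df.write.format("delta").mode("append").saveAsTable("bronze.orders")
```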
Data quality & validation:
- Implement validation, reconciliation, and quality assurance checks to ensure the accuracy and completeness of migrated data (a sample check is sketched after this list),
- Handle schema mapping, field transformations, and metadata enrichment to standardize datasets,
- Ensure data governance, quality assurance, and compliance are integral to all migration activities.
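A reconciliation check of the kind we have in mind might look like the sketch below, assuming the Delta table from the previous example; the table and column names are again placeholders.

```python
# Hypothetical post-migration checks: reconcile row counts between the
# source system and the Delta target, and verify key-field completeness.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("migration-checks").getOrCreate()

jdbc_url = "jdbc:sqlserver://onprem-host:1433;databaseName=sales_db"  # placeholder

source_count = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.orders")            # placeholder
    .option("user", "svc_migration")            # placeholder
    .option("password", "<from-secret-scope>")
    .load()
    .count()
)

target_df = spark.table("bronze.orders")
target_count = target_df.count()
assert source_count == target_count, (
    f"Row count mismatch: source={source_count}, target={target_count}"
)

# Completeness check: the business key must never be null after migration.
null_keys = target_df.filter(target_df.order_id.isNull()).count()  # placeholder column
assert null_keys == 0, f"{null_keys} migrated rows are missing order_id"
```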
Performance optimization:
- Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake where appropriate (see the tuning sketch after this list),
- Manage resource usage and scheduling for large dataset transfers.
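As a sketch of the tuning levers involved, the example below parallelizes the JDBC extract and compacts the resulting Delta table; the key ranges, partition counts, and column choices are assumptions that would be derived from profiling the real tables.

```python
# Hypothetical tuning sketch: split the JDBC read into parallel tasks and
# optimize the Delta layout for downstream queries. All bounds and column
# names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuned-migration").getOrCreate()

orders_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=sales_db")
    .option("dbtable", "dbo.orders")
    .option("user", "svc_migration")
    .option("password", "<from-secret-scope>")
    # Parallelize the extract across a numeric key range.
    .option("partitionColumn", "order_id")
    .option("lowerBound", "1")
    .option("upperBound", "50000000")
    .option("numPartitions", "16")
    .option("fetchsize", "10000")  # larger fetch batches cut JDBC round trips
    .load()
)

# Partition the Delta table by a column downstream jobs commonly filter on.
(
    orders_df.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")  # placeholder partition column
    .saveAsTable("bronze.orders")
)

# Compact small files and co-locate rows on a frequent predicate column.
spark.sql("OPTIMIZE bronze.orders ZORDER BY (customer_id)")
```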
Collaboration:
- Work closely with AI engineers, data scientists, and business stakeholders to define data access patterns required for upcoming AI POCs,
- Partner with infrastructure teams to ensure secure connectivity between legacy systems and Databricks.
Documentation & governance:
- Maintain technical documentation for all data pipelines,
- Adhere to data governance, compliance, and security best practices throughout the migration process.