The Data Engineer will be responsible for designing, developing, and maintaining scalable and reliable data pipelines for a financial services project. The role focuses on backend data processing, data quality, and integration of multiple data sources in a cloud-based environment, working closely with international teams.
Key Responsibilities
- Design, develop, and maintain end-to-end ETL/ELT data pipelines to process large volumes of structured and semi-structured data.
- Implement backend data solutions in Python and SQL, applying Object-Oriented Programming (OOP) principles to keep code modular, reusable, and maintainable (see the first sketch after this list).
- Orchestrate data workflows with Apache Airflow, including scheduling, monitoring, and failure handling (see the Airflow sketch after this list).
- Process and transform large datasets using PySpark in distributed environments.
- Integrate data from multiple sources, including APIs, relational databases, and cloud storage systems.
- Manage and use AWS S3 for data storage and data lake architectures.
- Apply data quality checks, validation rules, and deduplication logic to ensure data consistency and accuracy (illustrated in the PySpark sketch after this list).
- Develop and maintain CI/CD pipelines in Bitbucket, ensuring controlled deployments, versioning, and code quality.
- Collaborate with cross-functional and international teams, contributing to technical discussions and documentation in English.
- Support downstream data consumers by ensuring datasets are well-structured, documented, and ready for analytics or reporting.
- Troubleshoot and resolve data pipeline issues, performance bottlenecks, and data inconsistencies.
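
The sketches below are illustrative only, not project code; they show the kind of work the responsibilities above describe. First, a minimal example of the OOP approach mentioned above: one small, testable extractor class per data source. The class, table, and column names are hypothetical, and SQLite stands in for the project's actual databases.

```python
# A minimal sketch, assuming a class-per-source design; names such as
# CustomerExtractor and raw_customers are illustrative, not from the project.
import sqlite3
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Small, single-purpose extractors keep pipeline steps testable and reusable."""

    @abstractmethod
    def extract(self) -> list[dict]:
        ...


class CustomerExtractor(Extractor):
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def extract(self) -> list[dict]:
        self.conn.row_factory = sqlite3.Row
        rows = self.conn.execute("SELECT id, name, country FROM raw_customers")
        return [dict(r) for r in rows]


if __name__ == "__main__":
    # In-memory database so the example runs without external setup.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, country TEXT)")
    conn.execute("INSERT INTO raw_customers VALUES (1, 'Acme Ltd', 'DE')")
    print(CustomerExtractor(conn).extract())
```

Keeping each source behind a common interface like this is one way the modularity and reusability called out above tend to show up in practice.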
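Next, a minimal Airflow sketch of the orchestration responsibility: daily scheduling, automatic retries, and a failure callback. It assumes Airflow 2.4 or later; the DAG id, schedule, and alerting hook are placeholders, not the project's actual configuration.

```python
# A minimal sketch of an Airflow DAG with scheduling, retries, and a failure
# callback; dag_id, task names, and the alerting hook are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Placeholder for the team's real alerting (e.g. email or chat webhook).
    print(f"Task {context['task_instance'].task_id} failed")


def load_daily_batch():
    print("extract, validate, and load the daily batch here")


with DAG(
    dag_id="daily_financial_batch",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",        # run every day at 06:00
    catchup=False,
    default_args={
        "retries": 2,                          # automatic failure handling
        "retry_delay": timedelta(minutes=10),
        "on_failure_callback": notify_on_failure,
    },
) as dag:
    PythonOperator(task_id="load_daily_batch", python_callable=load_daily_batch)
```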
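Finally, a minimal PySpark sketch combining a validation rule with window-based deduplication, the kind of data quality logic described above. The column names and rules (non-null positive amounts, latest record per trade_id) are illustrative assumptions.

```python
# A minimal PySpark sketch of validation and deduplication; the column names
# (trade_id, amount, event_time) and rules are illustrative only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("quality-checks-sketch").getOrCreate()

df = spark.createDataFrame(
    [("T1", 100.0, "2024-01-01 10:00:00"),
     ("T1", 100.0, "2024-01-01 10:05:00"),   # late duplicate of T1
     ("T2", None,  "2024-01-01 11:00:00")],  # fails the not-null rule
    ["trade_id", "amount", "event_time"],
)

# Validation: keep rows with a non-null, positive amount.
valid = df.filter(F.col("amount").isNotNull() & (F.col("amount") > 0))

# Deduplication: keep only the latest record per trade_id.
latest = Window.partitionBy("trade_id").orderBy(F.col("event_time").desc())
deduped = (valid.withColumn("rn", F.row_number().over(latest))
                .filter(F.col("rn") == 1)
                .drop("rn"))

deduped.show()
```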