Data Engineer

Open-ended contract

Mission

The Data Engineer is responsible for building and maintaining the infrastructure that supports the organization’s data architecture. The role involves creating and managing Airflow data pipelines for data extraction, processing and loading, and ensuring their maintenance, monitoring and stability.

The engineer will work closely with data analysts and end-users to provide accessible and reliable data.

What we expect from the candidate

  • Candidate must be comfortable working on Unix: using standard commands to check processes, read files and run bash commands, and accessing a Unix server to run commands there. If some process is not running (for example, a Hadoop/YARN daemon or an Airflow container that is not up), the candidate needs to check the server and investigate what might be going on.
  • Candidate must know how to list Docker containers, build Docker images, change existing images to add or remove things, and use and map volumes. They must be able to set up and maintain a distributed Airflow environment using Docker, including building custom Docker images with the Airflow image as a base.
  • We strongly expect the candidate to know Airflow and its components, to identify and fix possible issues on the servers, and to add more workers to the cluster. They need to make sure the containers are running fine on the servers and be able to fix any issue that appears (a minimal DAG sketch is given after this list).
  • Candidate must know how to maintain a Hadoop/YARN cluster with Spark: which processes need to run on the servers, how to set up the XML configuration files for Hadoop and YARN, and how to run HDFS commands. They need to be able to add a new worker to the Hadoop cluster, fix possible issues on the servers when necessary, and read the logs from YARN and HDFS. They must understand how Spark works with YARN as the resource manager (see the PySpark sketch after this list).
  • Candidate must know how to develop in Python, manage packages with pip, review pull requests from other people in the team, and maintain and use a Flask API (see the Flask sketch after this list).
  • Candidate must know SQL, including queries with CTEs and window functions, mainly on Oracle databases (see the query sketch after this list).
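
To give a concrete idea of the Airflow work referenced above, below is a minimal sketch of an extract/process/load DAG. The DAG id, task names and callables are hypothetical placeholders and only illustrate the expected structure, not the project’s actual pipelines.

    # Minimal, hypothetical ETL DAG sketch; all names are illustrative only.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        ...  # pull data from the source system

    def process():
        ...  # transform the extracted data

    def load():
        ...  # load the result into the target database or data lake

    with DAG(
        dag_id="example_etl",             # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        process_task = PythonOperator(task_id="process", python_callable=process)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> process_task >> load_task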
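
For Spark running on YARN, the sketch below shows, under stated assumptions, how a PySpark session would be created with YARN as the resource manager. The application name, HDFS paths, column name and memory setting are hypothetical; in practice such a script is typically launched with spark-submit against a properly configured Hadoop/YARN cluster.

    # Hypothetical PySpark job using YARN as the resource manager.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("example_job")                  # hypothetical name
        .master("yarn")                          # YARN schedules the executors
        .config("spark.executor.memory", "4g")   # illustrative resource setting
        .getOrCreate()
    )

    # Read Parquet data from HDFS, aggregate it and write the result back.
    df = spark.read.parquet("hdfs:///data/input")                  # hypothetical path
    result = df.groupBy("some_column").count()                     # hypothetical column
    result.write.mode("overwrite").parquet("hdfs:///data/output")  # hypothetical path

    spark.stop()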
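
Regarding the Flask API mentioned above, below is a minimal sketch of the kind of service to maintain; the endpoint and response are hypothetical.

    # Minimal hypothetical Flask API sketch.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/health")
    def health():
        # Simple health-check endpoint, e.g. for monitoring.
        return jsonify(status="ok")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)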
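
As an illustration of the SQL expectations, the sketch below runs an Oracle query combining a CTE with a window function from Python using the python-oracledb driver. The table, columns and connection details are hypothetical assumptions.

    # Hypothetical query with a CTE and a window function on Oracle.
    import oracledb

    SQL = """
        WITH daily_totals AS (
            SELECT load_date, SUM(amount) AS total_amount
            FROM   sales                                   -- hypothetical table
            GROUP  BY load_date
        )
        SELECT load_date,
               total_amount,
               SUM(total_amount) OVER (ORDER BY load_date) AS running_total
        FROM   daily_totals
        ORDER  BY load_date
    """

    # Connection details are placeholders.
    with oracledb.connect(user="app_user", password="***", dsn="dbhost/service") as conn:
        with conn.cursor() as cur:
            for row in cur.execute(SQL):
                print(row)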

 

Main Tasks:

  • Responsible for maintaining the infrastructure that supports the current data architecture
  • Responsible for creating data pipelines in Airflow for data extraction, processing and loading
  • Responsible for data pipeline maintenance, monitoring and stability
  • Responsible for providing data access to data analysts and end-users
  • Responsible for the DevOps infrastructure
  • Responsible for deploying Airflow DAGs to the production environment using DevOps tools
  • Responsible for code and query optimization
  • Responsible for code review
  • Responsible for improving the current data architecture and DevOps processes
  • Responsible for delivering data in useful and appealing ways to users
  • Responsible for performing and documenting analysis, review and study of specified regulatory topics
  • Responsible for understanding business changes and requirement needs, and assessing their impact and cost

Profile

Technical Skills:

 

  • Python
  • Experience in creating APIs in Python
  • PySpark
  • Spark environment architecture
  • SQL, Oracle database
  • Experience in creating and maintaining distributed environments using Hadoop and Spark
  • Hadoop ecosystem - HDFS + YARN
  • Containerization - Docker is mandatory
  • Data lakes - experience in organizing and maintaining data lakes - S3 is preferred
  • Experience with the Parquet file format
  • Apache Airflow - experience in both pipeline development and deploying Airflow in a distributed environment
  • Apache Kafka
  • Experience in automating application deployment using DevOps tools - Jenkins is mandatory, Ansible is a plus

 

Language Skills

  • English                                                                                                                                      

Organization

Inetum is a European leader in digital services. Inetum’s team of 28,000 consultants and specialists strive every day to make a digital impact for businesses, public sector entities and society. Inetum’s solutions aim at contributing to its clients’ performance and innovation as well as the common good.

Present in 19 countries with a dense network of sites, Inetum partners with major software publishers to meet the challenges of digital transformation with proximity and flexibility.

Driven by its ambition for growth and scale, Inetum generated sales of 2.5 billion euros in 2023.

Country

Portugal

Location

Lisbon

Contract type

Open-ended contract

Apply