Data engineers are primarily responsible for designing, building, managing, and operationalizing data pipelines to support key data and analytics use cases.
They play a crucial role in constructing and maintaining a modern, scalable data platform that uses the full capabilities of a Lakehouse Platform.
As a key contributor to our data-driven organization, you will both build a modern data platform and maintain our Enterprise Data Warehouse (EDW).
You will apply your expertise in the Lakehouse Platform to design, develop, and deploy scalable data pipelines using modern and evolving technologies.
At the same time, you will take ownership of the EDW architecture, ensuring its performance, scalability, and alignment with evolving business needs.
Your responsibilities will span the full data lifecycle, from ingestion and transformation to delivery of high-quality datasets that support analytics and decision-making.
Duties and responsibilities
Build data pipelines using Azure Databricks:
Build and maintain scalable data pipelines and workflows within the Lakehouse environment
Transform, cleanse, and aggregate data using Spark SQL or PySpark
Optimize Spark jobs for performance, cost efficiency, and reliability
Develop and manage Lakehouse tables for efficient data storage and versioning
Use notebooks for interactive data exploration, analysis, and development
Implement data quality checks and monitoring to ensure accuracy and reliability
Drive automation:
Implement automated data ingestion using the data platform's built-in functionality, optimizing for performance and minimizing manual intervention
Design and implement end-to-end data pipelines, incorporating transformations, data quality checks, and monitoring
Use CI/CD tools (Azure DevOps or GitHub Actions) to automate pipeline testing and deployment, and keep pipeline code under version control
Manage the Enterprise Data Warehouse (EDW):
Create and maintain data models, schemas, and documentation for the EDW
Collaborate with data analysts, data scientists, and business stakeholders to gather requirements, design data marts, and support reporting and analytics initiatives
Troubleshoot and resolve issues with data loading, transformation, or access within the EDW