Design, develop, and maintain end-to-end data pipelines using Databricks and Apache Spark.
Build and manage ETL processes for transforming and processing large datasets in real-time and batch modes.
Define and govern data modelling and design standards, tools, and best practices; develop and maintain data models and schemas to support analytics and machine learning applications.
Collaborate with cross-functional teams, including data scientists, business analysts, and IT, to understand data requirements and deliver high-quality solutions.
Optimize performance and ensure scalability of Databricks-based solutions, addressing data quality, processing time, and resource usage.
Implement data governance best practices, ensuring data integrity, privacy, and security across all data assets.
Monitor and troubleshoot pipeline performance, ensuring reliability and quick resolution of data issues.
Automate data workflows and implement CI/CD pipelines for smooth deployment of data applications.
Stay up-to-date with emerging technologies and best practices related to Databricks, Spark, and cloud data solutions.
Requirements
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
Proven experience as a Data Engineer with Databricks and Apache Spark.
Strong programming skills in Python, Scala, or SQL.
Hands-on experience working with cloud platforms such as AWS, Azure, or Google Cloud.
Solid understanding of data warehousing, ETL, and data modelling principles.
Experience with Databricks Delta Lake and building scalable data pipelines.
Familiarity with Apache Kafka, Airflow, or other data streaming and workflow orchestration tools.
Knowledge of version control tools like Git and experience with continuous integration and deployment pipelines.
Excellent analytical, data profiling, and problem-solving abilities.