Data Pipeline Development: Design, develop, and optimize data pipelines to ingest, process, and transform data from various sources (e.g., APIs, databases) into the data warehouse.
Data Integration: Integrate data from various structured and unstructured sources into the Databricks Lakehouse environment, ensuring data accuracy and reliability.
Data Lakehouse Storage Management: Design and maintain data warehouse solutions using medallion architecture practices, optimizing storage, cloud utilization, costs, and query performance.
Collaboration with Data Teams: Work closely with data scientists and analysts to understand requirements, translate them into technical solutions, and implement those solutions.
Data Quality and Monitoring: Cleanse, transform, and enrich data. Implement data quality checks and establish monitoring processes to ensure data integrity and accuracy. Monitor data pipelines and troubleshoot any issues or failures promptly to ensure data reliability.
Optimization and Performance Tuning: Optimize data processing workflows for performance, reliability, and scalability, including tuning Spark jobs, caching, and partitioning data appropriately.
Data Security and Privacy: Manage and organize data lakes using Unity Catalog, ensuring proper governance, security, role-based access, and compliance with data management policies.
Key Skills:
Technical Skills:
Proficiency with the Databricks Lakehouse platform, Delta Lake, Genie, and MLflow; certification (e.g., Databricks Certified Data Engineer Associate) is a plus.
SQL and NoSQL: Experience working with both SQL and NoSQL data sources (e.g., MySQL, PostgreSQL, MongoDB).
Strong knowledge of Spark, especially PySpark or Scala, for data transformation.
Proficiency in Python, R, and other programming languages used in data processing.
Experience with cloud platforms such as Azure and AWS, particularly Azure storage services.
Knowledge of ML pipelines and data streaming platforms (e.g., Apache Kafka, AWS Kinesis).
Familiarity with data visualization tools (e.g., Tableau, Power BI, Looker).
Educational Qualification:
Education: Bachelor's degree in Computer Science, Engineering/MCA, or a related field (Master's preferred).
Experience: 3+ years as a Data Engineer, with hands-on experience in Databricks.