We are seeking a skilled Databricks Consultant to join our team. The ideal candidate will have extensive experience creating, managing, and optimizing data engineering pipelines using Databricks and associated technologies.
Responsibilities:
1. Lakehouse ETL Projects:
- Execute ETL projects within the Lakehouse architecture, ensuring seamless data integration and transformation.
2. Delta Lake Batch and Real-Time Streaming:
- Design, implement, and manage batch and real-time streaming pipelines utilizing Delta Lake technology.
3. Apache Spark Big Data Pipelines:
- Develop and optimize big data pipelines using Apache Spark (PySpark) to handle large-scale data processing and analysis.
4. Cloud Experience:
- Demonstrate proficiency with at least one major cloud platform (AWS, Azure, or GCP) to manage and deploy data solutions.
5. Delta Table ETL Pipeline Management:
- Monitor and track ETL pipelines involving Delta Tables, ensuring data integrity, performance, and reliability.
6. MLflow / MLOps / DataOps:
- Oversee MLflow model tracking, serving, and deployment to streamline workflows and ensure efficient lifecycle management.
Qualifications:
- Proven experience with Databricks, Delta Lake, and Apache Spark.
- Strong understanding of ETL processes, data integration, and real-time data streaming.
- Excellent problem-solving skills and ability to work in a collaborative team environment.
- PySpark experience is desired.
- Domain experience in BFSI or Retail is desired.
- Familiarity with MLflow for model tracking and deployment.