Senior Data Engineer (5-10 yrs)
CareerHaat
Posted 1 month ago
Fixed timing
Job Location : Bengaluru (Hybrid)
Permanent position on the company payroll
Job Description :
We are looking for a highly skilled Databricks PySpark Developer to join our data platform implementation team. In this role, you will be instrumental in designing, developing, and maintaining ETL processes that ensure the efficient extraction, transformation, and loading of data from various sources into our data lake and data warehouse.
You will work closely with data engineers, data scientists, and business intelligence teams to build and optimize data workflows that support the project's analytics and reporting needs.
Key Responsibilities :
1. ETL Development :
- Design and develop ETL processes using Databricks PySpark to extract, transform, and load data from heterogeneous sources into our data lake and data warehouse.
- Optimize ETL workflows for performance and scalability, leveraging PySpark and Spark SQL to efficiently process large data volumes.
- Implement robust error handling and monitoring mechanisms to proactively detect and resolve issues within ETL processes.
- Design and implement data solutions following Medallion Architecture principles, organizing data into Bronze, Silver, and Gold layers, and ensure data is appropriately cleansed, enriched, and optimized at each stage to support robust analytics and reporting.
2. Data Pipeline Management :
- Develop and maintain advanced data pipelines using Databricks Workflows and PySpark, ensuring data quality, integrity, and reliability throughout the ETL lifecycle (hands-on experience required).
- Collaborate with data engineering, data science, and business intelligence teams to translate data requirements into efficient ETL workflows and pipelines.
3. Data Analysis and Query Optimization : Write and optimize complex SQL queries for data manipulation, aggregation, and analysis within Databricks PySpark applications.
4. Project Coordination and Continuous Improvement :
- Participate in project planning and coordination activities to ensure timely delivery of ETL solutions.
- Stay updated on the latest developments in Databricks, PySpark, Spark SQL, and related technologies, recommending and implementing best practices and optimizations.
- Document ETL processes, data lineage, and metadata to facilitate knowledge sharing and ensure compliance with data governance standards.
Required Qualifications :
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Minimum of 3 years of experience in Databricks and PySpark development.
- Proficiency in Python programming, with extensive experience in developing and debugging Databricks PySpark applications.
- In-depth understanding of Spark architecture and internals, with hands-on experience in Spark RDDs, DataFrames, and Spark SQL.
- Expertise in writing and optimizing complex SQL queries for data manipulation, aggregation, and analysis.
- Proven experience in working with large-scale data warehousing and ETL frameworks.
- Strong problem-solving skills and the ability to troubleshoot and resolve ETL process issues.
- Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
Preferred Qualifications :
- Experience with cloud platforms, preferably AWS.
- Experience with data platform tools such as Databricks, Snowflake, and Tableau.
- Demonstrated ability to implement best practices for ETL processes and data management.
- Strong understanding of data governance and data quality principles.
- Relevant certifications in Databricks, PySpark, Spark SQL, or related technologies.
Functional Areas: Software/Testing/Networking