Data Engineer - Azure Databricks
EdgeSoft
The ideal candidate will have a strong, hands-on command of PySpark and the core components of Databricks. As a key member of our data team, you will develop, optimize, and maintain our data infrastructure, ensuring seamless and efficient data processing.
Responsibilities:
- Design, develop, and maintain data pipelines using Databricks and PySpark to process large-scale datasets (a minimal batch-pipeline sketch follows this list).
- Optimize Apache Spark batch processing workflows (see the tuning sketch after this list).
- Build and maintain streaming data pipelines (a streaming sketch also follows this list).
- Optimize and fine-tune existing Databricks jobs and PySpark scripts for better performance and reliability.
- Troubleshoot issues related to data pipelines, identify bottlenecks, and implement effective solutions.
- Implement best practices for data governance, security, and compliance within Databricks environments.
- Work closely with Data Scientists and Analysts to support their data requirements and enable efficient access to relevant datasets.
- Stay updated on industry trends and advancements in Databricks and PySpark to propose and implement innovative solutions.
- Optimize systems for low-latency, high-throughput performance.
- Use Spark SQL and the DataFrame API for dynamic data transformations.
- Implement advanced filtering logic in Databricks notebooks and scripts using Python or Scala.
- Apply distributed-systems principles to message brokering.
- Collaborate with cross-functional teams to gather requirements, understand data needs, and implement scalable solutions.
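To ground the pipeline, Spark SQL, and filtering bullets above, here is a minimal sketch of the kind of batch PySpark job this role involves. The table and column names (raw_events, curated.daily_event_counts, event_type, user_id, event_date) are illustrative assumptions, not details from this posting.

```python
# Minimal sketch of a batch PySpark pipeline in a Databricks notebook.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a `spark` session already exists; this keeps the sketch
# self-contained for local runs.
spark = SparkSession.builder.appName("daily-event-rollup").getOrCreate()

# Read a large-scale dataset from a (hypothetical) Delta table.
events = spark.read.table("raw_events")

# Advanced filtering logic expressed with the DataFrame API.
valid = events.filter(
    F.col("event_type").isin("click", "purchase")
    & F.col("user_id").isNotNull()
)

# The same data reshaped dynamically with Spark SQL over a temp view.
valid.createOrReplaceTempView("valid_events")
daily_counts = spark.sql("""
    SELECT event_date, event_type, COUNT(*) AS n_events
    FROM valid_events
    GROUP BY event_date, event_type
""")

# Persist the curated output as a Delta table, partitioned for reads.
(daily_counts.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("curated.daily_event_counts"))
```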
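For the optimization bullets, the following hedged sketch shows a few common Spark tuning levers (adaptive query execution, broadcast joins, selective caching). The table names are assumptions, and the right mix of levers always depends on the actual workload.

```python
# Hedged sketch of common Spark tuning levers; table names and the
# broadcast/caching choices are illustrative, not prescriptive.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Let adaptive query execution re-optimize shuffle partitioning at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

orders = spark.read.table("raw_orders")        # hypothetical large fact table
countries = spark.read.table("dim_countries")  # hypothetical small dimension

# Broadcast the small dimension table to avoid shuffling the large side.
enriched = orders.join(F.broadcast(countries), "country_code")

# Cache only when the result is reused more than once downstream.
enriched.cache()

# Repartition by the write key so output files line up with partitions.
(enriched.repartition("order_date")
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("curated.orders_enriched"))
```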
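And for the streaming bullet, a minimal Structured Streaming sketch that reads from Kafka and writes to Delta, assuming a hypothetical broker address, topic, schema, checkpoint path, and output table.

```python
# Minimal Structured Streaming sketch; the broker address, topic name,
# schema, checkpoint path, and output table are all assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("event-stream").getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read from Kafka (hypothetical broker and topic) and parse the JSON payload.
raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load())

parsed = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("e")
).select("e.*")

# Windowed aggregation; the watermark bounds state kept for late events.
counts = (parsed
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "event_type")
    .count())

# Stream to a Delta table; the checkpoint enables exactly-once recovery.
query = (counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .toTable("curated.event_counts_5m"))
```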
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5 to 10 years of proven experience as a Data Engineer with a strong emphasis on Databricks.
- Proficiency in PySpark and extensive hands-on experience in building and optimizing data pipelines using Databricks.
- Solid understanding of the different components within Databricks, such as clusters, notebooks, jobs, and libraries.
- Strong knowledge of SQL, data modeling, and ETL processes.
- Ability to analyze complex problems, propose solutions, and implement them effectively.
- Excellent communication skills with the ability to collaborate with cross-functional teams.
- Experience working in a cloud environment (AWS, Azure, or GCP) is a plus.
Preferred Skills:
- Certifications or training in Databricks or PySpark.
- Familiarity with other big data technologies (Hadoop, Spark, Kafka, etc.).
- Experience with version control systems (Git, SVN, etc.) and CI/CD pipelines.
- Familiarity with project-tracking tools such as Jira.
Employment Type: Full Time, Permanent