Data Engineer - PySpark/Data Warehousing (3-5 yrs)
WITS Innovation Lab
Flexible timing
Job Overview:
We are seeking a highly skilled Data Engineer to join our growing data team.
The ideal candidate will have experience working with large-scale data processing systems and be proficient in Snowflake, PySpark, and Databricks to design, build, and maintain data pipelines. You will work closely with data scientists, analysts, and other engineers to transform data into actionable insights and ensure high-performance, scalable data systems.
Key Responsibilities:
Data Pipeline Development:
- Design, build, and maintain scalable data pipelines to support data processing and integration.
- Utilize Snowflake for data storage, transformation, and optimization of queries.
- Implement batch and stream processing frameworks using PySpark and Databricks.
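For illustration, a minimal PySpark sketch of the batch and streaming halves of such a pipeline; all bucket paths, table layouts, and column names are hypothetical:

```python
# Sketch only: paths, schema, and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Batch leg: read raw files, aggregate, and write a curated table.
orders = spark.read.parquet("s3://raw-bucket/orders/")
daily = (orders
         .groupBy(F.to_date("order_ts").alias("order_date"))
         .agg(F.sum("amount").alias("revenue")))
daily.write.mode("overwrite").parquet("s3://curated/daily_revenue/")

# Streaming leg: the same source consumed incrementally with
# Structured Streaming; the checkpoint makes the job restartable.
stream = spark.readStream.schema(orders.schema).parquet("s3://raw-bucket/orders/")
(stream.writeStream
       .format("parquet")
       .option("path", "s3://curated/orders_stream/")
       .option("checkpointLocation", "s3://curated/_chk/orders/")
       .start())
```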
ETL Processes:
- Develop and optimize ETL (Extract, Transform, Load) workflows to integrate and process data from various sources.
- Work with data from structured, semi-structured, and unstructured sources.
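As a hedged example of handling a semi-structured source, a short PySpark ETL sketch that flattens nested JSON; the event layout and field names are invented for illustration:

```python
# Sketch only: the JSON layout and field names are invented.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: nested, semi-structured JSON events.
events = spark.read.json("s3://raw-bucket/events/")

# Transform: explode the nested array and flatten struct fields.
items = (events
         .withColumn("item", F.explode("items"))
         .select(F.col("user.id").alias("user_id"),
                 F.col("item.sku").alias("sku"),
                 F.col("item.qty").cast("int").alias("qty")))

# Load: append into the curated zone, partitioned by ingest date.
(items.withColumn("ingest_date", F.current_date())
      .write.mode("append")
      .partitionBy("ingest_date")
      .parquet("s3://curated/order_items/"))
```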
Collaboration with Data Teams:
- Collaborate with data scientists, data analysts, and business stakeholders to understand data requirements and deliver efficient data solutions.
- Provide technical expertise in data modeling, schema design, and data quality.
Data Optimization & Performance Tuning:
- Optimize queries, processes, and data storage to keep data pipelines high-performing and cost-efficient.
- Troubleshoot and resolve issues related to data processing, storage, and performance.
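A few representative tuning moves in PySpark, sketched under assumed table layouts (all names are hypothetical):

```python
# Sketch only: table names and the join key are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

facts = spark.read.parquet("s3://curated/fact_sales/")
stores = spark.read.parquet("s3://curated/dim_stores/")

# Broadcast the small dimension table so the join avoids a full shuffle.
joined = facts.join(F.broadcast(stores), "store_id")

# Filter as early as possible so predicate pushdown and partition
# pruning shrink the scan (and the bill) on columnar storage.
recent = joined.where(F.col("sale_date") >= "2024-01-01")

# Coalesce before writing to avoid a swarm of tiny output files.
recent.coalesce(8).write.mode("overwrite").parquet("s3://curated/recent_sales/")
```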
Cloud Infrastructure Management:
- Work with cloud technologies such as AWS, Azure, or GCP, particularly for data storage, computation, and orchestration services.
- Leverage Databricks for collaborative data engineering and machine learning workflows.
Automation & Monitoring:
- Automate data workflows, monitoring, and alerting to ensure smooth and reliable data pipeline operation.
- Set up data quality checks and ensure consistency, completeness, and accuracy of data.
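A minimal, hand-rolled example of the kind of data quality gate meant here; thresholds and names are hypothetical, and dedicated frameworks (e.g., Great Expectations) cover the same ground more thoroughly:

```python
# Sketch only: a hand-rolled quality gate with hypothetical names; a
# pipeline scheduler would turn the raised error into an alert.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
df = spark.read.parquet("s3://curated/order_items/")

total = df.count()
null_keys = df.where(F.col("user_id").isNull()).count()
dupes = total - df.dropDuplicates(["user_id", "sku", "ingest_date"]).count()

if total == 0:
    raise ValueError("completeness check failed: table is empty")
if null_keys:
    raise ValueError(f"consistency check failed: {null_keys} null user_id rows")
if dupes:
    raise ValueError(f"accuracy check failed: {dupes} duplicate rows")
```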
Documentation & Best Practices:
- Document data pipeline designs, data models, and processes to ensure maintainability and adherence to best practices.
- Stay up to date with emerging data engineering trends, tools, and techniques.
Key Skills & Qualifications:
- Snowflake: Hands-on experience with Snowflake data warehousing solutions, including data loading, transformation, and optimization.
- PySpark: Strong experience in PySpark for big data processing and manipulation of large datasets in a distributed computing environment.
- Databricks: Familiarity with Databricks for creating scalable data pipelines and integrating machine learning workflows.
- ETL Tools: Experience with ETL tools (e.g., Apache Airflow, Talend, Informatica) and data integration methods; a minimal Airflow DAG sketch follows this list.
- Cloud Platforms: Familiarity with cloud platforms such as AWS, Azure, or GCP, particularly for data storage and computation (e.g., S3, Redshift, BigQuery, Azure Data Lake).
- SQL: Advanced SQL skills for querying relational and non-relational databases.
- Programming Languages: Proficient in Python, Java, or Scala for developing data engineering solutions.
- Data Modeling: Strong understanding of data modeling, database schema design, and normalization techniques.
- Data Warehousing: Experience in data warehousing concepts, data lakes, and dimensional modeling.
- Version Control & CI/CD: Experience with Git, version control, and CI/CD pipelines to manage code deployments and updates.
- Problem Solving: Strong analytical and troubleshooting skills to identify and resolve data issues.
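For the ETL Tools point above, a minimal Airflow 2.x DAG sketch chaining extract, transform, and load tasks; the DAG id, schedule, and script names are hypothetical:

```python
# Sketch only (Airflow 2.4+): dag_id, schedule, and scripts are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # run daily at 02:00
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    transform = BashOperator(task_id="transform",
                             bash_command="spark-submit transform.py")
    load = BashOperator(task_id="load", bash_command="python load_snowflake.py")

    # Linear dependency chain: extract -> transform -> load.
    extract >> transform >> load
```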
Functional Areas: Software/Testing/Networking
Experience: 3-5 Yrs
Locations: Hyderabad / Secunderabad, Pune, Bangalore / Bengaluru