Innovatily
Innovatily - Lead Data Engineer - Python/ETL (8-12 yrs)
posted 12d ago
Flexible timing
Job Description :
We seek a highly skilled and experienced Data Engineering Lead to join our team. This role demands deep technical expertise in Apache Spark, Hive, Trino (formerly Presto), Python, AWS Glue, and the broader AWS ecosystem. The ideal candidate will possess strong hands-on skills and the ability to design and implement scalable data solutions, optimize performance, and lead a high-performing team to deliver data-driven insights.
The core responsibilities for the job include the following :
Technical Leadership :
- Lead and mentor a team of data engineers, fostering best practices in coding, design, and delivery.
- Drive the adoption of modern data engineering frameworks, tools, and methodologies to ensure high-quality and scalable solutions.
- Translate complex business requirements into effective data pipelines, architectures, and workflows.
- Architect, develop, and optimize scalable ETL/ELT pipelines using Apache Spark, Hive, AWS Glue, and Trino.
- Handle complex data workflows across structured and unstructured data sources, ensuring performance and cost-efficiency.
- Develop real-time and batch processing systems to support business intelligence, analytics, and machine learning applications.
- Work closely with data scientists, analysts, and business teams to understand requirements and deliver actionable insights.
- Ensure that data infrastructure aligns with organizational goals and compliance standards.
- Establish and enforce data quality standards, governance practices, and monitoring processes.
- Ensure data security, privacy, and compliance with regulatory frameworks.
- Stay ahead of industry trends, emerging technologies, and best practices in data engineering.
- Proactively identify and implement improvements in data architecture and processes.
Skills and qualifications :
- Advanced proficiency with Apache Spark (core, SQL, streaming) for large-scale data processing.
- Strong expertise in Hive for querying and managing structured data in data lakes.
- In-depth knowledge of Trino (Presto) for federated querying and high-performance SQL execution.
- Solid programming skills in Python with frameworks like PySpark and Pandas.
- Hands-on experience with AWS Glue, including Glue ETL jobs, Glue Data Catalog, and Glue Crawlers.
- Deep understanding of data formats such as Parquet, ORC, Avro, and their use cases.
- 8+ years of experience building data pipelines from scratch in high-volume data environments.
- AWS certifications, such as AWS Certified Data Analytics or AWS Certified Solutions Architect.
- Experience with Kafka or Kinesis for real-time data streaming would be a plus.
- Familiarity with containerization tools like Docker and orchestration platforms like Kubernetes.
- Knowledge of CI/CD pipelines and DevOps practices for data engineering.
- Prior experience with data lake architectures and integrating ML workflows.
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 8+ years of experience in data engineering with a minimum of 2 years in a leadership role.
Functional Areas: Software/Testing/Networking