We are seeking a highly skilled and experienced Lead Data Engineer to join our team. In this role, you will design, build, optimize, and maintain large-scale ETL/ELT pipelines, ensuring high-quality data flows and processing. You will collaborate with cross-functional teams to connect data systems seamlessly and drive data integration between the HCM and BDC platforms. This is an excellent opportunity to work with cutting-edge technologies, including Spark, Python, Java, and other tools used in the data engineering space.
RESPONSIBILITIES:
- Design and Build Scalable ETL Pipelines: Architect, build, and optimize large-scale ETL/ELT data pipelines, focusing on performance, scalability, and data integrity. Ensure the pipelines integrate smoothly within the larger data ecosystem.
- Microservices Development and Maintenance: Design and implement microservices for managing data processing tasks. Build reusable and scalable services to handle various data processing needs and ensure maintainability.
- Implement DevOps Practices: Utilize DevOps tools and practices (CI/CD) for automating testing, building, and deployment of data pipelines and microservices. Ensure that data workflows, models, and infrastructure are robust and can scale in a cloud-native environment.
- Data Integration and Transformation: Collaborate with cross-functional teams to design seamless data integrations between HCM and BDC systems. Focus on data transformation, cleansing, and deduplication while ensuring the pipelines are efficient and maintainable.
- Optimize Data Processing: Tune ETL pipeline performance, focusing on real-time data processing and optimizing for low-latency data delivery. Troubleshoot and debug complex pipeline issues in distributed systems.
- Database and Big Data Tools Management: Work extensively with SQL for structured data and leverage big data tools like Hive, HBase, and Parquet for large-scale data storage and querying.
- Performance Optimization & Data Quality: Drive data quality initiatives including data deduplication, data transformation, and performance optimizations across data pipelines and services.
- Collaboration and Mentorship: Partner with Data Scientists, Analysts, and Engineers to ensure seamless data integration. Provide technical leadership, mentorship, and guidance to junior engineers, fostering best practices in both data engineering and DevOps.
SKILLS & COMPETENCIES:
- Strong expertise in Python, Java, or Scala for building scalable data pipelines and microservices.
- Proven experience working with Apache Spark, Kafka, and Airflow for building data workflows and processing large datasets.
- Strong knowledge of SQL for querying structured data and interacting with databases.
- Hands-on experience with microservices architecture for building, deploying, and managing data services.
- Experience working in a DevOps environment using tools like Jenkins, Git, Docker, and Kubernetes to support continuous integration and delivery.
- Expertise in big data tools such as Hive, HBase, Parquet, and other storage solutions for managing large volumes of data.
- Hands-on experience with data transformation, deduplication, performance optimization, and distributed systems.
- Familiarity with Machine Learning workflow integration and automating ML data pipelines.
- Ability to architect and scale microservices for high-volume data workloads.
- Strong problem-solving skills with a focus on troubleshooting and debugging in a distributed data environment.
- Experience in mentoring and providing technical leadership to junior engineers in data engineering and DevOps practices.
WORK EXPERIENCE & EDUCATION:
- 8+ years of professional experience is essential for understanding system architecture, programming, and technical troubleshooting.
- Outstanding written and verbal communication skills.
- Bachelor's or Master's degree in Computer Science required.
Employment Type: Full Time, Permanent