Design, develop, and deploy end-to-end data pipelines on AWS cloud infrastructure using services such as Amazon S3, AWS Glue, AWS Lambda, and Amazon Redshift.
Implement data processing and transformation workflows using Apache Spark and SQL to support analytics and reporting requirements.
Build and maintain orchestration workflows to automate data pipeline execution, scheduling, and monitoring.
Collaborate with analysts and business stakeholders to understand data requirements and deliver scalable data solutions.
Optimize data pipelines for performance, reliability, and cost-effectiveness, leveraging AWS best practices and cloud-native technologies.
Required Experience and Skill Sets:
8+ years of experience building and deploying large-scale data processing pipelines in a production environment.
Hands-on experience designing and building data pipelines on AWS cloud infrastructure.
Strong proficiency in AWS services such as Amazon S3, AWS Glue, AWS Lambda, and Amazon Redshift.
Strong experience with Apache Spark for data processing and analytics.
Hands-on experience orchestrating and scheduling data pipelines using Amazon AppFlow, Amazon EventBridge, and AWS Lambda.
Solid understanding of data modeling, database design principles, SQL, and Spark SQL.
Experience with version control systems (e.g., Git) and CI/CD pipelines.
Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
Strong problem-solving skills and attention to detail.