AWS Data Engineer - Python/PySpark (5-8 yrs)
Steer Lean Consulting
Experience : 5-8 years
Location : Gurgaon / Gurugram
Timing : Flexible
Job Description :
Key Responsibilities :
Data Pipeline Development :
- Design, develop, and maintain efficient and reliable AWS Glue ETL pipelines to extract, transform, and load data from various sources into target systems.
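For context, a minimal sketch of the kind of Glue ETL job this responsibility involves, using the PySpark-based Glue libraries; the database, table, and bucket names are illustrative assumptions, not part of this posting:

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Resolve the job name that Glue passes in at run time.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read a source table registered in the Glue Data Catalog
# ("sales_db" / "raw_orders" are hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Transform: project and rename columns.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "double", "order_amount", "double"),
    ],
)

# Load: write the result to S3 as Parquet (bucket name is hypothetical).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders/"},
    format="parquet",
)

job.commit()
```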
Data Integration :
- Integrate data from diverse sources, including on-premises and cloud-based systems, using AWS Glue and other relevant tools.
Data Quality Assurance :
- Implement data quality checks and monitoring mechanisms to ensure data accuracy and consistency.
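By way of illustration, a minimal data quality check might look like the sketch below, plain PySpark with a boto3 SNS alert; the dataset path, rules, and topic ARN are assumptions:

```python
import boto3
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

# Hypothetical curated dataset to validate.
df = spark.read.parquet("s3://example-curated-bucket/orders/")

total = df.count()
null_ids = df.filter(F.col("order_id").isNull()).count()

# Simple rules: non-empty dataset, no null business keys.
failures = []
if total == 0:
    failures.append("dataset is empty")
if null_ids > 0:
    failures.append(f"{null_ids} rows with null order_id")

if failures:
    # Publish an alert to a (hypothetical) SNS topic for on-call follow-up,
    # then fail the run so downstream steps do not consume bad data.
    sns = boto3.client("sns")
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:dq-alerts",
        Subject="Data quality check failed",
        Message="; ".join(failures),
    )
    raise RuntimeError("; ".join(failures))
```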
Cloud Infrastructure :
- Manage and provision AWS infrastructure using CloudFormation or Terraform to support data pipelines and data lakes.
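As one hedged sketch of the CloudFormation side of this duty, the template below defines a Glue Data Catalog database and is deployed with boto3; the stack and database names are placeholders:

```python
import json
import boto3

# Minimal CloudFormation template defining a Glue Data Catalog database
# ("analytics_db" is an illustrative name).
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AnalyticsDatabase": {
            "Type": "AWS::Glue::Database",
            "Properties": {
                "CatalogId": {"Ref": "AWS::AccountId"},
                "DatabaseInput": {"Name": "analytics_db"},
            },
        }
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="data-platform-glue-db",  # hypothetical stack name
    TemplateBody=json.dumps(template),
)
```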
Performance Optimization :
- Identify and optimize data pipeline performance bottlenecks to improve processing speed and efficiency.
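Continuing the earlier Glue sketch, one common optimization is pruning catalog partitions at read time with a pushdown predicate instead of filtering after load; this assumes the hypothetical raw_orders table is partitioned by year and month:

```python
# Without a pushdown predicate, Glue lists and reads every partition and
# filtering happens after load; with one, only matching partitions are read.
recent = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",  # hypothetical names, as above
    table_name="raw_orders",
    push_down_predicate="year == '2024' and month == '06'",
)
```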
Collaboration :
- Collaborate with data analysts, data scientists, and other team members to understand data requirements and deliver solutions that meet business needs.
Best Practices :
- Adhere to industry best practices for data engineering, including data security, privacy, and governance.
Agile Methodology :
- Work within an Agile framework to deliver projects on time and within budget.
Required Skills and Qualifications :
- 5+ years of experience in data engineering or related fields.
- Strong proficiency in AWS Glue, including Crawlers, the Data Catalog, and ETL jobs.
- Expertise in Python and PySpark for data processing and analysis.
- Solid understanding of AWS infrastructure components, such as S3, Lambda, SNS, Secrets Manager, and Athena.
- Experience with CloudFormation or Terraform for infrastructure as code.
- Knowledge of data warehousing concepts and best practices.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration skills.
- Hands-on experience with a range of AWS data services, including Amazon Redshift, S3, Glue, Kinesis, DynamoDB, and EMR, to design and manage robust, cloud-native data solutions.
- Proficiency in ETL/ELT development, particularly in building complex pipelines with AWS Glue and Lambda or similar tools, for efficient data ingestion and transformation.
- Strong programming skills in Python and SQL; experience with Java or Scala is a plus for scripting and data manipulation.
- Experience with big data tools such as Apache Spark, Hadoop, or Kafka, particularly in an AWS environment, for processing and analyzing large datasets.
- Expertise in Infrastructure as Code (IaC) with tools such as CloudFormation or Terraform to manage and automate AWS infrastructure configurations.
- Data modeling and warehousing skills, including dimensional and relational modeling in Redshift, for building organized and optimized data storage.
- Proficiency in monitoring and logging, using AWS CloudWatch and AWS Config to set up automated alerts and keep the data infrastructure reliable and secure.
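As an illustration of that monitoring expectation, the sketch below creates a CloudWatch alarm on one of Glue's documented job metrics using boto3; the job name, topic ARN, and threshold are assumptions:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when a (hypothetical) Glue job reports any failed tasks;
# notifications go to the same hypothetical SNS topic used above.
cloudwatch.put_metric_alarm(
    AlarmName="glue-orders-etl-failed-tasks",
    Namespace="Glue",
    MetricName="glue.driver.aggregate.numFailedTasks",
    Dimensions=[
        {"Name": "JobName", "Value": "orders-etl"},
        {"Name": "JobRunId", "Value": "ALL"},
        {"Name": "Type", "Value": "count"},
    ],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:dq-alerts"],
)
```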
Experience : 5-7 years
Functional Areas: Other