Data Engineer - ETL (8-14 yrs)
Forward Eye Technologies
Job Title : Data Engineer (PySpark, AWS, SQL)
Job Description :
We are looking for a skilled Data Engineer with expertise in PySpark, AWS, and SQL to support data processing and analytical initiatives. This role involves working closely with data engineering and data science teams to build, maintain, and optimize large-scale data pipelines and integrations on AWS. The ideal candidate will be proficient in ETL processes using PySpark and SQL, with a deep understanding of cloud data infrastructure, specifically within the AWS ecosystem.
Key Responsibilities :
Data Pipeline Development :
- Design, build, and optimize ETL pipelines using PySpark for data ingestion, transformation, and storage on AWS.
- Collaborate with stakeholders to understand data requirements, translating them into scalable data solutions.
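To make the expectation concrete, here is a minimal sketch of such a pipeline, assuming hypothetical bucket names, paths, and an orders schema (none of these specifics come from the posting):

# Minimal PySpark ETL sketch: ingest raw CSV from S3, transform, write Parquet.
# Bucket names, paths, and columns are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("s3://example-raw-bucket/orders/"))          # hypothetical source path

transformed = (raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize types
    .withColumn("order_date", F.to_date("order_ts"))     # derive partition column
    .filter(F.col("amount").cast("double") > 0)          # drop invalid rows
    .dropDuplicates(["order_id"]))                       # basic de-duplication

(transformed.write
    .mode("overwrite")
    .partitionBy("order_date")                           # partition for downstream query pruning
    .parquet("s3://example-curated-bucket/orders/"))     # hypothetical target path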
Cloud Infrastructure Management :
- Develop and manage AWS services such as S3, Glue, Lambda, EMR, Redshift, and RDS for data processing and storage.
- Implement data workflows that handle both batch and real-time processing needs, ensuring low latency and efficient data access.
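As one illustration of wiring these services together, an S3-triggered Lambda that starts a Glue job when new data lands might look like the sketch below; the Glue job name and argument key are placeholders, not details from the posting:

# Sketch: S3-triggered Lambda handler that starts a Glue ETL job for the new object.
# The Glue job name and the --input_path argument are illustrative assumptions.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    record = event["Records"][0]["s3"]           # S3 put-event payload
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Pass the landed object to a (hypothetical) Glue job as a job argument.
    response = glue.start_job_run(
        JobName="orders-etl-job",                # placeholder job name
        Arguments={"--input_path": f"s3://{bucket}/{key}"},
    )
    return {"job_run_id": response["JobRunId"]}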
Database Management :
- Write and optimize complex SQL queries for data extraction and transformation from AWS RDS, Redshift, and other SQL-based databases.
- Leverage indexing, partitioning, and caching techniques to improve query performance on large datasets.
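For example, filtering on a partition column lets Spark prune whole partitions instead of scanning the full dataset, and caching keeps a reused aggregate in memory; a hedged sketch, with table and column names assumed:

# Sketch: query a partitioned table with partition pruning, then cache a
# frequently reused aggregate. Table and column names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query_opt").getOrCreate()

# Filtering on the partition column (order_date) lets Spark skip whole
# partitions rather than scanning every file.
daily = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM curated.orders                 -- hypothetical partitioned table
    WHERE order_date = DATE '2024-01-15'
    GROUP BY customer_id
""")

daily.cache()          # reused below, so keep it in memory
daily.count()          # materialize the cache

top = daily.orderBy(daily.total_amount.desc()).limit(100)
top.show()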
Data Quality and Governance :
- Ensure data accuracy, completeness, and consistency throughout the data lifecycle.
- Implement best practices for data quality, governance, and security using AWS-native tools and third-party solutions.
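A lightweight example of the kind of quality gate this implies, with column names and thresholds chosen purely for illustration:

# Sketch: simple data-quality gate run after a load step.
# Path, column names, and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
df = spark.read.parquet("s3://example-curated-bucket/orders/")  # placeholder path

total = df.count()
null_ids = df.filter(F.col("order_id").isNull()).count()
dupes = total - df.dropDuplicates(["order_id"]).count()

# Fail the pipeline rather than silently propagating bad data downstream.
assert null_ids == 0, f"{null_ids} rows missing order_id"
assert dupes / max(total, 1) < 0.01, f"duplicate rate too high: {dupes}/{total}"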
Automation and Optimization :
- Automate repetitive tasks and optimize workflows, ensuring efficient and resilient data processing.
- Use CI/CD pipelines and version control (e.g., Git) to deploy and manage data workflows.
Troubleshooting and Support :
- Identify and resolve issues within the data pipeline, including data access and query performance problems.
- Support the team in data analytics and data science initiatives by preparing data in an accessible and structured format.
Required Skills and Experience :
- 5+ years of data engineering experience with a focus on PySpark, AWS, and SQL.
- Strong proficiency in PySpark for ETL and data transformation tasks.
- Hands-on experience with AWS data services (S3, Glue, Redshift, EMR, Lambda).
- Expertise in SQL and relational database management, with experience optimizing complex queries.
- Experience in data pipeline orchestration and monitoring (Airflow or similar; a minimal DAG sketch follows this list).
- Knowledge of data partitioning, indexing, and caching techniques to enhance performance.
- Familiarity with CI/CD principles and tools, including version control systems like Git.
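As referenced above, a minimal Airflow DAG sketch showing how the ETL and quality-check steps might be orchestrated; the DAG id, schedule, and spark-submit paths are assumptions (Airflow 2.x API):

# Sketch: a minimal Airflow DAG orchestrating the ETL steps sketched earlier.
# DAG id, schedule, and spark-submit paths are illustrative assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="spark-submit s3://example-code-bucket/etl/orders_etl.py",  # placeholder
    )
    run_checks = BashOperator(
        task_id="run_quality_checks",
        bash_command="spark-submit s3://example-code-bucket/etl/dq_checks.py",   # placeholder
    )
    run_etl >> run_checks   # quality checks run only after the load succeeds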
Nice to Have :
- Experience with AWS Redshift Spectrum and Athena for querying data on S3.
- Knowledge of streaming data processes and tools like Kinesis or Kafka.
- Background in big data technologies, such as Hadoop or Apache Spark.
- Experience with data lake architectures and building data marts for analytics.
Functional Areas: Software/Testing/Networking