Data Engineer - Python/Spark (4-8 yrs)
O2f info Solutions
Flexible timing
Job Summary :
We are seeking a highly skilled Senior Data Engineer to join our data engineering team, with 4 to 8 years of experience building robust data pipelines and working extensively with PySpark.
Key Responsibilities :
Data Pipeline Development :
- Design, build, and maintain scalable data pipelines using PySpark to process large datasets and support data-driven applications and analytics.
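For illustration, a minimal sketch of the kind of batch pipeline this responsibility describes; all paths and column names below are hypothetical:

```python
# Illustrative batch pipeline: read, transform, write (names are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_pipeline").getOrCreate()

# Ingest raw events from a hypothetical Parquet landing zone.
orders = spark.read.parquet("s3://raw-zone/orders/")

# Transform: drop bad records and derive business columns.
clean = (
    orders
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("created_at"))
    .withColumn("net_amount", F.col("gross_amount") - F.col("discount"))
)

# Load: write partitioned output for downstream analytics.
clean.write.mode("overwrite").partitionBy("order_date").parquet("s3://curated-zone/orders/")
```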
ETL Process Automation :
- Develop and automate ETL (Extract, Transform, Load) processes using PySpark, ensuring efficient data processing, transformation, and loading from diverse sources into data lakes, warehouses, or databases.
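A hedged example of one such ETL step, extracting from a relational source and a flat file, then loading into a lake table; connection details and schemas are placeholders:

```python
# Hypothetical ETL: extract from JDBC + CSV, conform, load to a lake table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer_etl").getOrCreate()

# Extract: relational source via JDBC (credentials would come from a secret store).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/crm")
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "****")
    .load()
)

# Extract: flat-file source from object storage.
signups = spark.read.option("header", True).csv("s3://raw-zone/signups/")

# Transform: enrich customers with signup data and stamp the load time.
merged = (
    customers.join(signups, on="email", how="left")
    .withColumn("loaded_at", F.current_timestamp())
)

# Load: append into the lake table (Parquet shown; a warehouse JDBC sink also works).
merged.write.mode("append").parquet("s3://warehouse/customers/")
```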
Distributed Computing with PySpark :
- Leverage Apache Spark and PySpark to process large-scale data in a distributed computing environment, optimizing for performance and scalability.
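Two common distributed-processing levers in PySpark, shown as a sketch with illustrative table names: keyed repartitioning to balance work across executors, and broadcasting a small dimension to avoid a shuffle:

```python
# Sketch of distributed-processing levers; all table names are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("distributed_example").getOrCreate()

events = spark.read.parquet("s3://raw-zone/events/")    # large fact table
countries = spark.read.parquet("s3://ref/countries/")   # small dimension

# Repartition by the join key so work is spread evenly across executors.
events = events.repartition(200, "country_code")

# Broadcast the small dimension so the join skips a full shuffle.
enriched = events.join(broadcast(countries), "country_code")

daily = enriched.groupBy("country_name", F.to_date("ts").alias("day")).count()
daily.write.mode("overwrite").parquet("s3://curated-zone/events_by_country/")
```

Broadcasting is only appropriate when the dimension fits comfortably in executor memory; otherwise a shuffle join is the safer default.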
Cloud Data Solutions :
- Develop and deploy data pipelines and processing frameworks on cloud platforms (AWS, Azure, GCP) using native tools like AWS Glue, Azure Databricks, or Google Dataproc.
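As one cloud example, a minimal AWS Glue job skeleton (Glue being one of the tools named above); the catalog database and table names are placeholders:

```python
# Minimal AWS Glue job skeleton; database/table names are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read via the Glue Data Catalog, transform with plain PySpark, write back out.
dyf = glue_context.create_dynamic_frame.from_catalog(database="raw", table_name="orders")
df = dyf.toDF().filter("order_id IS NOT NULL")
df.write.mode("overwrite").parquet("s3://curated-zone/orders/")

job.commit()
```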
Data Integration & Transformation :
- Integrate data from various internal and external sources, ensuring data consistency, quality, and reliability throughout the pipeline.
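A sketch of the kind of consistency gate this implies, assuming a hypothetical internal table and vendor feed with the column names shown:

```python
# Illustrative merge of internal and external sources with basic quality checks.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer_integration").getOrCreate()

internal = spark.read.parquet("s3://curated-zone/customers/")
external = spark.read.option("header", True).csv("s3://vendor-feed/customers/")

# Conform the external feed to the internal schema before merging.
external_conf = external.select(
    F.col("cust_id").cast("long").alias("customer_id"),
    F.col("email_addr").alias("email"),
)
combined = internal.select("customer_id", "email").unionByName(external_conf)

# Enforce consistency rules before publishing downstream.
deduped = combined.dropDuplicates(["customer_id"])
bad = deduped.filter(F.col("customer_id").isNull()).count()
if bad:
    raise ValueError(f"{bad} merged records are missing customer_id")

deduped.write.mode("overwrite").parquet("s3://curated-zone/customers_merged/")
```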
Performance Optimization :
- Optimize PySpark jobs and pipelines for faster data processing, handling large volumes of data efficiently with minimal latency.
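A few typical tuning levers, shown with illustrative values rather than prescriptions:

```python
# Common PySpark tuning levers; the config values here are examples, not defaults.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("tuning_example")
    # Adaptive query execution re-plans shuffles and joins at runtime (Spark 3+).
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

events = spark.read.parquet("s3://raw-zone/events/")

# Partition pruning: filter on the partition column before heavy work.
recent = events.filter(F.col("event_date") >= "2024-01-01")

# Cache only when the same DataFrame feeds multiple actions.
recent.cache()
print(recent.count())
recent.groupBy("event_type").count().show()
recent.unpersist()
```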
Skills & Experience :
- Proven experience as a Data Engineer or in a similar role, with a strong background in database development, ETL processes, and software development.
- Proficiency in SQL and scripting languages such as Python, with experience working with relational databases.
- Proficiency in Dataproc (PySpark), Pandas, or other data processing libraries.
- Experience with data modeling, schema design, and optimization techniques for scalability (see the schema sketch after this list).
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex data issues and optimize data processing pipelines at scale.
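As referenced in the data modeling bullet above, a small hypothetical example of enforcing an explicit schema at read time rather than relying on inference:

```python
# Hypothetical explicit schema: types enforced at read time, not inferred.
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, LongType, StringType, TimestampType, DecimalType,
)

order_schema = StructType([
    StructField("order_id",    LongType(),         nullable=False),
    StructField("customer_id", LongType(),         nullable=False),
    StructField("status",      StringType(),       nullable=True),
    StructField("amount",      DecimalType(12, 2), nullable=True),
    StructField("created_at",  TimestampType(),    nullable=True),
])

spark = SparkSession.builder.appName("schema_example").getOrCreate()
orders = spark.read.schema(order_schema).json("s3://raw-zone/orders/")
```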
Required Qualifications :
Experience :
- 4-8 years of experience in data engineering, with a strong focus on PySpark and large-scale data processing.
Technical Skills :
- Expertise in PySpark for distributed data processing, data transformation, and job optimization.
- Strong proficiency in Python and SQL for data manipulation and pipeline creation.
- Hands-on experience with Apache Spark and its ecosystem, including Spark SQL, Spark Streaming, and PySpark MLlib.
- Solid experience working with ETL tools and frameworks, such as Apache Airflow or similar orchestration tools.
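To make the Airflow expectation concrete, a hedged sketch of a DAG submitting a PySpark job; it assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed, and every name here is a placeholder:

```python
# Illustrative Airflow DAG that submits a PySpark job on a daily schedule.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # `schedule` requires Airflow 2.4+
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_orders_etl",
        application="/opt/jobs/orders_etl.py",  # the PySpark script to submit
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "400"},
    )
```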
Functional Areas: Software/Testing/Networking