Data Engineer - Python/PySpark (5-10 yrs)
Recro
Flexible timing
What You'll Be Doing:
- Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that ingest hundreds of terabytes of data daily into a petabyte-scale data warehouse and Elasticsearch cluster
- Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service
- Build data pipelines that optimize for data quality and are resilient to poor-quality data sources
- Own the data mapping, business logic, transformations, and data quality
- Perform low-level systems debugging, performance measurement, and optimization on large production clusters
- Participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects
- Maintain and support existing platforms and evolve them to newer technology stacks and architectures
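As a minimal, stdlib-only Python sketch of the "resilient to poor-quality data sources" point above: validate each record up front and drop malformed ones rather than letting a bad row crash the pipeline. The `user_id`/`ts` schema here is hypothetical, purely for illustration.

```python
from datetime import datetime
from typing import Optional

def parse_event(raw: dict) -> Optional[dict]:
    """Validate and normalize one raw event.

    Returns None for malformed records so downstream steps
    only ever see clean, typed data.
    (The 'user_id' and 'ts' fields are an assumed example schema.)
    """
    try:
        return {
            "user_id": int(raw["user_id"]),
            "ts": datetime.fromisoformat(raw["ts"]),
        }
    except (KeyError, TypeError, ValueError):
        return None

def clean(events: list) -> list:
    """Drop records that fail validation instead of raising."""
    return [e for e in (parse_event(r) for r in events) if e is not None]

raw = [
    {"user_id": "42", "ts": "2024-01-01T00:00:00"},   # valid
    {"user_id": "oops", "ts": "2024-01-01T00:00:00"}, # bad user_id
    {"ts": "2024-01-01T00:00:00"},                    # missing field
]
good = clean(raw)
```

The same filter-don't-crash pattern maps directly onto a PySpark `flatMap` or a DataFrame filter in a production job.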
We're excited if you have:
- Proficiency in Python and PySpark
- Deep understanding of Apache Spark, including Spark tuning, creating RDDs, and building DataFrames
- Ability to create Java/Scala Spark jobs for data transformation and aggregation
- Experience with big data technologies such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
- Experience building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
- Good understanding of the architecture and functioning of distributed database systems
- Experience working with file formats such as Parquet, Avro, etc., for large volumes of data
- Experience with one or more NoSQL databases
- Experience with AWS or GCP
- 5+ years of professional experience as a data or software engineer
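The skills above center on distributed batch processing. As a minimal, stdlib-only sketch (not tied to any specific framework), the MapReduce pattern that HDFS/Spark jobs build on, a map phase, a shuffle that groups by key, and a reduce phase, looks like:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str):
    # Emit (word, 1) pairs, as a Hadoop/Spark mapper would.
    return [(w.lower(), 1) for w in line.split()]

def shuffle(pairs):
    # Group values by key -- the "shuffle" step between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts emitted for each word.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["spark and kafka", "spark and hive"]
counts = reduce_phase(
    shuffle(chain.from_iterable(map_phase(line) for line in lines))
)
```

In PySpark the same computation collapses to a `flatMap` followed by `reduceByKey`; the sketch just makes the intermediate shuffle explicit.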
Functional Areas: Software/Testing/Networking