PySpark Developer
Hanker Systems
We are seeking an experienced PySpark Developer / Data Engineer to design, develop, and optimize big data processing pipelines using Apache Spark and Python (PySpark). The ideal candidate should have expertise in distributed computing, ETL workflows, data lake architectures, and cloud-based big data solutions.
Key Responsibilities:
Develop and optimize ETL/ELT data pipelines using PySpark on distributed computing platforms (Hadoop, Databricks, EMR, HDInsight).
Work with structured and unstructured data to perform data transformation, cleansing, and aggregation.
Implement data lake and data warehouse solutions on AWS (S3, Glue, Redshift), Azure (ADLS, Synapse), or GCP (BigQuery, Dataflow).
Optimize PySpark job performance through tuning, partitioning, and caching strategies.
Design and implement real-time and batch data processing solutions.
Integrate data pipelines with Kafka, Delta Lake, Iceberg, or Hudi for streaming and incremental updates.
Ensure data security, governance, and compliance with industry best practices.
Work with data scientists and analysts to prepare and process large-scale datasets for machine learning models.
Collaborate with DevOps teams to deploy, monitor, and scale PySpark jobs using CI/CD pipelines, Kubernetes, and containerization.
Perform unit testing and validation to ensure data integrity and reliability.
Required Skills & Qualifications:
6+ years of experience in big data processing, ETL, and data engineering.
Strong hands-on experience with PySpark (Apache Spark with Python).
Expertise in SQL, the Spark DataFrame API, and RDD transformations.
Experience with big data platforms (Hadoop, Hive, HDFS, Spark SQL).
Knowledge of cloud data processing services (AWS Glue, EMR, Databricks, Azure Synapse, GCP Dataflow).
Proficiency in writing optimized queries and applying partitioning and indexing for performance tuning.
Experience with workflow orchestration tools like Airflow, Oozie, or Prefect.
Familiarity with containerization and deployment using Docker, Kubernetes, and CI/CD pipelines.
Strong understanding of data governance, security, and compliance (GDPR, HIPAA, CCPA, etc.).
Excellent problem-solving, debugging, and performance optimization skills.
Employment Type: Full Time, Permanent