PySpark Developer - Indore
Cognizant
posted 7d ago
Flexible timing
Job Title: PySpark Developer
Location: Indore
Job Type: Full time
Years of Experience: 4 to 12 years
Job Description
We are seeking an experienced PySpark Developer to join our data engineering team. In this role, you will be responsible for designing, developing, and optimizing large-scale data processing pipelines using PySpark and other big data technologies. The ideal candidate will have expertise in distributed computing, data processing frameworks, and working with large datasets in cloud-based or on-premises environments. You will collaborate with data engineers, data scientists, and business analysts to build robust, scalable, and efficient data solutions.
Key Responsibilities
Data Processing & Transformation: Design, develop, and implement distributed data processing and transformation workflows using PySpark to handle large-scale datasets across various storage systems (HDFS, S3, etc.).
ETL Development: Build and manage ETL (Extract, Transform, Load) pipelines using PySpark, integrating data from multiple sources such as databases, flat files, cloud storage, and other data platforms (a minimal illustrative sketch follows this list).
Data Wrangling & Cleansing: Perform data cleaning, wrangling, and transformation to ensure the integrity, accuracy, and completeness of the data before feeding it into analytical models or reports.
Optimization & Performance Tuning: Optimize PySpark jobs for better performance, for example by minimizing memory usage, improving partitioning, and tuning Spark configurations for faster data processing.
Collaboration with Data Scientists: Work closely with data scientists to preprocess large datasets, manage data pipelines, and support machine learning model deployment and experimentation.
Big Data Technologies Integration: Integrate PySpark with other big data technologies (e.g., Hadoop, Hive, Kafka, NoSQL databases) to process structured and unstructured data in real-time or batch modes.
Data Modeling: Work with data engineers to design and implement data models that support efficient storage and querying, ensuring data can be leveraged for analytics, BI, and machine learning use cases.
Testing & Debugging: Ensure the accuracy and reliability of data processing by conducting unit tests, integration tests, and debugging of PySpark jobs in a distributed environment.
Documentation: Create and maintain documentation for PySpark applications, data workflows, and procedures to ensure clarity and knowledge transfer across teams.
Monitoring & Support: Monitor data pipelines and jobs, ensuring they run efficiently and handle exceptions or errors effectively. Provide support for production systems as needed.
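For illustration only (not part of the posting): a minimal PySpark sketch of the read-transform-write ETL pattern the responsibilities above describe. All paths, column names, and schema choices are invented placeholders.

    # Minimal PySpark ETL sketch: read raw CSV, cleanse, aggregate, write Parquet.
    # All paths and column names below are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read raw data from distributed storage (the S3 path is a placeholder).
    raw = spark.read.option("header", True).csv("s3a://example-bucket/raw/sales/")

    # Transform: drop incomplete rows, normalize types, derive a daily aggregate.
    clean = (
        raw.dropna(subset=["order_id", "amount"])
           .withColumn("amount", F.col("amount").cast("double"))
           .withColumn("order_date", F.to_date("order_ts"))
    )
    daily = clean.groupBy("order_date").agg(F.sum("amount").alias("daily_revenue"))

    # Load: write partitioned Parquet for downstream analytics.
    daily.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3a://example-bucket/curated/daily_revenue/"
    )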
Required Skills and Qualifications
PySpark Expertise: Strong experience with PySpark for developing distributed data processing workflows, transformations, and optimizations on large datasets.
Big Data Frameworks: Proficiency with big data technologies such as Hadoop, Hive, Spark, Kafka, or other distributed processing frameworks.
Programming Skills: Solid knowledge of Python for data manipulation, scripting, and task automation. Familiarity with other languages such as Scala or Java is a plus.
SQL Skills: Proficiency in SQL for querying databases and integrating with PySpark to extract and manipulate structured data.
Data Storage: Experience with cloud storage systems (e.g., Amazon S3, Azure Blob Storage) and distributed file systems (e.g., HDFS).
Data Processing & Integration: Experience building data pipelines and integrating disparate data sources for processing, analysis, and reporting.
Performance Tuning & Troubleshooting: Expertise in optimizing PySpark jobs for performance and troubleshooting issues in a distributed computing environment (see the tuning sketch after this list).
Cloud Platforms: Experience working with cloud platforms such as AWS, Azure, or Google Cloud, specifically their big data offerings (e.g., AWS EMR, Azure Databricks, Google Dataproc).
Version Control: Familiarity with Git or other version control tools for collaborative development and deployment.
Problem-Solving: Strong analytical skills with the ability to break down complex problems and design effective solutions.
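For illustration only: a small sketch of the kind of performance tuning referenced above, using adaptive query execution, key-based repartitioning, and an explicit broadcast join. Table paths, join keys, and partition counts are assumptions, not values from the posting.

    # Illustrative PySpark tuning sketch; data locations and sizes are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.appName("tuning-sketch")
        # Let Spark coalesce shuffle partitions and handle skew at runtime.
        .config("spark.sql.adaptive.enabled", "true")
        # Raise the broadcast threshold so small dimension tables join without a shuffle.
        .config("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
        .getOrCreate()
    )

    facts = spark.read.parquet("s3a://example-bucket/curated/events/")     # large fact table (placeholder)
    dims = spark.read.parquet("s3a://example-bucket/curated/customers/")   # small dimension table (placeholder)

    # Repartition the large side on the join key, and broadcast the small side explicitly.
    joined = (
        facts.repartition(200, "customer_id")
             .join(F.broadcast(dims), "customer_id")
    )

    # Cache only when the result is reused by several downstream actions.
    joined.cache()
    joined.count()  # materializes the cache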
We are seeking a highly skilled Sr. Developer with 4 to 8 years of experience to join our team. The ideal candidate will have expertise in Python, Databricks SQL, Databricks Workflows, and PySpark. Experience in Park Operations is a plus. This role involves developing and optimizing data workflows to support our business objectives and enhance operational efficiency.
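For illustration only: a hypothetical sketch of the kind of task a Databricks Workflow in this role might schedule, combining a Databricks SQL query with a PySpark aggregation. It assumes a Databricks runtime (spark session provided, Delta Lake available); all table and column names are invented.

    # Hypothetical Databricks workflow task: refresh a reporting table.
    # Assumes a Databricks runtime where `spark` and Delta Lake are available.
    from pyspark.sql import functions as F

    # Databricks SQL step: pull the upstream table (names are placeholders).
    orders = spark.sql("SELECT order_id, customer_id, amount, order_ts FROM main.sales.orders")

    # PySpark step: derive monthly revenue per customer.
    monthly = (
        orders.withColumn("month", F.date_trunc("month", "order_ts"))
              .groupBy("customer_id", "month")
              .agg(F.sum("amount").alias("revenue"))
    )

    # Write a managed Delta table that a scheduled Databricks Workflow can refresh.
    monthly.write.format("delta").mode("overwrite").saveAsTable("main.reporting.monthly_revenue")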
Certifications Required
Databricks Certified Associate Developer for Apache Spark
Python Certification
The Cognizant community
We are a high-caliber team who appreciate and support one another. Our people uphold an energetic, collaborative, and inclusive workplace where everyone can thrive.
About us
Cognizant is one of the world's leading professional services companies, transforming clients' business, operating, and technology models for the digital era. Our unique industry-based, consultative approach helps clients envision, build, and run more innovative and efficient businesses. Headquartered in the U.S., Cognizant (a member of the NASDAQ-100 and one of Forbes' World's Best Employers 2024) is consistently listed among the most admired companies in the world. Learn how Cognizant helps clients lead with digital at www.cognizant.com.
Our commitment to diversity and inclusion
If you require a reasonable accommodation during the application process, please email CareersNA2@cognizant.com with your request and contact information.
Employment Type: Full Time, Permanent