Exponentia.ai - Data Architect - ETL/PySpark (8-10 yrs)
Flexible timing
Overview:
We are looking for an experienced Data Architect with 10+ years of experience. This role focuses on driving data transformation and developing the Data Curation Layer using PySpark notebooks. You will play a key role in shaping our data architecture and enhancing our data pipelines.
Key Responsibilities:
- Design and develop the Data Curation Layer to ensure seamless access to curated data for business and analytics use cases.
- Lead data transformation efforts, using PySpark notebooks to process and transform large-scale data.
- Work closely with cross-functional teams (including Data Engineers, Analysts, and Business Stakeholders) to understand business requirements and translate them into scalable data architecture solutions.
- Develop and maintain efficient, optimized PySpark notebooks for ETL/ELT processes, ensuring high-performance data transformation and integration across multiple sources (a minimal sketch follows this list).
- Define and enforce data governance, quality, and compliance standards for the Data Curation Layer.
- Collaborate with data engineering teams to build scalable, automated data pipelines using PySpark, ensuring seamless transformation from raw data to refined datasets.
- Provide guidance on best practices for data modeling, transformation, and performance optimization within the PySpark environment.
- Build and optimize data workflows in the cloud (AWS, Azure, or GCP) using PySpark for efficient data processing.
- Document and maintain the architecture of data models, notebooks, and workflows, ensuring transparency and knowledge sharing across the team.
- Conduct code reviews and optimize existing PySpark notebooks for performance and maintainability.
- Ensure data transformations are performed efficiently, with a focus on scalability and performance at every stage of the pipeline.
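As context for the PySpark responsibilities above, here is a minimal, hypothetical sketch of a curation-layer transformation: read raw data, apply cleansing and a simple quality gate, then write a curated dataset. All paths, column names, and schema details are illustrative assumptions, not part of this posting.

```python
# Hypothetical curation-layer PySpark notebook sketch (illustrative only).
# Paths, schema, and column names are assumptions, not taken from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curation-layer-sketch").getOrCreate()

# Read raw landing-zone data (Parquet assumed for the example).
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# Basic cleansing: deduplicate, normalize types, drop invalid rows.
curated = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("amount") >= 0)
)

# Simple data-quality gate: fail fast if a required key is null.
null_keys = curated.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"Quality check failed: {null_keys} rows with null order_id")

# Write the curated layer, partitioned for downstream analytics.
(curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/orders/"))
```

Partitioning by a date column is one common choice for downstream query pruning; the right partition key depends on actual access patterns.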
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
- 10+ years of experience in data architecture, with a strong focus on data transformation and curation.
- Hands-on experience with PySpark notebook development, specifically for ETL/ELT data transformations.
- Expertise in building and optimizing data transformation pipelines in large-scale, distributed environments using PySpark.
- Solid experience with cloud-based data platforms (AWS, Azure, or Google Cloud), including experience with data processing services like AWS EMR, Azure Databricks, or Google Dataproc.
- Deep understanding of data modeling, data governance, and data quality practices.
- Proficiency with PySpark, Apache Spark, and related frameworks for distributed data processing.
- Strong knowledge of SQL and experience with databases (e.g., SQL Server, Oracle, MySQL, or NoSQL).
- Ability to work in agile project environments and meet deadlines.
- Excellent communication and collaboration skills, with the ability to interact with both technical and non-technical stakeholders.
- Ability to handle complex data problems and provide strategic solutions to ensure data is structured, transformed, and curated for business consumption.
Preferred Qualifications:
- Experience with cloud-native data architecture and data lakes (e.g., Delta Lake, AWS S3).
- Familiarity with Apache Airflow for orchestration of data workflows (a minimal DAG sketch follows this list).
- Certifications in cloud platforms (AWS, Azure, or GCP) or big data technologies.
- Knowledge of data orchestration and automation tools (e.g., Jenkins, Kubeflow).
- Experience working with data visualization tools (e.g., Tableau, Power BI) to ensure the curated data is easily accessible for business stakeholders.
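To illustrate the Airflow orchestration mentioned above, below is a minimal, hypothetical DAG that submits a PySpark job on a daily schedule. The DAG id, schedule, script path, and connection id are assumptions, not details from this posting.

```python
# Hypothetical Airflow DAG submitting a PySpark curation job (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="curation_layer_daily",        # assumed DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",           # Airflow 2.x style; newer versions use `schedule`
    catchup=False,
) as dag:
    curate_orders = SparkSubmitOperator(
        task_id="curate_orders",
        application="/opt/jobs/curate_orders.py",  # assumed script path
        conn_id="spark_default",                   # assumed Spark connection
    )
```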
Functional Areas: Software/Testing/Networking