Exponentia.ai - Data Architect - ETL/PySpark (8-10 yrs)
Flexible timing
Overview:
We are looking for an experienced Data Architect with 10+ years of experience. This role focuses on driving data transformation and developing the Data Curation Layer using PySpark notebooks. You will play a key role in shaping our data architecture and enhancing our data pipelines.
Key Responsibilities:
- Design and develop the Data Curation Layer to ensure seamless access to curated data for business and analytics use cases.
- Lead data transformation efforts, using PySpark notebooks to process and transform large-scale data.
- Work closely with cross-functional teams (including Data Engineers, Analysts, and Business Stakeholders) to understand business requirements and translate them into scalable data architecture solutions.
- Develop and maintain efficient, optimized PySpark notebooks for ETL/ELT processes, ensuring high-performance data transformation and integration across multiple sources (a minimal sketch follows this list).
- Define and enforce data governance, quality, and compliance standards for the Data Curation Layer.
- Collaborate with data engineering teams to build scalable, automated data pipelines using PySpark, ensuring seamless transformation from raw data to refined datasets.
- Provide guidance on best practices for data modeling, transformation, and performance optimization within the PySpark environment.
- Build and optimize data workflows in the cloud (AWS, Azure, or GCP) using PySpark for efficient data processing.
- Document and maintain the architecture of data models, notebooks, and workflows, ensuring transparency and knowledge sharing across the team.
- Conduct code reviews and optimize existing PySpark notebooks for performance and maintainability.
- Ensure data transformations are performed efficiently, with a focus on scalability and performance at every stage of the pipeline.
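As context for the PySpark responsibilities above, here is a minimal, hypothetical sketch of a curation-layer transformation: read raw data, apply cleansing and a simple quality gate, then write a curated dataset. All paths, column names, and schema details are illustrative assumptions, not part of this posting.

```python
# Hypothetical curation-layer PySpark notebook sketch (illustrative only).
# Paths, schema, and column names are assumptions, not taken from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curation-layer-sketch").getOrCreate()

# Read raw landing-zone data (Parquet assumed for the example).
raw = spark.read.parquet("s3://example-bucket/raw/orders/")

# Basic cleansing: deduplicate, normalize types, drop invalid rows.
curated = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("amount") >= 0)
)

# Simple data-quality gate: fail fast if a required key is null.
null_keys = curated.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"Quality check failed: {null_keys} rows with null order_id")

# Write the curated layer, partitioned for downstream analytics.
(curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/orders/"))
```

Partitioning by a date column is one common choice for downstream query pruning; the right partition key depends on actual access patterns.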
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
- 10+ years of experience in data architecture, with a strong focus on data transformation and curation.
- Hands-on experience with PySpark notebook development, specifically for ETL/ELT data transformations.
- Expertise in building and optimizing data transformation pipelines in large-scale, distributed environments using PySpark.
- Solid experience with cloud-based data platforms (AWS, Azure, or Google Cloud), including experience with data processing services like AWS EMR, Azure Databricks, or Google Dataproc.
- Deep understanding of data modeling, data governance, and data quality practices.
- Proficiency with PySpark, Apache Spark, and related frameworks for distributed data processing.
- Strong knowledge of SQL and experience with databases (e.g., SQL Server, Oracle, MySQL, or NoSQL).
- Ability to work in agile project environments and meet deadlines.
- Excellent communication and collaboration skills, with the ability to interact with both technical and non-technical stakeholders.
- Ability to handle complex data problems and provide strategic solutions to ensure data is structured, transformed, and curated for business consumption.
Preferred Qualifications:
- Experience with cloud-native data architecture and data lakes (e.g., Delta Lake, AWS S3).
- Familiarity with Apache Airflow for orchestration of data workflows (a minimal DAG sketch follows this list).
- Certifications in cloud platforms (AWS, Azure, or GCP) or big data technologies.
- Knowledge of data orchestration and automation tools (e.g., Jenkins, Kubeflow).
- Experience working with data visualization tools (e.g., Tableau, Power BI) to ensure the curated data is easily accessible for business stakeholders.
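To illustrate the Airflow orchestration mentioned above, below is a minimal, hypothetical DAG that submits a PySpark job on a daily schedule. The DAG id, schedule, script path, and connection id are assumptions, not details from this posting.

```python
# Hypothetical Airflow DAG submitting a PySpark curation job (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="curation_layer_daily",        # assumed DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",           # Airflow 2.x style; newer versions use `schedule`
    catchup=False,
) as dag:
    curate_orders = SparkSubmitOperator(
        task_id="curate_orders",
        application="/opt/jobs/curate_orders.py",  # assumed script path
        conn_id="spark_default",                   # assumed Spark connection
    )
```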
Functional Areas: Software/Testing/Networking