TalentBox Labs
Data Quality Engineer - SQL/PySpark (4-7 yrs)
posted 3d ago
Job Description :
- Develop and execute data validation and reconciliation tests for the migration from relational databases and Teradata to AWS Databricks.
- Ensure data accuracy, completeness, and consistency using SQL, PySpark, and automation tools.
- Implement data quality frameworks and perform root cause analysis for discrepancies.
- Create an automation strategy and solutions for end-to-end data validation.
- Validate data transferred from mainframe systems to Databricks.
- Collaborate with engineers and stakeholders to establish testing best practices in the migration process.
Data Validation and Reconciliation :
- Develop and execute comprehensive data validation and reconciliation tests.
- Compare data across source (relational databases, Teradata) and target (AWS Databricks) systems.
Data Accuracy, Completeness, and Consistency :
- Ensure data accuracy, completeness, and consistency throughout the migration process.
- Utilize SQL and PySpark for data profiling and validation.
Data Quality Frameworks and Root Cause Analysis :
- Implement and maintain data quality frameworks.
- Perform root cause analysis to identify and resolve data discrepancies.
Automation Strategy and Solutions :
- Develop and implement automation strategies for end-to-end data validation processes.
- Create automated data quality checks.
Mainframe to Databricks Validation :
- Validate data transferred from mainframe systems to Databricks.
Collaboration and Best Practices :
- Collaborate with data engineers and stakeholders to establish and enforce data quality testing best practices.
- Work with development teams to ensure quality at all stages of the data pipeline.
Required Skills and Experience :
Technical Skills :
SQL :
- Advanced SQL skills for data querying, manipulation, and validation.
PySpark :
- Proficiency in PySpark for data processing, transformation, and validation within Databricks.
Data Validation and Reconciliation :
- Strong understanding of data validation and reconciliation techniques.
- Experience in developing and executing data validation test cases.
Data Quality Frameworks :
- Knowledge of data quality frameworks and methodologies.
- Ability to implement and maintain data quality standards.
Automation :
- Experience in developing and implementing automation scripts for data validation.
- Familiarity with automation tools and frameworks.
Root Cause Analysis :
- Strong analytical and problem-solving skills to perform root cause analysis of data discrepancies.
Mainframe Data Validation :
- Experience validating mainframe data is a big plus.
Database Knowledge :
- Understanding of relational databases (e.g., Oracle, SQL Server, DB2) and Teradata.
- Understanding of data lake/data warehouse concepts.
AWS Databricks :
- Familiarity with the AWS Databricks platform.
Experience :
- Significant experience in data quality engineering or data testing roles.
- Experience in data migration projects, particularly from relational databases/Teradata to cloud environments.
- Proven track record of developing and implementing data quality frameworks and automation solutions.
- Experience in working with large datasets and complex data transformations.
- Experience validating mainframe data.
Soft Skills :
- Strong analytical and problem-solving skills.
- Excellent attention to detail.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
- Ability to document and present findings clearly and concisely.
- Strong understanding of testing best practices.
Functional Areas: Other