ETL Tester
100+ ETL Tester Interview Questions and Answers
Q51. Explain different types of data validation and checks performed in ETL testing
Different types of data validation and checks in ETL testing include schema validation, data completeness checks, data transformation checks, and data quality checks.
Schema validation ensures that the data conforms to the expected structure and format.
Data completeness checks verify that all expected data is present and accounted for.
Data transformation checks ensure that data is accurately transformed according to business rules.
Data quality checks assess the accuracy, consi...read more
Q52. Explain Defects life cycle
Defects life cycle is a process that describes the stages of a defect from discovery to resolution.
Defect is identified and reported
Defect is assigned to a developer
Developer fixes the defect
Defect is retested by the tester
If the defect is still present, it is reopened
If the defect is fixed, it is closed
Q53. Types of test data creation
Test data creation types include manual, automated, random, boundary, and negative testing.
Manual testing involves creating data by hand
Automated testing uses tools to generate data
Random testing involves creating data randomly
Boundary testing involves testing data at the limits of its range
Negative testing involves testing invalid or unexpected data
Q54. What is the flow of recent project
The recent project flow involved data extraction, transformation, and loading using ETL tools.
The project started with identifying the data sources and requirements.
Then, the data was extracted from various sources using ETL tools like Informatica, Talend, etc.
The extracted data was transformed as per the business rules and requirements.
Finally, the transformed data was loaded into the target system.
The project also involved testing the ETL process and ensuring data accuracy ...read more
Q55. what is difference between union and union all
Union combines the result sets of two or more SELECT statements, while Union All combines all rows from two or more SELECT statements.
Union removes duplicate rows from the result set, while Union All does not.
Union requires that the number and order of columns in all SELECT statements be the same, while Union All does not have this requirement.
Union is slower than Union All because it performs a distinct operation, while Union All simply concatenates the result sets.
Example: ...read more
Q56. Query to fetch data greater than average, min and max salary.
Query to fetch data greater than average, min and max salary.
Calculate average, min, and max salary first
Use a SELECT statement with WHERE clause to filter data greater than average, min, and max salary
Share interview questions and help millions of jobseekers 🌟
Q57. Query to fetch only numbers from alphanumeric data from a table.
Use regular expressions to extract numbers from alphanumeric data in a table.
Use the REGEXP_REPLACE function in SQL to replace all non-numeric characters with an empty string.
For example, SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') FROM table_name;
This will return only the numbers from the alphanumeric data in the specified column.
Q58. How would you test a full load vs incremental load
To test a full load vs incremental load, compare the results of loading all data at once vs loading only new or updated data.
Create test cases to verify the accuracy of data loaded during a full load.
Create test cases to verify that only new or updated data is loaded during an incremental load.
Compare the results of the full load and incremental load to ensure consistency and accuracy.
Verify that the data integrity is maintained during both types of loads.
Use tools like SQL q...read more
ETL Tester Jobs
Q59. What type of data validation you performed.
I performed various types of data validation including format validation, completeness validation, and referential integrity validation.
Format validation: Ensuring that the data is in the correct format as per the defined specifications. For example, validating that a date field is in the format 'YYYY-MM-DD'.
Completeness validation: Checking if all the required fields are present and populated in the data. For example, ensuring that a customer record has a name, address, and ...read more
Q60. what is mapping design????
Mapping design is the process of creating a mapping between source and target data structures.
It involves identifying the source and target data structures
Defining the transformation rules to convert the source data to target data
Creating a mapping document to document the mapping rules
Examples include ETL mapping design for data integration projects
Q61. what is difference between join and union?
Join is used to combine rows from two or more tables based on a related column, while union is used to combine the result sets of two or more SELECT statements.
Join is used to retrieve data from multiple tables based on a related column
Union is used to combine the result sets of two or more SELECT statements
Join can be used with different types like INNER JOIN, LEFT JOIN, RIGHT JOIN, etc.
Union removes duplicate rows from the combined result set
Q62. How to find duplicate values in python?
Use a dictionary to find duplicate values in an array of strings in Python.
Create an empty dictionary to store the count of each string in the array.
Iterate through the array and for each string, check if it exists in the dictionary. If it does, increment the count. If not, add it to the dictionary with a count of 1.
After iterating through the array, check the dictionary for any strings with a count greater than 1. These are the duplicate values.
Q63. What is fact table?
Fact table is a table in a data warehouse that stores quantitative data about a business process.
Contains foreign keys to dimension tables
Stores numerical data such as sales, revenue, etc.
Used for analysis and reporting
Can have multiple fact tables in a data warehouse
Q64. how do you delete duplicates from each department
Use SQL query with GROUP BY and HAVING clause to delete duplicates from each department
Use GROUP BY clause to group the records by department
Use HAVING clause to filter out the duplicates within each department
Write a DELETE statement to remove the duplicate records
Q65. what are the types of scd
Types of SCD include Type 1, Type 2, Type 3, and Type 4.
Type 1 - Overwrite: Old record is replaced with new data.
Type 2 - Add new row: New record is added with a new surrogate key.
Type 3 - Update: New column is added to track changes.
Type 4 - Hybrid: Combination of Type 1 and Type 2.
Q66. What is Fact and Dimension Tables
Fact tables contain quantitative data while dimension tables contain descriptive attributes.
Fact tables store measurable, quantitative data such as sales revenue or quantity sold
Dimension tables store descriptive attributes related to the data in the fact table, such as product name or customer details
Fact tables are typically normalized while dimension tables are denormalized for easier querying
Example: In a sales database, the fact table would contain sales data while dimen...read more
Q67. Explain challenges you faced in etl testing?
Challenges in ETL testing include data quality issues, complex transformations, and handling large volumes of data.
Data quality issues such as missing or incorrect data can impact testing results
Complex transformations can be difficult to validate and may lead to errors in the ETL process
Handling large volumes of data can slow down testing processes and require efficient testing strategies
Q68. domain name from an email ?
A domain name from an email refers to the part after the @ symbol in an email address.
The domain name is typically the name of the organization or service provider associated with the email address.
For example, in the email address example@gmail.com, 'gmail.com' is the domain name.
Q69. SCD Types , Star Schema Vs SnowFlake Schema
SCD types refer to slowly changing dimensions in data warehousing. Star schema is denormalized while snowflake schema is normalized.
SCD types refer to how dimensions change over time in a data warehouse
Type 1: Overwrite existing data with new data
Type 2: Keep track of historical data with new records
Type 3: Keep track of some historical data with limited history
Star schema is denormalized with a central fact table connected to dimension tables
Snowflake schema is normalized wi...read more
Q70. Explain defect life cycle.
Defect life cycle is the process of identifying, reporting, prioritizing, fixing, and verifying defects in software.
Defect is identified by testers during testing
Defect is reported to development team
Development team prioritizes and fixes the defect
Fixed defect is verified by testers
If defect is not fixed, it goes back to development team
If defect is fixed, it is closed
Q71. What is ETL explain,stages Of etl
ETL stands for Extract, Transform, Load. It is a process of extracting data from various sources, transforming it into a consistent format, and loading it into a target database.
Extract: Data is extracted from different sources such as databases, files, APIs, etc.
Transform: Data is cleaned, validated, and transformed into a consistent format.
Load: Transformed data is loaded into a target database or data warehouse.
Stages of ETL: Extraction, Transformation, Loading.
Q72. Difference between snowflake schema and star schema
Snowflake schema is a normalized form of star schema with additional dimension tables.
Snowflake schema is a data modeling technique used in data warehousing.
In snowflake schema, dimensions are normalized into multiple related tables.
Snowflake schema reduces redundancy and improves data integrity.
Star schema is a denormalized form of snowflake schema with a single dimension table.
In star schema, dimensions are not normalized and are directly linked to the fact table.
Star schem...read more
Q73. rank dense_rank() ,row_numebr with example?
dense_rank() assigns a unique rank to each distinct row, row_number() assigns a unique sequential integer to each row.
dense_rank() is used to assign a unique rank to each distinct row based on the specified order.
row_number() is used to assign a unique sequential integer to each row in the result set.
Example: dense_rank() - SELECT dense_rank() OVER (ORDER BY salary) AS rank FROM employees;
Example: row_number() - SELECT row_number() OVER (ORDER BY hire_date) AS row_num FROM em...read more
Q74. What is entry criteria & exit criteria
Entry criteria and exit criteria are conditions that must be met before and after testing, respectively.
Entry criteria are the conditions that must be fulfilled before testing can begin
Exit criteria are the conditions that must be fulfilled before testing can be considered complete
Entry criteria can include having test data ready, test environment set up, and test cases prepared
Exit criteria can include all test cases executed, all defects fixed, and test reports generated
Q75. What is Scalar function with example.
Scalar function is a function that returns a single value based on input parameters.
Scalar functions are used in SQL to perform calculations on input values and return a single result.
They can be used in SELECT statements, WHERE clauses, and other SQL queries.
An example of a scalar function is the ABS() function, which returns the absolute value of a number.
Q76. How to find duplicates in SQL?
Use the GROUP BY clause with COUNT() function to find duplicates in SQL.
Use GROUP BY clause with COUNT() function to group the records by the columns you want to check for duplicates
Filter the results by using HAVING clause to only show records with count greater than 1
Example: SELECT column1, column2, COUNT(*) FROM table_name GROUP BY column1, column2 HAVING COUNT(*) > 1;
Q77. What is Slowly Changing Dimensions?
Slowly Changing Dimensions (SCD) are techniques used in data warehousing to track changes to dimension data over time.
SCD Type 1: Overwrites old data with new data, losing historical information.
SCD Type 2: Creates new records for each change, preserving historical data.
SCD Type 3: Creates additional columns to track changes, allowing limited history.
SCD Type 4: Uses separate tables to store historical and current data.
Example: In a customer dimension, SCD Type 1 would update...read more
Q78. Etl tester test cases in various stages
ETL tester tests in various stages to ensure data accuracy and completeness.
Test extraction process to ensure correct data is pulled from source systems
Test transformation process to ensure data is correctly transformed and formatted
Test loading process to ensure data is loaded into target systems without errors
Test data reconciliation to ensure data is complete and accurate in target systems
Test error handling and logging to ensure errors are properly handled and logged
Q79. What are constraints?
Constraints are rules or conditions that must be followed in order to ensure data integrity and accuracy.
Constraints define the limits or restrictions on data values in a database table
Examples of constraints include primary key, foreign key, unique key, and check constraints
Constraints help maintain data quality and consistency in a database
Q80. Query to get 4th highest row based on a column
Use SQL query with ORDER BY and LIMIT to get 4th highest row based on a column
Use ORDER BY clause to sort the column in descending order
Use LIMIT 1 OFFSET 3 to get the 4th highest row
Example: SELECT * FROM table_name ORDER BY column_name DESC LIMIT 1 OFFSET 3
Q81. Find the optimized approach for a given code
Identify bottlenecks and inefficiencies in the code to optimize performance
Analyze the code to identify any redundant or unnecessary operations
Optimize data processing by using efficient algorithms and data structures
Consider parallel processing or multi-threading for performance improvement
Use profiling tools to identify performance bottlenecks and optimize them
Implement caching mechanisms to reduce redundant computations
Q82. Expected CTC in present company
I prefer not to disclose my current salary as it is confidential.
It is not appropriate to disclose current salary as it is confidential information.
I am open to discussing a fair and competitive salary for the position.
My focus is on finding a challenging role that aligns with my skills and experience.
I am willing to negotiate based on the job responsibilities and market standards.
Q83. Flexible to work any environment
Yes
I am flexible to work in any environment, be it a team or individual work
I can adapt to different work cultures and work styles
I am open to learning new technologies and tools
I can work in different time zones and shifts if required
Q84. Try to give real time example
As an ETL Tester, a real-time example would be testing the data migration from a legacy system to a new system.
Testing the accuracy of data migration from one system to another
Ensuring data integrity during the migration process
Validating the completeness of data after migration
Verifying the consistency of data across systems
Identifying and resolving any data quality issues
Testing the performance of the ETL process
Q85. How to read a parquet file
To read a parquet file, use a library like Apache Parquet or PyArrow to load the file and access the data.
Use a library like Apache Parquet or PyArrow to read the parquet file
Load the parquet file using the library's functions
Access the data within the parquet file for analysis or processing
Q86. Type of scd.
SCD stands for Slowly Changing Dimensions. There are three types of SCD: Type 1, Type 2, and Type 3.
Type 1: Overwrites old data with new data.
Type 2: Creates a new record for new data and keeps the old record for historical data.
Type 3: Creates a new column for new data and keeps the old column for historical data.
Q87. Explain cross join with an example
Cross join combines each row from one table with each row from another table.
Cross join produces a Cartesian product of the two tables involved.
It does not require any common columns between the two tables.
It can result in a large number of rows if the tables are big.
Example: SELECT * FROM table1 CROSS JOIN table2;
Q88. Explain inner join with an example
Inner join combines rows from two tables based on a related column between them.
Inner join returns only the rows that have matching values in both tables
It is used to retrieve data that exists in both tables
Example: SELECT Orders.OrderID, Customers.CustomerName FROM Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Q89. Explain SCD types
SCD types are used to track changes in data over time in a data warehouse.
SCD stands for Slowly Changing Dimensions.
There are three types of SCD: Type 1, Type 2, and Type 3.
Type 1 overwrites old data with new data.
Type 2 creates a new record for each change and keeps a history of changes.
Type 3 keeps both old and new data in the same record.
SCD types are important for maintaining data integrity and accuracy in a data warehouse.
Q90. What is primary key?
Primary key is a unique identifier for each record in a database table.
Primary key ensures each record in a table is unique
It can be a single column or a combination of columns
Primary key values cannot be NULL
Q91. Query to fetch nth highest salary
Use SQL query with ORDER BY and LIMIT to fetch nth highest salary
Use ORDER BY salary DESC to sort salaries in descending order
Use LIMIT n-1,1 to fetch the nth highest salary
Replace n with the desired position of the salary
Q92. what is data aquisation
Data acquisition is the process of collecting, gathering, and capturing data from various sources.
Data acquisition involves retrieving data from different sources such as databases, files, APIs, or sensors.
It is an essential step in the ETL (Extract, Transform, Load) process.
Data acquisition can be manual or automated, depending on the source and the system being used.
Examples of data acquisition include extracting customer information from a CRM system, importing data from a...read more
Q93. what is data reconsilation
Data reconciliation is the process of comparing and verifying data between two or more sources to ensure accuracy and consistency.
Data reconciliation involves comparing data from different sources, such as databases or files.
It aims to identify any discrepancies or inconsistencies in the data.
Reconciliation can be done by comparing data values, counts, or summaries.
Examples include comparing sales data between a point-of-sale system and an inventory database.
Data reconciliati...read more
Q94. what is scrum ceremony
Scrum ceremonies are regular meetings held in Agile development to facilitate collaboration, planning, and review.
Scrum ceremonies are time-boxed events that occur at specific intervals during a sprint.
The ceremonies include Sprint Planning, Daily Stand-up, Sprint Review, and Sprint Retrospective.
Sprint Planning involves determining the work to be done in the upcoming sprint.
Daily Stand-up is a brief meeting where team members discuss progress, challenges, and plans for the d...read more
Q95. SQL query to find second Max salary.
Use SQL query with subquery to find second highest salary.
Use ORDER BY and LIMIT to get the second highest salary.
Use a subquery to exclude the highest salary from the result set.
Example: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
Q96. Union union all difference
Union, Union All, and Difference are SQL set operators used to combine or compare data from multiple tables.
Union combines the results of two or more SELECT statements and removes duplicates.
Union All combines the results of two or more SELECT statements and includes duplicates.
Difference (also known as Except) returns only the rows from the first SELECT statement that are not in the second SELECT statement.
All three operators require that the SELECT statements have the same ...read more
Q97. Write SQL for even andODD numbers
SQL query to retrieve even and odd numbers
Use the modulo operator (%) to determine if a number is even or odd
For even numbers: SELECT number FROM table WHERE number % 2 = 0
For odd numbers: SELECT number FROM table WHERE number % 2 <> 0
Q98. Write SQL for sec highest salary.
SQL query to find the second highest salary in a table.
Use the ORDER BY clause to sort the salaries in descending order.
Use the LIMIT clause to retrieve the second row after skipping the first row.
Q99. Query to get 3rd highest salary
Use SQL query with ORDER BY and LIMIT to get 3rd highest salary.
Use SELECT statement with ORDER BY salary DESC to sort salaries in descending order
Use LIMIT 1,1 to skip the first two highest salaries and get the third highest salary
Q100. What customer do with data.
Customers use data for various purposes such as analysis, decision-making, reporting, and personalization.
Data is used for analysis to gain insights and identify patterns or trends.
Data helps in decision-making by providing information and supporting evidence.
Data is used for reporting to communicate information and track performance.
Data enables personalization by tailoring products or services based on customer preferences.
Examples: analyzing sales data to identify top-sell...read more
Interview Questions of Similar Designations
Top Interview Questions for ETL Tester Related Skills
Interview experiences of popular companies
Calculate your in-hand salary
Confused about how your in-hand salary is calculated? Enter your annual salary (CTC) and get your in-hand salary
Reviews
Interviews
Salaries
Users/Month