ETL Tester

100+ ETL Tester Interview Questions and Answers

Updated 2 Mar 2025

Q51. Explain different types of data validation and checks performed in ETL testing

Ans.

Different types of data validation and checks in ETL testing include schema validation, data completeness checks, data transformation checks, and data quality checks.

  • Schema validation ensures that the data conforms to the expected structure and format.

  • Data completeness checks verify that all expected data is present and accounted for.

  • Data transformation checks ensure that data is accurately transformed according to business rules.

  • Data quality checks assess the accuracy, consi...read more

Q52. Explain Defects life cycle

Ans.

Defects life cycle is a process that describes the stages of a defect from discovery to resolution.

  • Defect is identified and reported

  • Defect is assigned to a developer

  • Developer fixes the defect

  • Defect is retested by the tester

  • If the defect is still present, it is reopened

  • If the defect is fixed, it is closed

Q53. Types of test data creation

Ans.

Test data creation types include manual, automated, random, boundary, and negative testing.

  • Manual testing involves creating data by hand

  • Automated testing uses tools to generate data

  • Random testing involves creating data randomly

  • Boundary testing involves testing data at the limits of its range

  • Negative testing involves testing invalid or unexpected data

Q54. What is the flow of recent project

Ans.

The recent project flow involved data extraction, transformation, and loading using ETL tools.

  • The project started with identifying the data sources and requirements.

  • Then, the data was extracted from various sources using ETL tools like Informatica, Talend, etc.

  • The extracted data was transformed as per the business rules and requirements.

  • Finally, the transformed data was loaded into the target system.

  • The project also involved testing the ETL process and ensuring data accuracy ...read more

Are these interview questions helpful?

Q55. what is difference between union and union all

Ans.

Union combines the result sets of two or more SELECT statements, while Union All combines all rows from two or more SELECT statements.

  • Union removes duplicate rows from the result set, while Union All does not.

  • Union requires that the number and order of columns in all SELECT statements be the same, while Union All does not have this requirement.

  • Union is slower than Union All because it performs a distinct operation, while Union All simply concatenates the result sets.

  • Example: ...read more

Q56. Query to fetch data greater than average, min and max salary.

Ans.

Query to fetch data greater than average, min and max salary.

  • Calculate average, min, and max salary first

  • Use a SELECT statement with WHERE clause to filter data greater than average, min, and max salary

Share interview questions and help millions of jobseekers 🌟

man-with-laptop

Q57. Query to fetch only numbers from alphanumeric data from a table.

Ans.

Use regular expressions to extract numbers from alphanumeric data in a table.

  • Use the REGEXP_REPLACE function in SQL to replace all non-numeric characters with an empty string.

  • For example, SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') FROM table_name;

  • This will return only the numbers from the alphanumeric data in the specified column.

Q58. How would you test a full load vs incremental load

Ans.

To test a full load vs incremental load, compare the results of loading all data at once vs loading only new or updated data.

  • Create test cases to verify the accuracy of data loaded during a full load.

  • Create test cases to verify that only new or updated data is loaded during an incremental load.

  • Compare the results of the full load and incremental load to ensure consistency and accuracy.

  • Verify that the data integrity is maintained during both types of loads.

  • Use tools like SQL q...read more

ETL Tester Jobs

ETL Tester_ Pan India 3-8 years
Infosys
3.6
Hyderabad / Secunderabad
Etl Tester_Infosys 3-8 years
Infosys
3.6
Hyderabad / Secunderabad
ETL Tester- (Pan India) 3-8 years
Infosys
3.6
Hyderabad / Secunderabad

Q59. What type of data validation you performed.

Ans.

I performed various types of data validation including format validation, completeness validation, and referential integrity validation.

  • Format validation: Ensuring that the data is in the correct format as per the defined specifications. For example, validating that a date field is in the format 'YYYY-MM-DD'.

  • Completeness validation: Checking if all the required fields are present and populated in the data. For example, ensuring that a customer record has a name, address, and ...read more

Q60. what is mapping design????

Ans.

Mapping design is the process of creating a mapping between source and target data structures.

  • It involves identifying the source and target data structures

  • Defining the transformation rules to convert the source data to target data

  • Creating a mapping document to document the mapping rules

  • Examples include ETL mapping design for data integration projects

Q61. what is difference between join and union?

Ans.

Join is used to combine rows from two or more tables based on a related column, while union is used to combine the result sets of two or more SELECT statements.

  • Join is used to retrieve data from multiple tables based on a related column

  • Union is used to combine the result sets of two or more SELECT statements

  • Join can be used with different types like INNER JOIN, LEFT JOIN, RIGHT JOIN, etc.

  • Union removes duplicate rows from the combined result set

Q62. How to find duplicate values in python?

Ans.

Use a dictionary to find duplicate values in an array of strings in Python.

  • Create an empty dictionary to store the count of each string in the array.

  • Iterate through the array and for each string, check if it exists in the dictionary. If it does, increment the count. If not, add it to the dictionary with a count of 1.

  • After iterating through the array, check the dictionary for any strings with a count greater than 1. These are the duplicate values.

Q63. What is fact table?

Ans.

Fact table is a table in a data warehouse that stores quantitative data about a business process.

  • Contains foreign keys to dimension tables

  • Stores numerical data such as sales, revenue, etc.

  • Used for analysis and reporting

  • Can have multiple fact tables in a data warehouse

Q64. how do you delete duplicates from each department

Ans.

Use SQL query with GROUP BY and HAVING clause to delete duplicates from each department

  • Use GROUP BY clause to group the records by department

  • Use HAVING clause to filter out the duplicates within each department

  • Write a DELETE statement to remove the duplicate records

Q65. what are the types of scd

Ans.

Types of SCD include Type 1, Type 2, Type 3, and Type 4.

  • Type 1 - Overwrite: Old record is replaced with new data.

  • Type 2 - Add new row: New record is added with a new surrogate key.

  • Type 3 - Update: New column is added to track changes.

  • Type 4 - Hybrid: Combination of Type 1 and Type 2.

Q66. What is Fact and Dimension Tables

Ans.

Fact tables contain quantitative data while dimension tables contain descriptive attributes.

  • Fact tables store measurable, quantitative data such as sales revenue or quantity sold

  • Dimension tables store descriptive attributes related to the data in the fact table, such as product name or customer details

  • Fact tables are typically normalized while dimension tables are denormalized for easier querying

  • Example: In a sales database, the fact table would contain sales data while dimen...read more

Q67. Explain challenges you faced in etl testing?

Ans.

Challenges in ETL testing include data quality issues, complex transformations, and handling large volumes of data.

  • Data quality issues such as missing or incorrect data can impact testing results

  • Complex transformations can be difficult to validate and may lead to errors in the ETL process

  • Handling large volumes of data can slow down testing processes and require efficient testing strategies

Q68. domain name from an email ?

Ans.

A domain name from an email refers to the part after the @ symbol in an email address.

  • The domain name is typically the name of the organization or service provider associated with the email address.

  • For example, in the email address example@gmail.com, 'gmail.com' is the domain name.

Q69. SCD Types , Star Schema Vs SnowFlake Schema

Ans.

SCD types refer to slowly changing dimensions in data warehousing. Star schema is denormalized while snowflake schema is normalized.

  • SCD types refer to how dimensions change over time in a data warehouse

  • Type 1: Overwrite existing data with new data

  • Type 2: Keep track of historical data with new records

  • Type 3: Keep track of some historical data with limited history

  • Star schema is denormalized with a central fact table connected to dimension tables

  • Snowflake schema is normalized wi...read more

Q70. Explain defect life cycle.

Ans.

Defect life cycle is the process of identifying, reporting, prioritizing, fixing, and verifying defects in software.

  • Defect is identified by testers during testing

  • Defect is reported to development team

  • Development team prioritizes and fixes the defect

  • Fixed defect is verified by testers

  • If defect is not fixed, it goes back to development team

  • If defect is fixed, it is closed

Q71. What is ETL explain,stages Of etl

Ans.

ETL stands for Extract, Transform, Load. It is a process of extracting data from various sources, transforming it into a consistent format, and loading it into a target database.

  • Extract: Data is extracted from different sources such as databases, files, APIs, etc.

  • Transform: Data is cleaned, validated, and transformed into a consistent format.

  • Load: Transformed data is loaded into a target database or data warehouse.

  • Stages of ETL: Extraction, Transformation, Loading.

Q72. Difference between snowflake schema and star schema

Ans.

Snowflake schema is a normalized form of star schema with additional dimension tables.

  • Snowflake schema is a data modeling technique used in data warehousing.

  • In snowflake schema, dimensions are normalized into multiple related tables.

  • Snowflake schema reduces redundancy and improves data integrity.

  • Star schema is a denormalized form of snowflake schema with a single dimension table.

  • In star schema, dimensions are not normalized and are directly linked to the fact table.

  • Star schem...read more

Q73. rank dense_rank() ,row_numebr with example?

Ans.

dense_rank() assigns a unique rank to each distinct row, row_number() assigns a unique sequential integer to each row.

  • dense_rank() is used to assign a unique rank to each distinct row based on the specified order.

  • row_number() is used to assign a unique sequential integer to each row in the result set.

  • Example: dense_rank() - SELECT dense_rank() OVER (ORDER BY salary) AS rank FROM employees;

  • Example: row_number() - SELECT row_number() OVER (ORDER BY hire_date) AS row_num FROM em...read more

Q74. What is entry criteria & exit criteria

Ans.

Entry criteria and exit criteria are conditions that must be met before and after testing, respectively.

  • Entry criteria are the conditions that must be fulfilled before testing can begin

  • Exit criteria are the conditions that must be fulfilled before testing can be considered complete

  • Entry criteria can include having test data ready, test environment set up, and test cases prepared

  • Exit criteria can include all test cases executed, all defects fixed, and test reports generated

Q75. What is Scalar function with example.

Ans.

Scalar function is a function that returns a single value based on input parameters.

  • Scalar functions are used in SQL to perform calculations on input values and return a single result.

  • They can be used in SELECT statements, WHERE clauses, and other SQL queries.

  • An example of a scalar function is the ABS() function, which returns the absolute value of a number.

Q76. How to find duplicates in SQL?

Ans.

Use the GROUP BY clause with COUNT() function to find duplicates in SQL.

  • Use GROUP BY clause with COUNT() function to group the records by the columns you want to check for duplicates

  • Filter the results by using HAVING clause to only show records with count greater than 1

  • Example: SELECT column1, column2, COUNT(*) FROM table_name GROUP BY column1, column2 HAVING COUNT(*) > 1;

Q77. What is Slowly Changing Dimensions?

Ans.

Slowly Changing Dimensions (SCD) are techniques used in data warehousing to track changes to dimension data over time.

  • SCD Type 1: Overwrites old data with new data, losing historical information.

  • SCD Type 2: Creates new records for each change, preserving historical data.

  • SCD Type 3: Creates additional columns to track changes, allowing limited history.

  • SCD Type 4: Uses separate tables to store historical and current data.

  • Example: In a customer dimension, SCD Type 1 would update...read more

Q78. Etl tester test cases in various stages

Ans.

ETL tester tests in various stages to ensure data accuracy and completeness.

  • Test extraction process to ensure correct data is pulled from source systems

  • Test transformation process to ensure data is correctly transformed and formatted

  • Test loading process to ensure data is loaded into target systems without errors

  • Test data reconciliation to ensure data is complete and accurate in target systems

  • Test error handling and logging to ensure errors are properly handled and logged

Q79. What are constraints?

Ans.

Constraints are rules or conditions that must be followed in order to ensure data integrity and accuracy.

  • Constraints define the limits or restrictions on data values in a database table

  • Examples of constraints include primary key, foreign key, unique key, and check constraints

  • Constraints help maintain data quality and consistency in a database

Q80. Query to get 4th highest row based on a column

Ans.

Use SQL query with ORDER BY and LIMIT to get 4th highest row based on a column

  • Use ORDER BY clause to sort the column in descending order

  • Use LIMIT 1 OFFSET 3 to get the 4th highest row

  • Example: SELECT * FROM table_name ORDER BY column_name DESC LIMIT 1 OFFSET 3

Q81. Find the optimized approach for a given code

Ans.

Identify bottlenecks and inefficiencies in the code to optimize performance

  • Analyze the code to identify any redundant or unnecessary operations

  • Optimize data processing by using efficient algorithms and data structures

  • Consider parallel processing or multi-threading for performance improvement

  • Use profiling tools to identify performance bottlenecks and optimize them

  • Implement caching mechanisms to reduce redundant computations

Q82. Expected CTC in present company

Ans.

I prefer not to disclose my current salary as it is confidential.

  • It is not appropriate to disclose current salary as it is confidential information.

  • I am open to discussing a fair and competitive salary for the position.

  • My focus is on finding a challenging role that aligns with my skills and experience.

  • I am willing to negotiate based on the job responsibilities and market standards.

Q83. Flexible to work any environment

Ans.

Yes

  • I am flexible to work in any environment, be it a team or individual work

  • I can adapt to different work cultures and work styles

  • I am open to learning new technologies and tools

  • I can work in different time zones and shifts if required

Q84. Try to give real time example

Ans.

As an ETL Tester, a real-time example would be testing the data migration from a legacy system to a new system.

  • Testing the accuracy of data migration from one system to another

  • Ensuring data integrity during the migration process

  • Validating the completeness of data after migration

  • Verifying the consistency of data across systems

  • Identifying and resolving any data quality issues

  • Testing the performance of the ETL process

Q85. How to read a parquet file

Ans.

To read a parquet file, use a library like Apache Parquet or PyArrow to load the file and access the data.

  • Use a library like Apache Parquet or PyArrow to read the parquet file

  • Load the parquet file using the library's functions

  • Access the data within the parquet file for analysis or processing

Q86. Type of scd.

Ans.

SCD stands for Slowly Changing Dimensions. There are three types of SCD: Type 1, Type 2, and Type 3.

  • Type 1: Overwrites old data with new data.

  • Type 2: Creates a new record for new data and keeps the old record for historical data.

  • Type 3: Creates a new column for new data and keeps the old column for historical data.

Q87. Explain cross join with an example

Ans.

Cross join combines each row from one table with each row from another table.

  • Cross join produces a Cartesian product of the two tables involved.

  • It does not require any common columns between the two tables.

  • It can result in a large number of rows if the tables are big.

  • Example: SELECT * FROM table1 CROSS JOIN table2;

Q88. Explain inner join with an example

Ans.

Inner join combines rows from two tables based on a related column between them.

  • Inner join returns only the rows that have matching values in both tables

  • It is used to retrieve data that exists in both tables

  • Example: SELECT Orders.OrderID, Customers.CustomerName FROM Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

Q89. Explain SCD types

Ans.

SCD types are used to track changes in data over time in a data warehouse.

  • SCD stands for Slowly Changing Dimensions.

  • There are three types of SCD: Type 1, Type 2, and Type 3.

  • Type 1 overwrites old data with new data.

  • Type 2 creates a new record for each change and keeps a history of changes.

  • Type 3 keeps both old and new data in the same record.

  • SCD types are important for maintaining data integrity and accuracy in a data warehouse.

Q90. What is primary key?

Ans.

Primary key is a unique identifier for each record in a database table.

  • Primary key ensures each record in a table is unique

  • It can be a single column or a combination of columns

  • Primary key values cannot be NULL

Q91. Query to fetch nth highest salary

Ans.

Use SQL query with ORDER BY and LIMIT to fetch nth highest salary

  • Use ORDER BY salary DESC to sort salaries in descending order

  • Use LIMIT n-1,1 to fetch the nth highest salary

  • Replace n with the desired position of the salary

Q92. what is data aquisation

Ans.

Data acquisition is the process of collecting, gathering, and capturing data from various sources.

  • Data acquisition involves retrieving data from different sources such as databases, files, APIs, or sensors.

  • It is an essential step in the ETL (Extract, Transform, Load) process.

  • Data acquisition can be manual or automated, depending on the source and the system being used.

  • Examples of data acquisition include extracting customer information from a CRM system, importing data from a...read more

Q93. what is data reconsilation

Ans.

Data reconciliation is the process of comparing and verifying data between two or more sources to ensure accuracy and consistency.

  • Data reconciliation involves comparing data from different sources, such as databases or files.

  • It aims to identify any discrepancies or inconsistencies in the data.

  • Reconciliation can be done by comparing data values, counts, or summaries.

  • Examples include comparing sales data between a point-of-sale system and an inventory database.

  • Data reconciliati...read more

Q94. what is scrum ceremony

Ans.

Scrum ceremonies are regular meetings held in Agile development to facilitate collaboration, planning, and review.

  • Scrum ceremonies are time-boxed events that occur at specific intervals during a sprint.

  • The ceremonies include Sprint Planning, Daily Stand-up, Sprint Review, and Sprint Retrospective.

  • Sprint Planning involves determining the work to be done in the upcoming sprint.

  • Daily Stand-up is a brief meeting where team members discuss progress, challenges, and plans for the d...read more

Q95. SQL query to find second Max salary.

Ans.

Use SQL query with subquery to find second highest salary.

  • Use ORDER BY and LIMIT to get the second highest salary.

  • Use a subquery to exclude the highest salary from the result set.

  • Example: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);

Q96. Union union all difference

Ans.

Union, Union All, and Difference are SQL set operators used to combine or compare data from multiple tables.

  • Union combines the results of two or more SELECT statements and removes duplicates.

  • Union All combines the results of two or more SELECT statements and includes duplicates.

  • Difference (also known as Except) returns only the rows from the first SELECT statement that are not in the second SELECT statement.

  • All three operators require that the SELECT statements have the same ...read more

Q97. Write SQL for even andODD numbers

Ans.

SQL query to retrieve even and odd numbers

  • Use the modulo operator (%) to determine if a number is even or odd

  • For even numbers: SELECT number FROM table WHERE number % 2 = 0

  • For odd numbers: SELECT number FROM table WHERE number % 2 <> 0

Q98. Write SQL for sec highest salary.

Ans.

SQL query to find the second highest salary in a table.

  • Use the ORDER BY clause to sort the salaries in descending order.

  • Use the LIMIT clause to retrieve the second row after skipping the first row.

Q99. Query to get 3rd highest salary

Ans.

Use SQL query with ORDER BY and LIMIT to get 3rd highest salary.

  • Use SELECT statement with ORDER BY salary DESC to sort salaries in descending order

  • Use LIMIT 1,1 to skip the first two highest salaries and get the third highest salary

Q100. What customer do with data.

Ans.

Customers use data for various purposes such as analysis, decision-making, reporting, and personalization.

  • Data is used for analysis to gain insights and identify patterns or trends.

  • Data helps in decision-making by providing information and supporting evidence.

  • Data is used for reporting to communicate information and track performance.

  • Data enables personalization by tailoring products or services based on customer preferences.

  • Examples: analyzing sales data to identify top-sell...read more

Previous
1
2
3
Next
Interview Tips & Stories
Ace your next interview with expert advice and inspiring stories

Interview experiences of popular companies

3.7
 • 10.4k Interviews
3.8
 • 8.1k Interviews
3.6
 • 7.5k Interviews
3.7
 • 5.6k Interviews
3.7
 • 4.8k Interviews
3.5
 • 3.8k Interviews
3.8
 • 2.9k Interviews
3.7
 • 535 Interviews
3.7
 • 514 Interviews
View all

Calculate your in-hand salary

Confused about how your in-hand salary is calculated? Enter your annual salary (CTC) and get your in-hand salary

ETL Tester Interview Questions
Share an Interview
Stay ahead in your career. Get AmbitionBox app
qr-code
Helping over 1 Crore job seekers every month in choosing their right fit company
65 L+

Reviews

4 L+

Interviews

4 Cr+

Salaries

1 Cr+

Users/Month

Contribute to help millions

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.

Follow us
  • Youtube
  • Instagram
  • LinkedIn
  • Facebook
  • Twitter