ETL Tester
100+ ETL Tester Interview Questions and Answers
Q1. If we have 200 staging tables, 40 dimensions tables and 20 facts table, How will you compare it with target systems
The number of staging, dimension and fact tables in source and target systems need to be compared.
Compare the number of staging, dimension and fact tables in source and target systems.
Check if the table names and column names are consistent in both systems.
Verify if the data types and data values are matching in both systems.
Ensure that the ETL process is properly mapping the data from source to target systems.
Perform data profiling to identify any discrepancies between the s...read more
Q2. 2.Difference between Union and union all,drop and Truncate,star schema and snowflake schema,Dimension table and fact table.
Union combines the result sets of two or more SELECT statements, while Union All combines all rows from two or more SELECT statements.
Union removes duplicate rows, while Union All does not.
Union requires the number and order of columns in all SELECT statements to be the same, while Union All does not have this requirement.
Example: SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
Example: SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2;
ETL Tester Interview Questions and Answers for Freshers
Q3. What is the difference between Test Plan and Test strategy
Test plan is a detailed document that outlines the testing approach and activities. Test strategy is a high-level document that outlines the testing objectives and methods.
Test plan is more detailed than test strategy
Test plan includes specific test cases and test scenarios
Test strategy outlines the overall testing approach and objectives
Test strategy is created before the test plan
Test strategy is reviewed and updated regularly
Q4. Which type of Validation you will do at Landing and staging area.
At landing and staging area, I will perform data validation to ensure accuracy and completeness of data.
Validate data against source system
Check for missing or duplicate data
Verify data types and formats
Ensure data integrity and consistency
Perform data profiling and data quality checks
Q5. How do you get data integrity between source and destination, how you perform loading technique.
Data integrity is ensured through ETL testing and loading techniques.
Perform ETL testing to ensure data accuracy and completeness.
Use data profiling to identify data quality issues.
Perform data mapping to ensure correct data transfer.
Use checksums to verify data integrity.
Perform data validation to ensure data consistency.
Use error handling and logging to track and resolve issues.
Use incremental loading to improve performance and reduce errors.
Q6. What is requirement traceability matrix.. compatibility Testing
Requirement Traceability Matrix is a document that links requirements to test cases. Compatibility testing checks if software works on different platforms.
Requirement Traceability Matrix (RTM) is used to ensure that all requirements are tested and covered in test cases.
RTM helps in tracking the progress of testing and ensures that all requirements are met.
Compatibility testing is done to ensure that software works on different platforms, browsers, and devices.
Compatibility te...read more
Share interview questions and help millions of jobseekers 🌟
Q7. what is self Join,types of joins.what is CDC and how we will use it in ETL testing
Self join is joining a table with itself. Types of joins are inner, outer, left and right. CDC is change data capture used for tracking data changes.
Self join is used when we need to join a table with itself to retrieve data.
Types of joins are inner, outer, left and right join.
CDC is used to track data changes in the source system and apply those changes to the target system.
CDC can be used in ETL testing to verify that the data is being correctly captured and transformed.
Q8. Explain STLC .. difference between water fall and Agile methodologies
STLC is Software Testing Life Cycle which includes planning, designing, executing and reporting. Waterfall is a linear approach while Agile is iterative.
STLC is a process of testing software from planning to reporting
Waterfall is a linear approach where each phase is completed before moving to the next
Agile is an iterative approach where testing is done in short cycles called sprints
In Waterfall, testing is done at the end of the development cycle while in Agile, testing is d...read more
ETL Tester Jobs
Q9. What documents will you produce durimg and sfter testing phase
I will produce test plan, test cases, test scripts, defect reports, and test summary report.
Test plan outlining the testing approach and scope
Test cases with steps, expected results, and actual results
Test scripts for automation
Defect reports with steps to reproduce and severity
Test summary report with overall results and recommendations
Q10. Difference Between union and union all,having and where, rank and dense rank(),minus and intersection
Explaining the differences between union and union all, having and where, rank and dense rank(), minus and intersection.
Union combines the result sets of two or more SELECT statements, while Union All returns all rows including duplicates.
Having is used to filter the results of an aggregate function in a GROUP BY clause, while Where is used to filter rows before grouping.
Rank assigns a unique rank to each row within a result set, while Dense Rank assigns the same rank to rows...read more
Q11. 1.Find last 5 records. 2. find unique records.
To find the last 5 records, use the ORDER BY clause with a descending order and limit the result to 5. To find unique records, use the DISTINCT keyword.
To find the last 5 records, use the ORDER BY clause with a descending order and limit the result to 5.
Example: SELECT * FROM table_name ORDER BY column_name DESC LIMIT 5
To find unique records, use the DISTINCT keyword.
Example: SELECT DISTINCT column_name FROM table_name
Q12. 1.Difference between unique key and primary key.
Unique key allows null values while primary key does not.
Primary key is a unique identifier for a record in a table.
Unique key allows null values but primary key does not.
A table can have only one primary key but multiple unique keys.
Example: Employee ID can be a primary key while email can be a unique key.
Q13. How do you identify latest record in SCD
To identify latest record in SCD, check the effective end date column.
Look for the record with the latest effective end date
Effective end date column should have the maximum date value
If there are multiple records with the same effective end date, choose the one with the latest modified date
Q14. SQL-Display horizontal ORACLE into Vertical ORACLE (Pivot function)
To display horizontal Oracle into vertical Oracle, we can use the PIVOT function in SQL.
The PIVOT function is used to transform rows into columns.
It requires an aggregate function to be specified.
The PIVOT function can be used with the SELECT statement.
The PIVOT function can also be used with dynamic SQL.
Example: SELECT * FROM table_name PIVOT (SUM(column_name) FOR pivot_column IN (value1, value2, value3));
Q15. How do you compare files in Unix
Files can be compared in Unix using the diff command.
Use the 'diff' command followed by the two file names to compare them.
The output will show the differences between the two files.
Use the '-y' option to display the differences side by side.
Use the '-q' option to only show if the files are different or not.
Use the '-r' option to compare files in directories recursively.
Q16. What are the components of test cases
Components of test cases include test case ID, test case description, test steps, expected results, actual results, and status.
Test case ID: unique identifier for the test case
Test case description: brief description of the test case
Test steps: detailed steps to execute the test case
Expected results: expected outcome of the test case
Actual results: actual outcome of the test case
Status: pass/fail status of the test case
Q17. What risks do you see in Test completion
Incomplete testing can lead to undetected defects and potential risks in production.
Undetected defects can cause system failures and downtime.
Incomplete testing can lead to security vulnerabilities.
Lack of testing can result in poor user experience and customer dissatisfaction.
Incomplete testing can lead to increased maintenance costs and longer development cycles.
Q18. What are the prerequisites for etl testing?
Prerequisites for ETL testing include understanding of data warehousing concepts, SQL, and ETL tools.
Understanding of data warehousing concepts
Proficiency in SQL
Familiarity with ETL tools such as Informatica, Talend, or SSIS
Knowledge of data mapping and transformation
Ability to write test cases and execute them
Experience in data validation and reconciliation
Understanding of data quality and data profiling
Knowledge of source and target systems
Ability to troubleshoot issues and...read more
Q19. SQL-Find a last day of the previous month
SQL query to find the last day of the previous month.
Use the DATEADD function to subtract one day from the first day of the current month
Use the DAY function to get the day of the month
Subtract the day of the month from the date to get the last day of the previous month
Q20. How to find nth highest from a table
To find the nth highest from a table, use the SQL query with the ORDER BY and LIMIT clauses.
Write a SQL query to select the column you want to find the nth highest from.
Use the ORDER BY clause to sort the column in descending order.
Use the LIMIT clause to specify the number of rows to return, where n-1 is the value for LIMIT.
The nth highest value will be the first row returned by the query.
Q21. SQL- Fetch last 5 records from the table
To fetch last 5 records from a table in SQL
Use SELECT statement to retrieve data from the table
Use ORDER BY clause to sort the data in descending order based on a column
Use LIMIT clause to limit the number of rows returned to 5
Q22. How do you map requirement to test cases
Requirements are mapped to test cases by analyzing the requirements and creating test cases that cover all the aspects of the requirement.
Analyze the requirement document thoroughly
Identify the key functionalities and features
Create test cases that cover all the aspects of the requirement
Ensure that the test cases are traceable to the requirement
Review and validate the test cases with stakeholders
Q23. what is fact table,dimenssion table,star schema,snowflake schema???
Fact table, dimension table, star schema, and snowflake schema are concepts in data warehousing.
Fact table contains the measures or metrics of the data warehouse.
Dimension table contains the attributes or dimensions of the data warehouse.
Star schema is a type of schema where a fact table is connected to multiple dimension tables.
Snowflake schema is a type of schema where dimension tables are normalized into multiple related tables.
These concepts are used to design and organiz...read more
Q24. What are dimensions? And their types
Dimensions are attributes or characteristics of data that can be used for analysis and reporting.
Dimensions are used in data warehousing and business intelligence to categorize and organize data.
Types of dimensions include time, geography, product, customer, and organization.
Dimensions can be hierarchical, with subcategories and levels of detail.
Dimensions are often used in conjunction with measures, which are the numerical values being analyzed.
Q25. SQL Query to find such records which never ever contains flag 0 in critical column p1 1-1-21 1 p1 2-3-21 0 p2 3-2-21 0 p3 1-5-21 1 p3 1-5-21 1 p4 14-21 1
The SQL query to find records that never contain flag 0 in the critical column.
Use the NOT EXISTS operator to check if there are no records with flag 0 for each distinct value in the critical column.
Write a subquery to select distinct values from the critical column.
Join the main table with the subquery using the critical column.
Filter the result by checking if there are no records with flag 0 for each distinct value in the critical column.
Q26. What is Low Priority , Medium Priority & High priority defects?
Low, Medium, and High priority defects are categorized based on their impact on the system and urgency of fixing.
Low priority defects are minor issues that do not significantly impact the system's functionality.
Medium priority defects are issues that have a noticeable impact on the system but are not critical.
High priority defects are critical issues that severely impact the system's functionality and need immediate attention.
Examples: Low priority - Spelling errors, Medium p...read more
Q27. Difference between Star schema and snowflake schema.
Star schema has a single fact table and denormalized dimension tables, while snowflake schema has normalized dimension tables.
Star schema is simpler and easier to understand, but can lead to data redundancy.
Snowflake schema is more complex but saves storage space and allows for more flexibility in querying.
Star schema is better suited for smaller datasets, while snowflake schema is better for larger datasets.
Example of star schema: sales fact table with denormalized customer ...read more
Q28. Difference between white box and black box testing.
White box testing is testing the internal structure of the application while black box testing is testing the functionality without knowledge of the internal structure.
White box testing is also known as clear box testing or structural testing.
It requires knowledge of the internal workings of the application.
Examples include unit testing and integration testing.
Black box testing is also known as functional testing.
It does not require knowledge of the internal workings of the a...read more
Q29. How to do validation between two integers Source 12345 2122434 54534535 5345353455 target 0000012345 0002122434 0054534535 5345353455
To validate between two integers, compare each corresponding digit and check if they are equal.
Convert both integers to strings
Pad the source integer with leading zeros to match the length of the target integer
Compare each digit of the source and target integers
If any digit is not equal, the validation fails
Q30. Suppose we have following column what will be output for inner join col1 col2 1 1 1 null null null 1 1 1 blank blank blank
The output for inner join will be the rows where both col1 and col2 have non-null values.
Inner join returns only the matching rows from both tables.
In this case, the output will be the rows with non-null values in both col1 and col2.
Rows with null or blank values in either col1 or col2 will be excluded from the output.
Q31. Explain ETL architecture?
ETL architecture refers to the design and structure of the ETL process.
ETL architecture involves three main components: extraction, transformation, and loading.
Data is extracted from various sources, transformed to fit the target system, and loaded into the target database.
ETL architecture can be implemented using different tools and technologies, such as ETL software, data integration platforms, and cloud-based solutions.
The architecture should be designed to ensure data acc...read more
Q32. What is data repository?
A data repository is a centralized location where data is stored, managed, and maintained.
It is used to store and manage data in a structured manner
It can be a database, data warehouse, or data lake
It allows for easy access and retrieval of data
Examples include Hadoop Distributed File System (HDFS), Amazon S3, and Oracle Database
Q33. What is the difference between union and union all?
Union combines and removes duplicates, while union all combines without removing duplicates.
Union removes duplicates from the result set, while union all does not.
Union is slower than union all because it has to perform an additional step to remove duplicates.
Union all is used when duplicates are desired in the result set.
Example: SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
Example: SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2;
Q34. What is data mart?
A data mart is a subset of a larger data warehouse that is designed to serve a specific business unit or department.
Contains a subset of data from a larger data warehouse
Designed to serve a specific business unit or department
Provides a more focused view of data for analysis and reporting
Can be created using a top-down or bottom-up approach
Examples include sales data mart, marketing data mart, finance data mart
Q35. What is the Difference Between Drop and Truncate Command ?
Drop command deletes the table structure along with data, while truncate command deletes only the data.
Drop command removes the table structure and all its data, while truncate command only removes the data but keeps the table structure intact.
Drop command is a DDL (Data Definition Language) command, while truncate command is a DML (Data Manipulation Language) command.
Drop command is slower than truncate command as it involves deleting the table structure.
Example: DROP TABLE ...read more
Q36. Write an SQL query to fetch 5th ,10th,15th .. records from a table?
Use OFFSET and FETCH in SQL to retrieve specific records from a table.
Use OFFSET to skip the first n-1 records
Use FETCH to limit the number of records to be retrieved
Example: SELECT * FROM table_name OFFSET 4 ROWS FETCH NEXT 5 ROWS ONLY
Q37. What is the difference between set & join?
Set and join are operations used in database management systems.
Set is used to perform mathematical set operations like union, intersection, and difference on two or more sets.
Join is used to combine rows from two or more tables based on a related column between them.
Set operations do not require a common column, while join operations do.
Set operations return a result set with unique values, while join operations return a result set with combined values.
Examples: UNION, INTER...read more
Q38. diff b/w where clause and having clause? diff b/w severity and priority
Difference between where and having clause and severity and priority
Where clause is used to filter rows before grouping while having clause is used to filter groups after grouping
Severity is the degree of impact of a defect on the system while priority is the order in which defects should be fixed
Where clause is used with SELECT, UPDATE and DELETE statements while having clause is used with GROUP BY clause
Severity is usually assigned by the tester while priority is assigned b...read more
Q39. What is a iterative model
Iterative model is a software development model where the development process is divided into smaller iterations.
Iterative model involves repeating the development process in small cycles.
Each cycle involves planning, designing, building, testing and evaluating the software.
Feedback from each cycle is used to improve the software in the next cycle.
Iterative model is useful when requirements are not clear or keep changing.
Examples of iterative models include Agile and Scrum.
Q40. 1 2 3 . . . 100 we want 100 at first row & rest below it
The question is asking to rearrange the numbers so that 100 is at the first row and the rest are below it.
Create an array of strings to store the numbers
Assign the value '100' to the first element of the array
Use a loop to assign the numbers 1 to 99 to the remaining elements of the array
Q41. Explain schema you used in your project
The schema used in my project was a star schema.
Star schema is a type of data warehouse schema where a central fact table is connected to multiple dimension tables.
The fact table contains the measurements or metrics of the business process, while the dimension tables provide context and descriptive attributes.
This schema is commonly used in data warehousing and business intelligence applications.
Example: In a sales analysis project, the fact table could contain sales transact...read more
Q42. 1. Fetch last record of the column. 2. Dense rank vs rank 3. Fetch end of 5 character from the string of the column
The interview questions are related to SQL and data manipulation.
To fetch the last record of a column, you can use the ORDER BY clause with DESC keyword and LIMIT 1.
Dense rank and rank are both ranking functions in SQL, but dense rank assigns consecutive rank numbers to rows, while rank leaves gaps in the ranking sequence.
To fetch the end of 5 characters from a string column, you can use the RIGHT function in SQL.
Q43. write query last hour hour transaction details through net banking r card
Query for last hour transaction details through net banking or card
Use the transaction table to filter transactions by timestamp
Join with the net banking or card table to get additional details
Filter for transactions within the last hour using a WHERE clause
Order the results by timestamp to get the most recent transactions first
Q44. What is fact?
A fact is a piece of information that is known to be true or proven through evidence.
Facts are objective and can be verified through research or observation.
Facts are not opinions or beliefs.
Examples of facts include the boiling point of water, the population of a city, and historical events that have been documented.
Facts can be used to support arguments or conclusions.
Facts can change over time as new information is discovered or theories are revised.
Q45. Find the nth row. Find the top 50% records, string function
Use SQL query to find the nth row and top 50% records using string function
To find the nth row, use LIMIT and OFFSET in SQL query
To find top 50% records, use PERCENTILE_CONT function in SQL
String functions like SUBSTRING can be used to manipulate string data
Q46. what is Star Schema and Snow flake schema ?
Star Schema is a type of database schema where a central fact table is connected to multiple dimension tables. Snowflake Schema is a normalized form of Star Schema where dimension tables are further normalized into sub-dimension tables.
Star Schema has a central fact table connected to multiple dimension tables
Snowflake Schema is a normalized form of Star Schema with sub-dimension tables
Star Schema is denormalized for better query performance
Snowflake Schema is normalized for ...read more
Q47. SQL query to find second highest salary.
SQL query to find second highest salary.
Use ORDER BY clause in descending order to sort the salaries
Use LIMIT 1,1 to select the second highest salary
Example: SELECT salary FROM employees ORDER BY salary DESC LIMIT 1,1
Q48. How you identify duplicates and how to find latest duplicate
Identifying duplicates and finding the latest duplicate in ETL testing
Identify duplicates by comparing key fields in the dataset
To find the latest duplicate, look for the record with the highest timestamp or version number
Use SQL queries or ETL tools to filter out duplicates and identify the latest one
Q49. What is the difference between star schema and snowflake schme
Star schema has a single fact table connected to multiple dimension tables directly, while snowflake schema has normalized dimension tables connected to each other.
Star schema has denormalized dimension tables, making it easier to query but potentially less efficient for storage.
Snowflake schema has normalized dimension tables, reducing redundancy but potentially making queries more complex.
Star schema is more suitable for data warehousing applications with simpler queries, w...read more
Q50. Group by clause and how it work with a query
Group by clause is used in SQL to group rows that have the same values into summary rows.
Group by clause is used with aggregate functions like SUM, COUNT, AVG, etc.
It is used to group rows based on one or more columns specified in the query.
Group by clause must be used after the WHERE clause and before the ORDER BY clause.
Example: SELECT department, COUNT(*) FROM employees GROUP BY department;
Interview Questions of Similar Designations
Top Interview Questions for ETL Tester Related Skills
Interview experiences of popular companies
Calculate your in-hand salary
Confused about how your in-hand salary is calculated? Enter your annual salary (CTC) and get your in-hand salary
Reviews
Interviews
Salaries
Users/Month