20+ Knorr-Bremse Interview Questions and Answers
Q1. If we have 200 staging tables, 40 dimensions tables and 20 facts table, How will you compare it with target systems
The number of staging, dimension and fact tables in source and target systems need to be compared.
Compare the number of staging, dimension and fact tables in source and target systems.
Check if the table names and column names are consistent in both systems.
Verify if the data types and data values are matching in both systems.
Ensure that the ETL process is properly mapping the data from source to target systems.
Perform data profiling to identify any discrepancies between the source and target systems.
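A quick metadata-count sketch, using placeholder schema views and table-name prefixes (STG, DIM and FACT prefixes are assumptions):
Example (Oracle source): SELECT COUNT(*) FROM all_tables WHERE table_name LIKE 'STG%';
Example (SQL Server/PostgreSQL target): SELECT COUNT(*) FROM information_schema.tables WHERE table_name LIKE 'STG%';
Run the same counts on both sides for staging, dimension and fact tables, then repeat at row level, e.g. SELECT COUNT(*) FROM dim_customer; in source and target and compare.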
Q2. Difference between Union and Union All, Drop and Truncate, star schema and snowflake schema, dimension table and fact table.
Union and Union All both combine the result sets of two or more SELECT statements into a single result set.
Union removes duplicate rows, while Union All does not.
Both Union and Union All require the same number of columns with compatible data types in each SELECT statement; Union All is typically faster because it skips the duplicate-removal step.
Example: SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
Example: SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2;
Q3. Which types of validation will you do at the landing and staging areas?
At the landing and staging areas, I will perform data validation to ensure the accuracy and completeness of the data.
Validate data against source system
Check for missing or duplicate data
Verify data types and formats
Ensure data integrity and consistency
Perform data profiling and data quality checks
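A few validation query sketches, with placeholder table and column names (stg_customer, customer_id):
Example (duplicate check): SELECT customer_id, COUNT(*) FROM stg_customer GROUP BY customer_id HAVING COUNT(*) > 1;
Example (missing mandatory values): SELECT COUNT(*) FROM stg_customer WHERE customer_id IS NULL;
Example (count reconciliation): SELECT COUNT(*) FROM stg_customer; compared against the corresponding source extract count.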
Q4. What is a self join and what are the types of joins? What is CDC and how will we use it in ETL testing?
A self join is joining a table with itself. Types of joins include inner, left outer, right outer, full outer and cross joins. CDC is change data capture, used for tracking data changes.
Self join is used when we need to join a table with itself to retrieve data.
Types of joins include inner join, left outer join, right outer join, full outer join and cross join.
CDC is used to track data changes in the source system and apply those changes to the target system.
CDC can be used in ETL testing to verify that the data is being correctly captured and transformed.
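A self join sketch, assuming a hypothetical employees table where manager_id refers back to employee_id:
Example: SELECT e.name AS employee, m.name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.employee_id;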
Q5. 1. Find the last 5 records. 2. Find unique records.
To find the last 5 records, use the ORDER BY clause with a descending order and limit the result to 5. To find unique records, use the DISTINCT keyword.
To find the last 5 records, use the ORDER BY clause with a descending order and limit the result to 5.
Example: SELECT * FROM table_name ORDER BY column_name DESC LIMIT 5
To find unique records, use the DISTINCT keyword.
Example: SELECT DISTINCT column_name FROM table_name
Q6. Difference between a unique key and a primary key.
Unique key allows null values while primary key does not.
Primary key is a unique identifier for a record in a table.
Unique key allows null values but primary key does not.
A table can have only one primary key but multiple unique keys.
Example: Employee ID can be a primary key while email can be a unique key.
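A minimal sketch with hypothetical column names, showing one primary key and one unique key on the same table:
Example: CREATE TABLE employees (employee_id INT PRIMARY KEY, email VARCHAR(100) UNIQUE, name VARCHAR(100));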
Q7. How do you identify the latest record in an SCD?
To identify the latest record in an SCD Type 2 dimension, check the effective end date or current-record flag column.
The current record typically has an open end date: either NULL or a high default value such as 9999-12-31.
If a current-record flag is maintained (e.g., IS_CURRENT = 'Y'), filter on that instead.
If multiple records appear current, choose the one with the latest effective start date or modified date.
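A sketch for picking the current record, assuming a hypothetical dim_customer table that stores end_date (NULL or 9999-12-31 for the open record) and an is_current flag:
Example: SELECT * FROM dim_customer WHERE end_date IS NULL OR end_date = DATE '9999-12-31';
Example: SELECT * FROM dim_customer WHERE is_current = 'Y';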
Q8. SQL - Display horizontal data as vertical in Oracle (PIVOT function).
To display horizontal data vertically (and vice versa) in Oracle, we can use the PIVOT and UNPIVOT functions in SQL.
The PIVOT function is used to transform rows into columns.
It requires an aggregate function to be specified.
The PIVOT function can be used with the SELECT statement.
The PIVOT function can also be used with dynamic SQL.
Example: SELECT * FROM table_name PIVOT (SUM(column_name) FOR pivot_column IN (value1, value2, value3));
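A more concrete sketch, assuming a hypothetical sales(region, quarter, amount) table and Oracle 11g+ syntax:
Example (rows to columns): SELECT * FROM sales PIVOT (SUM(amount) FOR quarter IN ('Q1' AS q1, 'Q2' AS q2, 'Q3' AS q3, 'Q4' AS q4));
Example (columns to rows): SELECT * FROM quarterly_sales UNPIVOT (amount FOR quarter IN (q1 AS 'Q1', q2 AS 'Q2', q3 AS 'Q3', q4 AS 'Q4')); (quarterly_sales is assumed to have one column per quarter.)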
Q9. What are the prerequisites for etl testing?
Prerequisites for ETL testing include understanding of data warehousing concepts, SQL, and ETL tools.
Understanding of data warehousing concepts
Proficiency in SQL
Familiarity with ETL tools such as Informatica, Talend, or SSIS
Knowledge of data mapping and transformation
Ability to write test cases and execute them
Experience in data validation and reconciliation
Understanding of data quality and data profiling
Knowledge of source and target systems
Ability to troubleshoot and resolve issues in the ETL process
Q10. SQL - Find the last day of the previous month.
SQL query to find the last day of the previous month.
One approach: use the DATEADD function to subtract one day from the first day of the current month.
Another approach: use the DAY function to get the current day of the month and subtract that many days from today's date.
Either way, the result is the last day of the previous month.
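A few sketches, depending on the database in use:
Example (SQL Server): SELECT EOMONTH(GETDATE(), -1);
Example (SQL Server, using DATEADD/DAY as described above): SELECT DATEADD(DAY, -DAY(GETDATE()), CAST(GETDATE() AS DATE));
Example (Oracle): SELECT LAST_DAY(ADD_MONTHS(SYSDATE, -1)) FROM dual;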
Q11. SQL - Fetch the last 5 records from a table.
To fetch the last 5 records from a table in SQL
Use SELECT statement to retrieve data from the table
Use ORDER BY clause to sort the data in descending order based on a column
Use LIMIT clause to limit the number of rows returned to 5
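A sketch with placeholder names (employees, employee_id), assuming a column that defines the record order:
Example (MySQL/PostgreSQL): SELECT * FROM employees ORDER BY employee_id DESC LIMIT 5;
Example (Oracle 12c+): SELECT * FROM employees ORDER BY employee_id DESC FETCH FIRST 5 ROWS ONLY;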
Q12. What are dimensions and their types?
Dimensions are attributes or characteristics of data that can be used for analysis and reporting.
Dimensions are used in data warehousing and business intelligence to categorize and organize data.
Types of dimensions include time, geography, product, customer, and organization.
Dimensions can be hierarchical, with subcategories and levels of detail.
Dimensions are often used in conjunction with measures, which are the numerical values being analyzed.
Q13. Explain ETL architecture?
ETL architecture refers to the design and structure of the ETL process.
ETL architecture involves three main components: extraction, transformation, and loading.
Data is extracted from various sources, transformed to fit the target system, and loaded into the target database.
ETL architecture can be implemented using different tools and technologies, such as ETL software, data integration platforms, and cloud-based solutions.
The architecture should be designed to ensure data accuracy and consistency throughout the process.
Q14. What is data repository?
A data repository is a centralized location where data is stored, managed, and maintained.
It is used to store and manage data in a structured manner
It can be a database, data warehouse, or data lake
It allows for easy access and retrieval of data
Examples include Hadoop Distributed File System (HDFS), Amazon S3, and Oracle Database
Q15. What is data mart?
A data mart is a subset of a larger data warehouse that is designed to serve a specific business unit or department.
Contains a subset of data from a larger data warehouse
Designed to serve a specific business unit or department
Provides a more focused view of data for analysis and reporting
Can be created using a top-down or bottom-up approach
Examples include sales data mart, marketing data mart, finance data mart
Q16. Explain schema you used in your project
The schema used in my project was a star schema.
Star schema is a type of data warehouse schema where a central fact table is connected to multiple dimension tables.
The fact table contains the measurements or metrics of the business process, while the dimension tables provide context and descriptive attributes.
This schema is commonly used in data warehousing and business intelligence applications.
Example: In a sales analysis project, the fact table could contain sales transactions, with dimension tables for product, customer, store and date.
Q17. What is fact?
A fact is a piece of information that is known to be true or proven through evidence; in a data warehousing context, a fact is a measurable value (such as a sales amount) stored in a fact table.
Facts are objective and can be verified through research or observation.
Facts are not opinions or beliefs.
Examples of facts include the boiling point of water, the population of a city, and historical events that have been documented.
Facts can be used to support arguments or conclusions.
Facts can change over time as new information is discovered or theories are revised.
Q18. Types of test data creation
Test data creation types include manual, automated, random, boundary, and negative testing.
Manual testing involves creating data by hand
Automated testing uses tools to generate data
Random testing involves creating data randomly
Boundary testing involves testing data at the limits of its range
Negative testing involves testing invalid or unexpected data
Q19. How would you test a full load vs incremental load
To test a full load vs incremental load, compare the results of loading all data at once vs loading only new or updated data.
Create test cases to verify the accuracy of data loaded during a full load.
Create test cases to verify that only new or updated data is loaded during an incremental load.
Compare the results of the full load and incremental load to ensure consistency and accuracy.
Verify that the data integrity is maintained during both types of loads.
Use tools like SQL queries or ETL test utilities to compare record counts and data between source and target after each load.
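A reconciliation sketch with placeholder table names (source_orders, target_orders), assuming both are reachable from one connection:
Example (missing rows after a full load): SELECT order_id FROM source_orders MINUS SELECT order_id FROM target_orders; (use EXCEPT instead of MINUS outside Oracle)
Example (incremental delta check): SELECT COUNT(*) FROM target_orders WHERE load_date = CURRENT_DATE; compared against the expected number of new or changed source rows.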
Q20. What is fact table?
Fact table is a table in a data warehouse that stores quantitative data about a business process.
Contains foreign keys to dimension tables
Stores numerical data such as sales, revenue, etc.
Used for analysis and reporting
Can have multiple fact tables in a data warehouse
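A minimal fact table sketch with hypothetical names, showing foreign keys to dimensions plus numeric measures:
Example: CREATE TABLE fact_sales (date_key INT REFERENCES dim_date(date_key), product_key INT REFERENCES dim_product(product_key), quantity INT, sales_amount DECIMAL(12,2));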
Q21. What are the types of SCD?
Types of SCD include Type 1, Type 2, Type 3, and Type 4.
Type 1 - Overwrite: Old record is replaced with new data.
Type 2 - Add new row: New record is added with a new surrogate key.
Type 3 - Add previous-value column: a new column stores the prior value to track limited history.
Type 4 - History table: the main dimension keeps only current data and a separate history table stores all changes (a hybrid of Types 1, 2 and 3 is usually called Type 6).
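A minimal SCD Type 2 sketch, assuming a hypothetical dim_customer table with start_date, end_date and is_current columns:
Example (close the old version): UPDATE dim_customer SET end_date = CURRENT_DATE, is_current = 'N' WHERE customer_id = 101 AND is_current = 'Y';
Example (insert the new version): INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_current) VALUES (101, 'Berlin', CURRENT_DATE, NULL, 'Y');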
Q22. Difference between snowflake schema and star schema
Snowflake schema is a normalized form of star schema with additional dimension tables.
Snowflake schema is a data modeling technique used in data warehousing.
In snowflake schema, dimensions are normalized into multiple related tables.
Snowflake schema reduces redundancy and improves data integrity.
Star schema is a denormalized design in which each dimension is a single table joined directly to the fact table.
In star schema, dimensions are not normalized and are directly linked to the fact table.
Star schema is simpler and generally faster to query because it requires fewer joins.
Q23. How to read a parquet file
To read a parquet file, use a library like Apache Parquet or PyArrow to load the file and access the data.
Use a library like Apache Parquet or PyArrow to read the parquet file
Load the parquet file using the library's functions
Access the data within the parquet file for analysis or processing
Q24. Types of SCD.
SCD stands for Slowly Changing Dimensions. The most commonly used types are Type 1, Type 2, and Type 3.
Type 1: Overwrites old data with new data.
Type 2: Creates a new record for new data and keeps the old record for historical data.
Type 3: Creates a new column for new data and keeps the old column for historical data.
Q25. Explain SCD types
SCD types are used to track changes in data over time in a data warehouse.
SCD stands for Slowly Changing Dimensions.
The most commonly used types are Type 1, Type 2, and Type 3.
Type 1 overwrites old data with new data.
Type 2 creates a new record for each change and keeps a history of changes.
Type 3 keeps both old and new data in the same record.
SCD types are important for maintaining data integrity and accuracy in a data warehouse.