50+ ETL Developer Interview Questions and Answers
Q1. What are the contents of APT_CONFIG_FILE? Can you briefly explain the relationship between APT_CONFIG_FILE and partitioning?
APT_CONFIG_FILE is the configuration file used by the IBM DataStage parallel engine to define the processing nodes available to a job.
It contains one or more node definitions, each specifying a node name, fastname (host), node pools, resource disk, and resource scratchdisk.
The number of nodes defined in the file determines the default degree of parallelism for a parallel job.
When a job runs, data is partitioned across these nodes, so a four-node configuration file typically produces four partitions.
Changing the configuration file lets you scale a job up or down without redesigning it.
Q2. What is RCP? In what scenario have you used it in your project?
RCP stands for Runtime Column Propagation, a DataStage feature that propagates columns through a job at runtime even if they are not explicitly defined in the job design.
When RCP is enabled, a stage passes any extra input columns through to its output without them being mapped in the design.
It is typically used with schema files to build generic, reusable jobs that can handle multiple table layouts.
I have used RCP in a project where a single generic job loaded many source tables with different column sets, driven by schema files, instead of building one job per table.
Q3. What is a trigger? Explain the implementation of different types of triggers.
A trigger is a special type of stored procedure that automatically executes in response to certain events.
Triggers are used to enforce business rules or to perform complex calculations.
The two main types of triggers are DML triggers and DDL triggers.
DML triggers fire in response to DML events (INSERT, UPDATE, DELETE).
DDL triggers fire in response to DDL events (CREATE, ALTER, DROP).
Triggers can be used to audit changes to data, enforce referential integrity, or perform custom validation, as in the sketch below.
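A minimal T-SQL sketch of a DML audit trigger; the Orders and OrdersAudit tables are hypothetical:

```sql
-- Hypothetical tables: Orders(OrderID, Amount), OrdersAudit(OrderID, OldAmount, ChangedAt).
CREATE TRIGGER trg_Orders_Audit
ON Orders
AFTER UPDATE
AS
BEGIN
    -- Capture the pre-update values of every modified row for auditing.
    INSERT INTO OrdersAudit (OrderID, OldAmount, ChangedAt)
    SELECT d.OrderID, d.Amount, GETDATE()
    FROM deleted AS d;
END;
```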
Q4. What are the methods available in Aggregator stage?
The Aggregator stage has two grouping methods, hash and sort, along with aggregation functions such as count, sum, average, min, max, first, last, and concatenate.
The hash method builds an in-memory table of groups, so the input does not need to be sorted.
The sort method requires the input to be pre-sorted on the grouping keys and uses less memory when there are many groups.
Count: counts the number of input rows
Sum: calculates the sum of a specified column
Average: calculates the average of a specified column
Min: finds the minimum value of a specified column
Max: finds the maximum value of a specified column
First: returns the first row of the input
Last: returns the last row of the input
Concatenate: concatenates the values of a specified column
Q5. How will you run 300 stored procedures sequentially and in parallel in SQL Server?
To run 300 stored procedures sequentially and in parallel in SQL Server, you can use SQL Server Agent jobs and SSIS packages.
Create SQL Server Agent job to run the stored procedures sequentially
Create multiple SQL Server Agent jobs to run the stored procedures in parallel
Use SSIS packages to orchestrate the execution of the stored procedures in parallel
Consider using batch processing and optimizing the stored procedures for better performance
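A minimal T-SQL sketch of the sequential case, assuming the procedures follow a hypothetical usp_Load_% naming pattern; for the parallel case, the set would typically be split across several SQL Server Agent jobs or SSIS packages instead:

```sql
-- Run each matching stored procedure one at a time (sequential execution).
DECLARE @proc sysname, @sql nvarchar(max);

DECLARE proc_cursor CURSOR FOR
    SELECT name
    FROM sys.procedures
    WHERE name LIKE 'usp_Load_%';   -- hypothetical naming convention

OPEN proc_cursor;
FETCH NEXT FROM proc_cursor INTO @proc;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'EXEC ' + QUOTENAME(@proc) + N';';
    EXEC sp_executesql @sql;        -- execute the current procedure
    FETCH NEXT FROM proc_cursor INTO @proc;
END
CLOSE proc_cursor;
DEALLOCATE proc_cursor;
```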
Q6. What factors do you consider for dimensional modeling?
Dimensional modeling factors include business requirements, data granularity, and data integration.
Identify business requirements for the data warehouse
Determine the level of data granularity needed for analysis
Ensure data integration across multiple sources
Choose appropriate dimensions and hierarchies
Consider performance and scalability
Examples: time, geography, product, customer
Q7. What are the prerequisites to use the sort method in the Aggregator stage?
The input data must be sorted on the same key columns that are used for grouping in the Aggregator stage.
The input data must also be partitioned on those key columns, so all rows of a group arrive in the same partition.
A Sort stage (or link sort) is placed before the Aggregator stage in the data flow to satisfy this.
The sort method is preferred when the number of groups is too large for the hash method to hold in memory.
Sorting can be done in either ascending or descending order.
Q8. What was the most challenging time when you crossed project deadlines?
I have never crossed project deadlines in my career as an ETL Developer.
I have always been proactive in managing my time and resources to ensure timely completion of projects.
I prioritize tasks and create a detailed project plan to track progress and meet deadlines.
I communicate effectively with stakeholders to manage expectations and address any potential delays.
I have successfully completed multiple projects within the given timelines.
I continuously monitor project progress to identify potential delays early.
Q9. Explain the SCD Type 2 dimension used in your previous project.
SCD Type 2 is used to maintain historical data in a dimension table.
SCD Type 2 is used to track changes in dimension attributes over time.
It creates a new record for each change and maintains a history of all changes.
It includes additional columns like start date, end date, and version number.
Example: Employee table with changes in job title, salary, and department over time.
It helps in analyzing trends and making informed decisions based on historical data.
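A minimal SQL sketch of the Type 2 pattern, assuming a hypothetical dim_employee table with start_date, end_date, and is_current tracking columns:

```sql
-- Step 1: expire the current version of the changed record.
UPDATE dim_employee
SET end_date = CURRENT_DATE,
    is_current = 0
WHERE employee_id = 101
  AND is_current = 1;

-- Step 2: insert a new version carrying the updated attributes.
INSERT INTO dim_employee
    (employee_id, job_title, salary, department, start_date, end_date, is_current)
VALUES
    (101, 'Senior Analyst', 85000.00, 'Finance', CURRENT_DATE, NULL, 1);
```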
Q10. Use Excel formulas to find stats and data
Excel formulas can be used to calculate statistics and analyze data.
Use SUM function to calculate total sum of a range of cells.
Use AVERAGE function to find the average of a range of cells.
Use COUNT function to count the number of cells that contain numbers in a range.
Use MAX and MIN functions to find the maximum and minimum values in a range.
Use IF function to perform conditional calculations based on certain criteria.
Q11. Write a Python function to process some data
Python function to process data
Define a function that takes in the data as input
Process the data using Python code
Return the processed data
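A minimal Python sketch under the assumption that the data arrives as a list of dictionaries and the processing step is a simple cleanup; the field names are hypothetical:

```python
def process_data(records):
    """Clean a list of record dicts: drop rows missing an id, normalize names."""
    processed = []
    for record in records:
        if record.get("id") is None:
            continue  # skip rows missing the key field
        cleaned = dict(record)
        cleaned["name"] = str(record.get("name", "")).strip().title()
        processed.append(cleaned)
    return processed

# Example usage:
rows = [{"id": 1, "name": "  alice  "}, {"id": None, "name": "bob"}]
print(process_data(rows))  # [{'id': 1, 'name': 'Alice'}]
```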
Q12. What is the difference between Informatica PowerCenter and IICS?
Informatica PowerCenter is an on-premise ETL tool, while IICS is a cloud-based ETL tool.
PowerCenter is installed and run on the user's own hardware and infrastructure.
IICS (Informatica Intelligent Cloud Services) is accessed and used over the internet as a managed service.
PowerCenter requires manual upgrades and maintenance, while IICS is automatically updated by Informatica.
Q13. What is the difference between a clustered and a non-clustered index?
A clustered index physically orders the data on disk, while a non-clustered index does not.
A clustered index sorts and stores the table's rows based on the indexed column.
A non-clustered index keeps a separate structure with pointers back to the data rows.
A table can have only one clustered index but multiple non-clustered indexes.
A clustered index is faster for retrieval but slower for inserts and updates.
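A minimal T-SQL sketch using a hypothetical Employees table:

```sql
-- Only one clustered index per table: the rows themselves are ordered by EmployeeID.
CREATE CLUSTERED INDEX IX_Employees_EmployeeID
    ON Employees (EmployeeID);

-- Multiple non-clustered indexes are allowed: separate structures pointing back to the rows.
CREATE NONCLUSTERED INDEX IX_Employees_LastName
    ON Employees (LastName);
```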
Q14. Any complex SSIS package you made? (Data warehouse concepts)
Yes, I have created a complex SSIS package for loading data from multiple sources into a data warehouse.
The SSIS package involved extracting data from various sources such as SQL Server, Excel files, and flat files.
I used data flow tasks, conditional split transformations, and lookup transformations to cleanse and transform the data before loading it into the data warehouse.
I implemented error handling and logging mechanisms to ensure the reliability and traceability of the data loads.
Q15. Explain the scenario based on Type 2 SCD in your project.
Type 2 SCD is used to track historical changes in data, creating new records for each change.
In our project, we use Type 2 SCD to track changes in customer information such as address, phone number, and email.
When a customer's address changes, a new record is created with the updated address and a new surrogate key.
This allows us to maintain a history of customer information and analyze trends over time.
Q16. Explain what a pragma and a temporary table are, what an SP and a cursor are, and how to find the second highest salary.
A pragma gives instructions to the compiler; a temporary table exists only for the duration of a session or transaction; an SP is a stored procedure; a cursor iterates through a result set; the second highest salary is found by sorting salaries in descending order and picking the second distinct value.
A pragma is used to provide instructions to the compiler, such as PRAGMA AUTONOMOUS_TRANSACTION in Oracle PL/SQL.
A temporary table is a table that exists temporarily and is dropped automatically when no longer needed.
SP stands for stored procedure, a reusable named block of SQL.
A cursor is used to iterate through a result set row by row.
To find the second highest salary, order the distinct salaries in descending order and select the second one, as in the sketch below.
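A minimal SQL sketch for the second-highest-salary part, assuming a hypothetical employees table with a salary column:

```sql
-- Option 1: the highest salary below the overall maximum.
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Option 2: skip the top distinct salary and take the next one (SQL Server 2012+ syntax).
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
OFFSET 1 ROW FETCH NEXT 1 ROW ONLY;
```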
Q17. What is the difference between TRUNCATE, DELETE, and DROP?
TRUNCATE removes all rows but keeps the table structure; DELETE removes specific rows; DROP removes the table entirely.
TRUNCATE removes all data from a table but keeps the structure.
DELETE removes specific rows from a table, usually filtered with a WHERE clause.
DROP removes the entire table along with its structure.
In most databases, TRUNCATE and DROP are DDL operations that cannot easily be rolled back, while DELETE can be.
TRUNCATE is faster than DELETE because it does not log individual row deletions; the sketch below shows all three statements.
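A minimal SQL sketch of all three statements against a hypothetical staging_orders table:

```sql
-- DELETE: remove only the rows matching a condition (fully logged, can be rolled back).
DELETE FROM staging_orders
WHERE load_date < '2023-01-01';

-- TRUNCATE: remove every row quickly but keep the table definition.
TRUNCATE TABLE staging_orders;

-- DROP: remove the table and its structure entirely.
DROP TABLE staging_orders;
```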
Q18. Are you okay if we give you a testing project?
Yes, I am open to taking on a testing project as part of my role as an ETL Developer.
I have experience with testing tools and methodologies in my previous roles.
I understand the importance of testing in ensuring data accuracy and quality.
Taking on a testing project will help me improve my skills and knowledge in ETL development.
Q19. How to delete records in a table?
Records in a table can be deleted using SQL DELETE statement.
Use the DELETE statement followed by the table name to delete records.
You can add a WHERE clause to specify which records to delete.
Make sure to commit the transaction after deleting records to make the changes permanent.
Q20. What are the uses of the Filter stage?
Filter stage is used in ETL processes to selectively pass or reject data based on specified criteria.
Filter stage helps in removing unwanted data from the input dataset.
It can be used to apply conditions like filtering out duplicate records, selecting specific columns, or excluding certain values.
For example, a filter stage can be used to only pass records where the sales amount is greater than $1000.
Q21. What are the uses of the transform stage?
Transform stage is used in ETL process to apply business rules, clean and enrich data before loading into target database.
Transform stage is used to apply business rules to the data.
It is used to clean and standardize data before loading into the target database.
Transform stage can also be used to enrich data by combining multiple sources or adding calculated fields.
Examples include converting data types, removing duplicates, and aggregating data.
Q22. What is the difference between star schema and snowflake schema?
Star schema has a centralized fact table with denormalized dimensions, while snowflake schema has normalized dimensions.
Star schema is simpler and easier to understand compared to snowflake schema.
In star schema, dimensions are denormalized and directly connected to the fact table.
Snowflake schema has normalized dimensions, meaning dimensions are further broken down into sub-dimensions.
Star schema is more suitable for data warehousing and reporting purposes.
Snowflake schema reduces data redundancy but requires more joins when querying; the sketch below contrasts the two layouts.
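A minimal SQL sketch of the star layout, with hypothetical fact and dimension tables; in a snowflake schema the commented attributes would move into their own normalized tables:

```sql
-- Denormalized dimension: all product attributes kept in one table (star schema).
CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50),  -- a snowflake schema would split this into dim_category
    brand        VARCHAR(50)   -- a snowflake schema would split this into dim_brand
);

-- Central fact table referencing the dimension.
CREATE TABLE fact_sales (
    sale_id      INT PRIMARY KEY,
    product_key  INT REFERENCES dim_product (product_key),
    date_key     INT,
    sales_amount DECIMAL(12, 2)
);
```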
Q23. What is normalization?
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization involves breaking down a table into smaller tables to reduce redundancy.
It helps to avoid data inconsistencies and anomalies.
Normalization is achieved through a series of normal forms, such as first normal form (1NF), second normal form (2NF), and so on.
Each normal form has specific rules that must be followed to ensure data is properly organized.
Q24. What is the difference between a connected and an unconnected lookup?
A connected lookup is part of the mapping data flow and can return multiple columns, while an unconnected lookup is called from an expression and returns a single value.
A connected lookup receives input directly from the pipeline and passes its results downstream.
An unconnected lookup is invoked from an expression transformation using the :LKP syntax, which makes it useful for conditional lookups.
A connected lookup can use a dynamic cache, while an unconnected lookup is called only when the expression actually needs it.
Q25. Greatest challenge faced while migrating to the cloud from an on-premise DB
The greatest challenge faced during migration to cloud from on-premise DB is ensuring data security and compliance.
Ensuring data security during transit and at rest
Maintaining compliance with regulations and industry standards
Minimizing downtime and ensuring data integrity
Optimizing performance and cost efficiency in the cloud environment
Q26. Which version of Talend are you using?
I am using Talend version 7.3.1 for ETL development.
Talend 7.3.1 offers various features and improvements for ETL processes.
Q27. Any complex SSIS package you made?
Yes, I created a complex SSIS package to automate data migration between multiple databases.
Used multiple data flow tasks to extract, transform, and load data
Implemented error handling and logging to track data flow
Utilized variables and expressions for dynamic behavior
Included conditional logic for different data processing scenarios
Q28. Explain the difference between ETL and ELT?
ETL is Extract, Transform, Load where data is extracted, transformed, and loaded into a data warehouse. ELT is Extract, Load, Transform where data is extracted, loaded into a data warehouse, and then transformed.
ETL involves extracting data from source systems, transforming it according to business rules, and loading it into a data warehouse.
ELT involves extracting data from source systems, loading it into a data warehouse, and then transforming it as needed.
ETL is suitable when data must be shaped before loading, while ELT suits cloud platforms that can transform data at scale after loading.
Q29. How do you create a parallel job?
To create a parallel job, use parallel processing techniques to divide tasks into smaller subtasks that can be executed simultaneously.
Identify tasks that can be executed independently and in parallel
Use parallel processing techniques such as multi-threading or distributed computing
Implement parallel job using ETL tools like Informatica or Talend
Monitor and optimize parallel job performance to ensure efficient execution
Q30. What are the uses of the Sort stage?
Sort stage is used in ETL processes to sort data based on specified criteria before loading it into the target system.
Sort stage helps in arranging data in a specific order for better analysis and reporting
It can be used to remove duplicates from data before loading
Sorting can be done based on multiple columns or expressions
Example: Sorting customer data based on their purchase amount before loading into a data warehouse
Q31. What are the 5 permissions in Linux?
Linux permissions include read, write, execute, setuid, and setgid.
Read permission allows a user to view the contents of a file or directory.
Write permission allows a user to modify the contents of a file or directory.
Execute permission allows a user to run a file or access the contents of a directory.
Setuid permission allows a user to execute a file with the permissions of the file's owner.
Setgid permission allows a user to execute a file with the permissions of the file's group.
Q32. What is the difference between ETL and ELT?
ETL stands for Extract, Transform, Load while ELT stands for Extract, Load, Transform.
ETL involves extracting data from source systems, transforming it, and then loading it into the target system.
ELT involves extracting data from source systems, loading it into the target system, and then transforming it as needed.
ETL is suitable for scenarios where data needs to be transformed before loading, while ELT is useful when raw data needs to be loaded first and then transformed.
Q33. What is initial load in ETL?
Initial load in ETL refers to the process of loading data from source systems into the data warehouse for the first time.
Initial load is typically a one-time process to populate the data warehouse with historical data.
It involves extracting data from source systems, transforming it as needed, and loading it into the data warehouse.
Initial load is often done using bulk loading techniques to efficiently transfer large volumes of data.
It is important to carefully plan and execute the initial load because of the large data volumes involved.
Q34. Explain the For Each Loop Container.
For Each Loop Container is used to iterate through a set of objects or files in SSIS packages.
Used to perform repetitive tasks on a set of objects or files
Can be used to loop through files in a directory or rows in a table
Requires an enumerator (such as a file, item, or ADO enumerator) and a variable to store the current object or file
Iterations execute sequentially; parallel processing requires splitting the work across separate tasks or packages
Q35. What is mapping?
Mapping is the process of defining the relationship between source and target data in ETL.
Mapping involves identifying the source and target data structures.
It also involves defining the transformation rules to be applied to the source data.
Mappings can be simple or complex depending on the complexity of the data.
Mapping is a critical step in ETL as it determines the accuracy and completeness of the data in the target system.
Q36. Lookups and where to use them
Lookups are used to retrieve related data from another table based on a common key.
Lookups are used in ETL processes to enrich data by adding additional information from a reference table.
They are commonly used to replace codes with corresponding descriptions, such as replacing product IDs with product names.
Lookups can be performed using SQL joins or lookup transformations in ETL tools like Informatica or Talend.
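A minimal SQL sketch of a lookup expressed as a join, with hypothetical sales and products tables:

```sql
-- Enrich sales rows with product names from a reference (lookup) table.
SELECT s.sale_id,
       s.product_id,
       p.product_name          -- looked-up description for the bare code
FROM sales AS s
LEFT JOIN products AS p
       ON p.product_id = s.product_id;
```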
Q37. Explain implementation of SCD 1 in IICS
SCD Type 1 in IICS involves overwriting existing data with new data without maintaining historical changes.
In IICS, use the Mapping Designer to create a mapping that loads data from source to target.
Use a Lookup transformation to check if the record already exists in the target table.
If the record exists, update the existing record with new data using an Update Strategy transformation.
If the record does not exist, insert the new record into the target table.
Ensure that the mapping simply overwrites matching records without history columns, which is the defining behavior of SCD Type 1 (see the sketch below).
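The SQL equivalent of this Type 1 logic, sketched against a hypothetical dim_customer target; in IICS the same behavior is configured in the mapping rather than hand-coded:

```sql
-- Type 1: overwrite matching rows in place, keeping no history.
MERGE INTO dim_customer AS tgt
USING stg_customer AS src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.email = src.email,
               tgt.phone = src.phone
WHEN NOT MATCHED THEN
    INSERT (customer_id, email, phone)
    VALUES (src.customer_id, src.email, src.phone);
```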
Q38. What are rollback and commit?
Rollback and commit are database transactions used to manage changes made to the database.
Rollback is used to undo changes made in a transaction and restore the database to its previous state.
Commit is used to save the changes made in a transaction permanently to the database.
Rollback is typically used when an error occurs during a transaction, while commit is used when the transaction is successful.
Example: If a transaction to transfer money from one account to another fails midway, a rollback restores both accounts to their original balances, as in the sketch below.
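A minimal T-SQL sketch of the transfer example, with a hypothetical accounts table:

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- debit
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- credit

    COMMIT TRANSACTION;      -- both updates succeeded: make them permanent
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;    -- either update failed: restore original balances
END CATCH;
```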
Q39. Write a query to find duplicates in a table.
Use GROUP BY and HAVING clause to find duplicates in a table.
Use GROUP BY to group rows with same values together
Use HAVING COUNT(*) > 1 to filter out duplicates
Example: SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;
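A follow-up sketch for removing the duplicates once found, using ROW_NUMBER in a CTE (SQL Server syntax; the table, key, and id columns are hypothetical):

```sql
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column_name
                              ORDER BY id) AS rn
    FROM table_name
)
DELETE FROM ranked
WHERE rn > 1;   -- keep exactly one row per duplicate group
```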
Q40. How can you optimize session
Optimizing session involves tuning session settings, utilizing efficient data loading techniques, and minimizing resource usage.
Tune session settings such as buffer size, commit interval, and block size for optimal performance
Utilize efficient data loading techniques like bulk loading, incremental loading, and parallel processing
Minimize resource usage by optimizing SQL queries, reducing unnecessary transformations, and utilizing caching mechanisms
Q41. What are transactions?
Transactions are a set of operations that are executed as a single unit of work, ensuring data integrity and consistency.
Transactions help maintain data integrity by ensuring all operations within the transaction are completed successfully or rolled back if any operation fails.
Transactions follow the ACID properties - Atomicity, Consistency, Isolation, and Durability.
Examples of transactions include transferring funds between bank accounts or booking a flight ticket, where multiple steps must all succeed together or not at all.
Q42. What is a session?
In Informatica, a session is a task that runs a mapping and defines how data moves from source to target.
A session is created in the Workflow Manager and is associated with exactly one mapping.
It holds connection details, source and target properties, and performance settings.
Sessions are added to workflows, which schedule and orchestrate their execution.
Session logs record run statistics and errors for troubleshooting.
Q43. What are loads in Informatica?
Loads in Informatica refer to the process of moving data from source to target in a data warehouse.
Loads involve extracting data from source systems
Transforming the data as needed
Loading the data into the target data warehouse or database
Loads can be full, incremental, or delta depending on the requirements
Example: Loading customer data from a CRM system into a data warehouse for analysis
Q44. What are subscriptions in SSRS?
Subscriptions in SSRS allow users to schedule and automate the delivery of reports via email or file share.
Subscriptions can be set up to deliver reports at specific times or in response to data-driven events.
Users can choose the delivery method (email, file share, etc.) and format (PDF, Excel, etc.) for the reports.
Subscriptions can be managed and monitored through the SSRS web portal.
Example: A user can set up a subscription to receive a sales report every Monday morning via email.
Q45. What is DTM error ?
DTM error stands for Data Transformation Manager error, which occurs during the ETL process when there is an issue with data transformation.
DTM error occurs during the ETL process when there is a problem with data transformation
It can be caused by incorrect mapping of data fields, data type mismatches, or data quality issues
DTM errors can lead to data loss, data corruption, or failed ETL processes
Examples of DTM errors include transformation errors, data truncation errors, and data type conversion errors.
Q46. What are the uses of procedures?
Procedures are a way to group and organize a set of SQL statements into a reusable unit.
Procedures can accept input parameters and return output parameters.
They can be called from other SQL statements or applications.
Procedures can help improve code reusability and maintainability.
Examples: stored procedures in SQL Server, PL/SQL procedures in Oracle.
Q47. Explain ETL process ?
ETL process involves extracting data from various sources, transforming it to fit business needs, and loading it into a target database.
Extract data from multiple sources such as databases, files, APIs, etc.
Transform the data by cleaning, filtering, aggregating, and converting it to the desired format.
Load the transformed data into a target database or data warehouse.
ETL tools like Informatica, Talend, and SSIS are commonly used for this process.
Q48. Performance tuning of SSIS packages
Performance tuning of SSIS packages involves optimizing data flow, memory usage, and parallelism.
Optimize data flow by using efficient data sources and destinations
Minimize memory usage by using appropriate buffer sizes and reducing unnecessary data transformations
Increase parallelism by using multiple threads and optimizing task execution order
Use performance monitoring tools like SSIS logging and performance counters to identify bottlenecks
Consider more advanced techniques where standard tuning does not resolve bottlenecks
Q49. Explain the use of lookup
Lookup is used in ETL processes to search for a specific value in a dataset and retrieve related information.
Lookup is used to search for a specific value in a dataset
It retrieves related information based on the search criteria
Lookup can be used to join data from different sources based on a common key
Q50. What are the types of lookup transformation?
Lookup transformations can be connected or unconnected, and cached or uncached.
A connected lookup is part of the data flow and can return multiple columns.
An unconnected lookup is called from an expression and returns a single value.
A cached lookup stores the lookup data in memory for faster access; the cache can be static or dynamic.