Senior Data Engineer
200+ Senior Data Engineer Interview Questions and Answers
Q1. Write a query to get the customer with the highest total order value for each year, month. [Note: Order table is different and Customer table is different. Order_ID and Customer_ID are the PK of the table with ...
Query to get the customer with the highest total order value for each year and month.
Join the Order and Customer tables on the foreign key
Group the results by year, month, and customer
Calculate the total order value for each group
Find the maximum total order value for each year, month
If there are multiple customers with the same highest total order value, select the one with the lower Customer_ID
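A minimal Spark SQL sketch of this approach, assuming hypothetical views orders(order_id, customer_id, order_date, order_value) and customers(customer_id, customer_name) have been registered:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("top-customer-per-month").getOrCreate()

top_customer_sql = """
WITH monthly_totals AS (
    SELECT YEAR(o.order_date)  AS order_year,
           MONTH(o.order_date) AS order_month,
           o.customer_id,
           SUM(o.order_value)  AS total_order_value
    FROM orders o
    GROUP BY YEAR(o.order_date), MONTH(o.order_date), o.customer_id
), ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY order_year, order_month
               ORDER BY total_order_value DESC, customer_id ASC  -- tie-break on lower customer_id
           ) AS rn
    FROM monthly_totals
)
SELECT r.order_year, r.order_month, r.customer_id, c.customer_name, r.total_order_value
FROM ranked r
JOIN customers c ON c.customer_id = r.customer_id
WHERE r.rn = 1
"""

spark.sql(top_customer_sql).show()

The ROW_NUMBER window implements both the per-month maximum and the lower-Customer_ID tie-break in one pass.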
Q2. There are 10 million records in the table and the schema does not contain the ModifiedDate column. One cell was modified the next day in the table. How will you fetch that particular information that needs to b...
To fetch the modified information from a table with 10 million records that has no ModifiedDate column:
Create a trigger to capture the modified information and insert it into a separate table with ModifiedDate column.
Use a tool like Change Data Capture (CDC) to track changes in the table and extract the modified information.
Use a query to compare the current table with a backup copy taken the previous day to identify the modified information.
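A hedged sketch of the backup-comparison approach, assuming the current table and yesterday's snapshot are both available as Spark tables and order_id is the primary key (all names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("detect-modified-rows").getOrCreate()

current  = spark.table("sales.customer_orders")          # today's state (assumed table name)
snapshot = spark.table("sales.customer_orders_backup")   # yesterday's copy (assumed table name)

# Rows whose content differs from the snapshot (new or modified rows).
changed = current.exceptAll(snapshot)

# Keep only rows whose key already existed yesterday, i.e. the modified ones.
modified = changed.join(snapshot.select("order_id"), on="order_id", how="inner")
modified.show()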
Q3. How do you handle data pipeline when the schema information keeps changing at the source?
Handle changing schema by using schema evolution techniques and version control.
Use schema evolution techniques like adding new fields, renaming fields, and changing data types.
Implement version control to track changes and ensure backward compatibility.
Use tools like Apache Avro or Apache Parquet to store data in a self-describing format.
Implement automated testing to ensure data quality and consistency.
Collaborate with data producers to establish clear communication and documentation of schema changes; a schema-tolerant read/write sketch follows below.
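A small sketch of schema-tolerant reads and writes, assuming Parquet source files and a Delta Lake target with the Delta package on the classpath (paths are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution").getOrCreate()

# Merge schemas across Parquet files that were written with different column sets.
df = spark.read.option("mergeSchema", "true").parquet("/data/raw/events/")

# When appending to a Delta table, allow new source columns to be added automatically.
(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .save("/data/curated/events/"))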
Q4. Difference between Parquet and ORC files. Why does the industry use Parquet over ORC? Can schema evolution happen in ORC?
Parquet and ORC are both columnar storage formats. Parquet is preferred mainly for its cross-platform, multi-language ecosystem; ORC does support schema evolution.
Parquet is a columnar format with first-class support across Spark, Hive, Impala, Presto/Trino, Arrow, and most programming languages, which is why it dominates in mixed-technology stacks.
ORC is another columnar format that supports schema evolution, such as adding columns and limited type changes.
Both formats offer strong compression and predicate pushdown; ORC is often on par with or slightly better than Parquet on compression for Hive-centric workloads, so the industry preference for Parquet is driven more by ecosystem breadth than by compression alone.
ORC is primarily used within the Hive/Hadoop ecosystem, where its ACID support and built-in indexes (stripe-level statistics, bloom filters) are well integrated.
Q5. What is Normalisation and Denormalisation? When do we use them? Give a real-time example that is implemented in your project.
Normalisation is the process of organizing data in a database to reduce redundancy and improve data integrity. Denormalisation is the opposite process.
Normalisation is used to eliminate data redundancy and improve data integrity.
Denormalisation is used to improve query performance by reducing the number of joins required.
A real-time example of normalisation is breaking down a customer's information into separate tables such as customer details, order details, and payment details.
Q6. What are the different types of schema you know in Data Warehousing?
There are three types of schema in Data Warehousing: Star Schema, Snowflake Schema, and Fact Constellation Schema.
Star Schema: central fact table connected to dimension tables in a star shape
Snowflake Schema: extension of star schema with normalized dimension tables
Fact Constellation Schema: multiple fact tables connected to dimension tables in a complex structure
Q7. How many stages will be created if a spark job has 3 wide transformations and 2 narrow transformations?
There will be 4 stages created in total for the spark job.
Wide transformations trigger a shuffle and create a new stage.
Narrow transformations do not trigger a shuffle and do not create a new stage.
In this case, the 3 wide transformations introduce 3 shuffle boundaries, adding 3 stages beyond the initial stage, while the 2 narrow transformations are pipelined into existing stages.
Therefore, a total of 4 stages will be created.
Q8. What is the best approach to finding whether the data frame is empty or not?
For a pandas DataFrame, use df.empty (or check len(df) == 0), since the data is already in memory.
For a Spark DataFrame, len() is not defined; use df.isEmpty() (Spark 3.3+) or check df.head(1).
Avoid df.count() == 0 on large data, because it scans every partition just to answer a yes/no question.
Example: if len(df.head(1)) == 0: print('Data frame is empty') (a fuller sketch follows below)
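A short sketch covering both cases; the DataFrames are built inline so the snippet is self-contained:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("empty-check").getOrCreate()

# pandas: .empty (or len(df) == 0) is cheap because the data is already in memory.
pdf = pd.DataFrame({"id": []})
print(pdf.empty)                   # True

# Spark: avoid count() on large data; head(1)/isEmpty() stop after the first row.
sdf = spark.createDataFrame([], "id INT")
print(len(sdf.head(1)) == 0)       # True
print(sdf.isEmpty())               # True (available on Spark 3.3+)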
Q9. How do you calculate the resources based on the number of cores and memory given, e.g. 16 cores with 64 GB? What are overhead, driver memory, etc.?
Resource sizing starts from the cores and memory available per node, minus what the OS and cluster daemons need.
Reserve roughly 1 core and 1 GB of memory per node for the operating system and Hadoop/YARN daemons.
Use about 5 cores per executor for good HDFS throughput; on a 16-core node that leaves 15 usable cores, i.e. 3 executors per node.
Divide the remaining memory among the executors, then subtract the memory overhead, typically max(384 MB, 10% of executor memory).
Size the driver memory like a single executor unless it has to collect large results.
Example: for 16 cores with 64 GB memory, see the worked calculation below.
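A worked calculation for a 16-core, 64 GB node, following the common sizing heuristic (the numbers are illustrative, not a universal rule):

node_cores, node_mem_gb = 16, 64

os_cores, os_mem_gb = 1, 1                       # reserved for OS / daemons
usable_cores = node_cores - os_cores             # 15
usable_mem_gb = node_mem_gb - os_mem_gb          # 63

cores_per_executor = 5                           # good HDFS throughput
executors_per_node = usable_cores // cores_per_executor      # 3

mem_per_executor = usable_mem_gb / executors_per_node        # 21 GB
overhead_gb = max(0.384, 0.10 * mem_per_executor)            # ~2.1 GB memory overhead
executor_memory_gb = mem_per_executor - overhead_gb          # ~19 GB for spark.executor.memory

driver_memory_gb = executor_memory_gb            # driver sized like one executor; in cluster mode
                                                 # one executor slot is often given up for the driver
print(executors_per_node, round(executor_memory_gb, 1), round(overhead_gb, 1))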
Q10. Big data Hadoop architecture and HDFS commands to copy and list files in hdfs spark architecture and Transformation and Action question what happen when we submit spark program spark dataframe coding question s...
Questions on big data, Hadoop, Spark, Scala, Git, the project, and Agile.
Hadoop architecture and HDFS commands for copying and listing files in HDFS
Spark architecture and Transformation and Action question
What happens when we submit a Spark program
Spark DataFrame coding question
Scala basic program on List
Git and GitHub
Project-related questions
Agile-related questions
Q11. What is the difference between tasks and stages? What about the Spark UI?
Tasks and stages are components of the execution plan in Spark UI.
Tasks are the smallest unit of work in Spark, representing a single operation on a partition of data.
Stages are groups of tasks that are executed together as part of a larger computation.
Tasks within a stage can be executed in parallel, while stages are executed sequentially.
Tasks are created based on the transformations and actions in the Spark application.
Stages are created based on the dependencies between RDDs; a new stage begins at every shuffle (wide) dependency.
Q12. how would you pass connections string if lambda is connecting to database
Pass connection string as environment variable or use AWS Secrets Manager
Store connection string as environment variable in Lambda function configuration
Retrieve connection string from AWS Secrets Manager and use it in Lambda function
Use IAM role to grant Lambda function access to database
Encrypt connection string using AWS KMS for added security
Q13. how to migrate 1000s of tables using spark(databricks) notebooks
Use Spark (Databricks) notebooks to migrate 1000s of tables efficiently.
Utilize Spark's parallel processing capabilities to handle large volumes of data
Leverage Databricks notebooks for interactive data exploration and transformation
Automate the migration process using scripts or workflows
Optimize performance by tuning Spark configurations and cluster settings
Q14. What is the difference between repartition and coalesce?
Repartition increases or decreases the number of partitions in a DataFrame, while Coalesce only decreases the number of partitions.
Repartition can increase or decrease the number of partitions in a DataFrame, leading to a shuffle of data across the cluster.
Coalesce only decreases the number of partitions in a DataFrame without performing a full shuffle, making it more efficient than repartition.
Repartition is typically used when there is a need to increase the number of partitions or to rebalance skewed data; coalesce is typically used to reduce the number of partitions, for example before writing output (see the sketch below).
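A small PySpark sketch showing the difference (the data and partition counts are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-vs-coalesce").getOrCreate()

df = spark.range(1_000_000)
print(df.rdd.getNumPartitions())     # default parallelism

wide = df.repartition(200)           # full shuffle; can increase or decrease partitions
narrow = df.coalesce(4)              # no full shuffle; can only decrease partitions

print(wide.rdd.getNumPartitions())   # 200
print(narrow.rdd.getNumPartitions()) # 4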
Q15. Python coding question (without built-in methods): 1. Check if a list is sorted. 2. Sort the list and optimise the solution.
Check if a list is sorted and sort the list without using Python methods.
To check if a list is sorted, iterate through the list and compare each element with the next one. If any element is greater than the next one, the list is not sorted.
To sort the list without using Python methods, implement a sorting algorithm like bubble sort, selection sort, or insertion sort.
Example for checking if a list is sorted: ['a', 'b', 'c'] is sorted, ['c', 'b', 'a'] is not sorted.
Example for sorting: see the insertion-sort sketch below.
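A plain-Python sketch of both parts, avoiding sorted() and list.sort():

def is_sorted(items):
    # Compare each element with its successor; one pass, O(n).
    for i in range(len(items) - 1):
        if items[i] > items[i + 1]:
            return False
    return True

def insertion_sort(items):
    # In-place insertion sort: O(n^2) worst case, but close to O(n) when nearly sorted.
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items

print(is_sorted([1, 2, 3]), is_sorted([3, 1, 2]))   # True False
print(insertion_sort([5, 2, 4, 1]))                 # [1, 2, 4, 5]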
Q16. What are optimisation techniques used in the project?
Optimisation techniques used in the project include indexing, query optimization, caching, and parallel processing.
Indexing: Creating indexes on frequently queried columns to improve search performance.
Query optimization: Rewriting queries to make them more efficient and reduce execution time.
Caching: Storing frequently accessed data in memory to reduce the need for repeated database queries.
Parallel processing: Distributing tasks across multiple processors to speed up data processing.
Q17. what are the optimization techniques used in your project
Optimization techniques used in project
Caching
Parallel processing
Compression
Indexing
Query optimization
Q18. Which AWS services were used, and what was the AWS architecture for those services?
AWS services used include S3, Redshift, Glue, EMR, and Lambda in a scalable and cost-effective architecture.
AWS S3 for storing large amounts of data
AWS Redshift for data warehousing and analytics
AWS Glue for ETL processes
AWS EMR for big data processing
AWS Lambda for serverless computing
Q19. How does query acceleration speed up query processing?
Query acceleration speeds up query processing by optimizing query execution and reducing the time taken to retrieve data.
Query acceleration uses techniques like indexing, partitioning, and caching to optimize query execution.
It reduces the time taken to retrieve data by minimizing disk I/O and utilizing in-memory processing.
Examples include using columnar storage formats like Parquet or optimizing join operations.
Q20. Difference between Broadcast variable and accumulator variable
Broadcast variables are read-only variables cached on each executor, while accumulators are variables that tasks can only add to and only the driver can read.
Broadcast variables give every node a copy of a large lookup table or other reference dataset without shipping it with every task.
Accumulator variables keep a running total (counters, sums) across tasks; updates made on the executors are merged on the driver.
Broadcast variables serve read-only lookups, while accumulators serve write-only aggregation from the tasks' perspective (see the sketch below).
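A short PySpark sketch of both; the lookup data and validation logic are illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-vs-accumulator").getOrCreate()
sc = spark.sparkContext

country_lookup = sc.broadcast({"IN": "India", "US": "United States"})  # read-only copy per executor
bad_records = sc.accumulator(0)                                        # tasks only add; driver reads

def resolve(code):
    # Look up the country name; count unknown codes via the accumulator.
    if code not in country_lookup.value:
        bad_records.add(1)
        return "UNKNOWN"
    return country_lookup.value[code]

names = sc.parallelize(["IN", "US", "XX"]).map(resolve).collect()
print(names, bad_records.value)   # ['India', 'United States', 'UNKNOWN'] 1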
Q21. 1. Different Types of integration runtime in adf 2. How to copy 100 files from one adls path to another in adf 3. Diff between DAG and Lineage , narrow and wide transformation in Spark 4. DBUtils questions. 5. ...
The interview questions cover topics related to Azure Data Factory, Spark, and Python programming.
Integration runtimes in ADF include Azure, Self-hosted, and SSIS IRs.
To copy 100 files in ADF, use a Copy Data activity with a wildcard path in source and sink datasets.
DAG in Spark represents a directed acyclic graph of computation, while lineage tracks the data flow.
Narrow transformations in Spark operate on a single partition, while wide transformations shuffle data across partitions.
Q22. Two SQL coding questions and two Python coding questions, such as reversing a string.
Reverse a string using SQL and Python codes.
In SQL, use the REVERSE function to reverse a string.
In Python, use slicing with a step of -1 to reverse a string.
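A quick sketch of both variants; the SQL is run through spark.sql so the example stays in one language:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reverse-string").getOrCreate()

# Python: slicing with a step of -1.
print("data engineer"[::-1])

# SQL: most engines (including Spark SQL) expose a REVERSE function.
spark.sql("SELECT REVERSE('data engineer') AS reversed").show()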
Q23. What is the SQL query to group by employee ID in order to combine the first name and last name with a space?
SQL query to group by employee ID and combine first name and last name with a space
Use the GROUP BY clause to group by employee ID
Use the CONCAT function to combine first name and last name with a space
Select employee ID, CONCAT(first_name, ' ', last_name) AS full_name
Q24. How do you decide on cores and worker nodes?
Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.
Consider the size and complexity of the data being processed
Evaluate the processing speed and memory requirements of the tasks
Take into account the parallelism and concurrency needed for efficient data processing
Monitor the system performance and adjust cores and worker nodes as needed
Q25. SQL query to find the 2nd most order item in a category
Use a SQL query with a subquery to find the 2nd most ordered item in a category.
Use a subquery to rank items within each category based on the number of orders
Select the item with rank 2 within each category
Order the results by category and rank to get the 2nd most ordered item in each category
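A hedged Spark SQL sketch, assuming a hypothetical order_items(category, item) view where each row is one ordered item:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("second-most-ordered").getOrCreate()

second_most_sql = """
WITH item_orders AS (
    SELECT category, item, COUNT(*) AS order_count
    FROM order_items
    GROUP BY category, item
), ranked AS (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY category ORDER BY order_count DESC) AS rnk
    FROM item_orders
)
SELECT category, item, order_count
FROM ranked
WHERE rnk = 2
ORDER BY category
"""
spark.sql(second_most_sql).show()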
Q26. Find top 5 countries with highest population in Spark and SQL
Use Spark and SQL to find the top 5 countries with the highest population.
Use Spark to load the data and perform data processing.
Use SQL queries to group by country and sum the population.
Order the results in descending order and limit to top 5.
Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5
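The same result with the DataFrame API, assuming an existing countries table with country and population columns (names are illustrative):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("top-countries").getOrCreate()

countries = spark.table("countries")   # assumed table name

(countries
    .groupBy("country")
    .agg(F.sum("population").alias("total_population"))
    .orderBy(F.col("total_population").desc())
    .limit(5)
    .show())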
Q27. What is the Catalyst optimiser? How does it work?
A catalyst optimizer is a query optimization tool used in Apache Spark to improve performance by generating an optimal query plan.
Catalyst is Spark SQL's extensible query optimization framework, combining rule-based and cost-based optimization.
It leverages rules to transform the logical query plan into a more optimized physical plan.
The optimizer applies various optimization techniques like predicate pushdown, constant folding, and join reordering.
By optimizing the query plan, it reduces the overall execution time without requiring the developer to hand-tune each query.
Q28. What is shuffling? How to Handle Shuffling?
Shuffling is the process of redistributing data across partitions in a distributed computing environment.
Shuffling is necessary when data needs to be grouped or aggregated across different partitions.
It can be handled efficiently by minimizing the amount of data being shuffled and optimizing the partitioning strategy.
In Spark, shuffling can be reduced with broadcast joins, pre-partitioning or bucketing on the join/grouping key, and map-side aggregation (e.g. reduceByKey instead of groupByKey); in MapReduce, combiners play a similar role.
Q29. What are the different joins in SQL? Please give an example to elaborate.
Different types of joins in SQL include inner join, left join, right join, and full outer join.
Inner join: Returns rows when there is a match in both tables.
Left join: Returns all rows from the left table and the matched rows from the right table.
Right join: Returns all rows from the right table and the matched rows from the left table.
Full outer join: Returns all rows from both tables, with NULLs filled in where there is no match.
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id;
Q30. Write a spark program to find the word with maximum length in a given string
Use Spark program to find word with maximum length in a given string
Split the string into words using space as delimiter
Map each word to its length
Find the word with maximum length using reduce operation
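A compact PySpark sketch; the input string is illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("longest-word").getOrCreate()
sc = spark.sparkContext

text = "Spark makes distributed computation straightforward"

# Split on spaces, then keep the longer word at each reduce step.
longest = (sc.parallelize(text.split(" "))
             .reduce(lambda a, b: a if len(a) >= len(b) else b))
print(longest)   # straightforward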
Q31. Write an SQL query to change rows into columns and vice versa
Use the SQL PIVOT and UNPIVOT operators (or conditional aggregation with CASE WHEN where they are not supported) to change rows into columns and vice versa.
Use the PIVOT function in SQL to transform rows into columns
Use the UNPIVOT function in SQL to transform columns into rows
Example: SELECT * FROM table_name PIVOT (SUM(value) FOR column_name IN (value1, value2, value3))
Example: SELECT * FROM table_name UNPIVOT (value FOR column_name IN (value1, value2, value3))
Q32. Write an SQL query to fetch duplicate rows in a table
SQL query to fetch duplicate rows in a table
Use GROUP BY and HAVING clause to identify duplicate rows
Select columns to check for duplicates
Example: SELECT column1, column2, COUNT(*) FROM table_name GROUP BY column1, column2 HAVING COUNT(*) > 1;
Q33. What do you know about Spark architecture?
Spark architecture is based on a master-slave architecture with a cluster manager to coordinate tasks.
Spark architecture consists of a driver program that communicates with a cluster manager to coordinate tasks.
The cluster manager allocates resources and schedules tasks on worker nodes.
Worker nodes execute the tasks and return results to the driver program.
Spark supports various cluster managers like YARN, Mesos, and standalone mode.
Spark applications can run in standalone mode or on a cluster manager such as YARN, Mesos, or Kubernetes.
Q34. Find the duplicate items in a list
To find duplicate items in a list
Iterate through the list and compare each item with the rest of the list
Use a hash table to keep track of seen items
Sort the list and compare adjacent items
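A minimal sketch of the hash-table approach in plain Python:

def find_duplicates(items):
    seen, duplicates = set(), set()
    for item in items:            # single pass: O(n) time, O(n) extra space
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return list(duplicates)

print(find_duplicates([1, 2, 3, 2, 4, 1]))   # [1, 2] (order may vary)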
Q35. How do you handle Incremental data?
Incremental data is handled by identifying new data since the last update and merging it with existing data.
Identify new data since last update
Merge new data with existing data
Update data warehouse or database with incremental changes
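A hedged sketch of the merge step using Delta Lake's MERGE, with assumed table and column names; the incremental rows are expected to have been filtered upstream (e.g. on a watermark column):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

spark.sql("""
    MERGE INTO warehouse.customers AS target
    USING staging.customers_increment AS source
    ON target.customer_id = source.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")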
Q36. Using two tables find the different records for different joins
To find different records for different joins using two tables
Use the SQL query to perform different joins like INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN
Identify the key columns in both tables to join on
Select the columns from both tables and use a WHERE key IS NULL filter on a left or right join (an anti-join) to isolate the records that exist in one table but not the other.
Q37. What happens when we enforce schema ?
Enforcing schema ensures that data conforms to a predefined structure and rules.
Ensures data integrity by validating incoming data against predefined schema
Helps in maintaining consistency and accuracy of data
Prevents data corruption and errors in data processing
Can lead to rejection of data that does not adhere to the schema
Q38. What are the common file formats used in data storages? Which one is best for compression?
Common file formats used in data storage include CSV, JSON, Parquet, Avro, and ORC. Columnar formats such as Parquet and ORC compress best.
CSV (Comma-Separated Values) - simple and widely used, but not efficient for large datasets
JSON (JavaScript Object Notation) - human-readable and easy to parse, but can be inefficient for storage
Parquet - columnar storage format that is highly efficient for compression and query performance
Avro - efficient row-based binary format with schema support, good for data serialization and streaming pipelines
ORC - columnar format with strong compression and built-in indexes, common in Hive environments
Q39. End to End project Architecture and data pipeline working
End to end project architecture involves designing and implementing a data pipeline to process and analyze data from various sources.
Define project requirements and goals
Design data architecture including data sources, storage, processing, and analytics tools
Implement data pipeline to extract, transform, and load data
Ensure data quality and consistency throughout the pipeline
Monitor and optimize performance of the data pipeline
Examples: Using Apache Kafka for real-time data streaming into the pipeline.
Q40. SQL: What are the conditions used in SQL? When we have a table but we want to create ...
SQL conditions are used to filter data based on specified criteria. Common conditions include WHERE, AND, OR, IN, BETWEEN, etc.
Common SQL conditions include WHERE, AND, OR, IN, BETWEEN, LIKE, etc.
Conditions are used to filter data based on specified criteria in SQL queries.
Examples: WHERE salary > 50000 AND department = 'IT', WHERE age < 30
Q41. Tell me about yourself; Projects done; What is Columnar format file in Spark; Internals of Spark, Difference between OLAP and OLTP; About Datawarehouse- facts, dimensions
I am a Senior Data Engineer with experience in various projects involving columnar format files in Spark, understanding Spark internals, OLAP vs OLTP, and data warehousing concepts.
Projects: Developed ETL pipelines using Spark for processing large datasets, implemented data quality checks, and optimized query performance.
Columnar format file in Spark: It stores data in columnar format to improve query performance by reading only the required columns, like Parquet or ORC files...
Q42. What are some of the analytical functions available in SQL?
Analytical (window) functions in SQL perform calculations across a set of rows related to the current row, using an OVER clause.
Aggregate functions such as SUM, AVG, COUNT, MIN, MAX can be used as window functions with OVER (PARTITION BY ...).
Window functions like ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
Distribution and ranking functions like NTILE, PERCENT_RANK, CUME_DIST
Statistical functions like STDDEV, VARIANCE
Example: SUM(sales) OVER (PARTITION BY region ORDER BY order_month) produces a running total per region.
Q43. How to write a file in a delta table?
To write a file in a delta table, you can use the Delta Lake API or Spark SQL commands.
Use Delta Lake API to write data to a delta table
Use Spark SQL commands like INSERT INTO to write data to a delta table
Ensure that the data being written is in the correct format and schema
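A brief sketch of both routes, assuming the Delta Lake library is configured on the cluster (paths and table names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-write").getOrCreate()

df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

# DataFrame writer: append the data to a Delta table stored at a path.
df.write.format("delta").mode("append").save("/data/delta/labels")

# Spark SQL: insert into an already-registered Delta table.
df.createOrReplaceTempView("new_labels")
spark.sql("INSERT INTO catalog_db.labels SELECT * FROM new_labels")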
Q44. What is difference between dynamic data frame and spark data frame in aws glue job? can we change dynamic dataframe to spark datafrmae?
A DynamicFrame is AWS Glue's schema-flexible abstraction over a Spark DataFrame, while a Spark DataFrame is created directly through the Spark APIs and requires a fixed schema.
Each DynamicFrame record is self-describing, so it tolerates inconsistent or evolving schemas without the full schema being known up front.
DynamicFrames integrate with Glue features such as crawlers and built-in transforms, but for complex transformations the Spark DataFrame API is often more convenient and can perform better.
Yes, a DynamicFrame can be converted to a Spark DataFrame with its toDF() method, and converted back with DynamicFrame.fromDF(); a sketch follows below.
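A small Glue-job sketch of the conversion, assuming a Glue job context and an existing catalog table (database, table, and column names are illustrative):

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

sc = SparkContext()
glue_context = GlueContext(sc)

# Read a DynamicFrame from the Glue Data Catalog (assumed database/table names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders")

# DynamicFrame -> Spark DataFrame for richer transformations.
df = dyf.toDF().filter("order_value > 100")

# Spark DataFrame -> DynamicFrame to keep using Glue sinks and transforms.
dyf_back = DynamicFrame.fromDF(df, glue_context, "filtered_orders")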
Q45. Do you have experience in Dataflow, Dataproc, cloud composer?
Yes, I have experience in Dataflow, Dataproc, and cloud composer.
I have worked with Dataflow to process and analyze large datasets in real-time.
I have used Dataproc to create and manage Apache Spark and Hadoop clusters for big data processing.
I have experience with cloud composer for orchestrating workflows and managing data pipelines.
Q46. How can you optimize your queries for efficiency in BQ?
Optimizing queries in BigQuery involves using partitioned tables, clustering, and optimizing joins.
Partition tables by date or another relevant column to reduce the amount of data scanned
Use clustering to group related rows together, reducing the amount of data scanned for queries
Avoid unnecessary joins and denormalize data where possible to reduce query complexity
Q47. How to implemented Primary key and foreign key in delta tables.
In Delta tables (Databricks/Unity Catalog), primary key and foreign key constraints are declared with ALTER TABLE ... ADD CONSTRAINT, but they are informational only and are not enforced by the engine.
A PRIMARY KEY declaration documents the intended unique identifier of each record; uniqueness itself has to be guaranteed by the pipeline, for example with MERGE or deduplication logic.
A FOREIGN KEY declaration documents the link between two tables based on a common column, which helps optimisers and downstream tools understand the data model.
The referenced table must have a declared primary key, and keeping the data consistent with these declared constraints remains the pipeline's responsibility.
Q48. After performing joins how many records would be retrieved for inner, left, right and outer joins
The number of records retrieved after performing joins depends on the type of join - inner, left, right, or outer.
Inner join retrieves only the matching records from both tables
Left join retrieves all records from the left table and matching records from the right table
Right join retrieves all records from the right table and matching records from the left table
Outer join retrieves all records from both tables, filling in NULL values for non-matching records
Q49. Write a SQL query to select data from table 2 where data exists in table 1
Use a SQL query to select data from table 2 where data exists in table 1
Use a JOIN statement to link the two tables based on a common column
Specify the columns you want to select from table 2
Use a WHERE clause to check for existence of data in table 1
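A hedged Spark SQL sketch with assumed table names table1/table2 and a shared id column:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("exists-filter").getOrCreate()

spark.sql("""
    SELECT t2.*
    FROM table2 t2
    WHERE EXISTS (
        SELECT 1
        FROM table1 t1
        WHERE t1.id = t2.id
    )
""").show()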
Q50. How to handle missing data in pyspark dataframe.
Handle missing data in a PySpark DataFrame with functions such as dropna, fillna, or imputation logic.
Use dropna() to remove rows that contain nulls, optionally restricted to specific columns via the subset argument.
Use fillna() to fill nulls with a default value, per column if needed.
Use coalesce()/when-otherwise expressions (or pyspark.ml.feature.Imputer) to replace nulls with derived values such as a mean or median; see the sketch below.
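A short PySpark sketch of the main options; the column names and sample data are illustrative:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("missing-data").getOrCreate()

df = spark.createDataFrame(
    [(1, "IN", 10.0), (2, None, None), (3, "US", 5.0)],
    ["id", "country", "amount"])

dropped = df.dropna(subset=["country"])                   # drop rows with a null country
filled = df.fillna({"country": "UNKNOWN", "amount": 0})   # per-column default values

# Impute with a derived value (here the column mean) instead of a constant.
mean_amount = df.agg(F.avg("amount")).first()[0]
imputed = df.withColumn("amount", F.coalesce(F.col("amount"), F.lit(mean_amount)))

imputed.show()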