Senior Data Engineer

300+ Senior Data Engineer Interview Questions and Answers

Updated 5 Jul 2025

Asked in 7 Eleven


Q. Write a query to get the customer with the highest total order value for each year and month. Order and Customer tables are separate, with Order_ID and Customer_ID as primary keys. The Customer table's Oid is a...

Ans.

Query to get the customer with the highest total order value for each year and month:

  • Join the Order and Customer tables on the foreign key

  • Group the results by year, month, and customer

  • Calculate the total order value for each group

  • Find the maximum total order value for each year, month

  • If there are multiple customers with the same highest total order value, select the one with the lower Customer_ID
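
The steps above can be sketched as a runnable query. This uses sqlite3 with hypothetical Orders/Customers tables; the table and column names are assumptions, since the full schema is truncated in the question.

```python
import sqlite3

# In-memory toy schema; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER,
                     Order_Date TEXT, Order_Value REAL);
INSERT INTO Customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Orders VALUES
  (10, 1, '2024-01-05', 100.0),
  (11, 2, '2024-01-20', 250.0),
  (12, 1, '2024-02-11', 300.0),
  (13, 2, '2024-02-15', 120.0);
""")

# Rank customers by monthly total and keep the top one per (year, month).
# Ties are broken by the lower Customer_ID via the ORDER BY inside ROW_NUMBER().
query = """
WITH monthly AS (
  SELECT strftime('%Y', Order_Date) AS yr,
         strftime('%m', Order_Date) AS mon,
         c.Customer_ID,
         SUM(o.Order_Value) AS total_value
  FROM Orders o
  JOIN Customers c ON c.Customer_ID = o.Customer_ID
  GROUP BY yr, mon, c.Customer_ID
),
ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY yr, mon
           ORDER BY total_value DESC, Customer_ID ASC
         ) AS rn
  FROM monthly
)
SELECT yr, mon, Customer_ID, total_value
FROM ranked WHERE rn = 1
ORDER BY yr, mon;
"""
top_per_month = conn.execute(query).fetchall()
print(top_per_month)  # one row per (year, month)
```

The window-function approach avoids a correlated subquery and makes the tie-breaking rule explicit in one ORDER BY.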

Asked in 7 Eleven


Q. There are 10 million records in the table and the schema does not contain the ModifiedDate column. One cell was modified the next day in the table. How will you fetch that particular information that needs to b...

Ans.

To fetch the modified row from a table with 10 million records when the schema has no ModifiedDate column:

  • Create a trigger to capture the modified information and insert it into a separate table with ModifiedDate column.

  • Use a tool like Change Data Capture (CDC) to track changes in the table and extract the modified information.

  • Use a query to compare the current table with a backup copy taken the previous day to identify the modified information.
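
The backup-comparison approach can be sketched with sqlite3; `current` and `snapshot` are hypothetical copies of the table taken a day apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snapshot (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE current  (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO snapshot VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi');
-- Same data, but one cell was changed the next day:
INSERT INTO current  VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Mumbai');
""")

-- dummy
```
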

Senior Data Engineer Interview Questions and Answers for Freshers


Asked in KFintech


Q. Given infinite coins of some currency of denominations : 1,2,5,10, so In how many unique distinct ways can we obtain a total amount of say: 25 ? Same thing how to do it for non-unique repeated combinations or p...

Ans.

There are multiple ways to obtain a total amount of 25 using coins of denominations 1, 2, 5, and 10. The question asks for unique distinct ways and non-unique repeated combinations or permutations.

  • For unique distinct ways, you can use dynamic programming to calculate the number of ways to reach the target amount.

  • For non-unique repeated combinations or permutations, you can use a recursive function to generate all possible combinations.

  • Example for unique distinct ways: [1, 2, …
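
The dynamic-programming approach can be sketched in Python; the only difference between counting combinations and counting permutations is the loop order.

```python
def count_combinations(amount, coins):
    """Ways to reach `amount` where order does not matter (combinations)."""
    dp = [1] + [0] * amount
    for coin in coins:                  # outer loop over coins: each multiset counted once
        for a in range(coin, amount + 1):
            dp[a] += dp[a - coin]
    return dp[amount]

def count_permutations(amount, coins):
    """Ways to reach `amount` where order matters (permutations)."""
    dp = [1] + [0] * amount
    for a in range(1, amount + 1):      # outer loop over amounts: ordered sequences
        dp[a] = sum(dp[a - c] for c in coins if c <= a)
    return dp[amount]

print(count_combinations(25, [1, 2, 5, 10]))  # 64
print(count_permutations(25, [1, 2, 5, 10]))
```

Both run in O(amount × coins) time; swapping the loops is what switches the semantics from combinations to permutations.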

Asked in 7 Eleven


Q. How do you handle data pipelines when the schema information keeps changing at the source?

Ans.

Handle changing schema by using schema evolution techniques and version control.

  • Use schema evolution techniques like adding new fields, renaming fields, and changing data types.

  • Implement version control to track changes and ensure backward compatibility.

  • Use tools like Apache Avro or Apache Parquet to store data in a self-describing format.

  • Implement automated testing to ensure data quality and consistency.

  • Collaborate with data producers to establish clear communication and documentation of schema changes.
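
One lightweight form of schema evolution at ingestion time is to project each incoming record onto the current target schema with defaults, so added or missing source fields do not break the pipeline. The field names below are hypothetical.

```python
# Target schema the pipeline writes; newly added fields get a default
# so older records remain valid (backward compatibility).
TARGET_SCHEMA = {"id": None, "name": None, "email": None, "country": "unknown"}

def normalize(record: dict) -> dict:
    """Project a raw record onto TARGET_SCHEMA, tolerating added/missing fields."""
    return {field: record.get(field, default)
            for field, default in TARGET_SCHEMA.items()}

# Old record (predates the 'country' field) and new record (extra 'phone' field):
old = {"id": 1, "name": "Asha", "email": "a@x.com"}
new = {"id": 2, "name": "Ravi", "email": "r@x.com", "country": "IN", "phone": "123"}

print(normalize(old))  # 'country' filled with the default 'unknown'
print(normalize(new))  # unknown 'phone' field dropped, known fields kept
```

Formats like Avro and Parquet implement the same idea (defaults for added fields, ignoring removed ones) at the file-format level.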


Asked in 7 Eleven


Q. Difference between Parquet and ORC file. Why industry uses parquet over ORC? Can schema evolution happen in ORC?

Ans.

Parquet and ORC are both columnar storage formats. Industry favours Parquet mainly for its broad cross-platform ecosystem support, and yes, ORC does support schema evolution.

  • Parquet is a columnar format with first-class support across Spark, Impala, Presto/Trino, and most languages and cloud services.

  • ORC is a columnar format originally built for Hive; it supports schema evolution such as adding columns.

  • Industry tends to prefer Parquet because of its wider tooling and language support; compression is strong in both formats and depends on the data and codec chosen.

  • ORC is preferred mainly in Hive-centric stacks.

Asked in Sigmoid


Q. Given a non-decreasing array, how can I determine the indices of an element X within it? If the element is not present, the output should be [-1, -1]. For example, for the array [1,2,3,3,5,5,7,8] and X=5, the e...

Ans.

Find indices of an element in a non-decreasing array

  • Since the array is non-decreasing, binary search for the first and last occurrence of X runs in O(log n) instead of a linear scan

  • Return [first_index, last_index], or [-1, -1] if X is not present

  • Handle edge cases such as an empty array or X outside the range of values
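
A minimal Python version using the standard bisect module, which gives the first and last index in O(log n):

```python
from bisect import bisect_left, bisect_right

def search_range(nums, x):
    """Return [first, last] indices of x in a sorted list, or [-1, -1]."""
    lo = bisect_left(nums, x)           # leftmost insertion point for x
    if lo == len(nums) or nums[lo] != x:
        return [-1, -1]                 # x is absent (covers the empty-list case too)
    return [lo, bisect_right(nums, x) - 1]

print(search_range([1, 2, 3, 3, 5, 5, 7, 8], 5))  # [4, 5]
print(search_range([1, 2, 3, 3, 5, 5, 7, 8], 4))  # [-1, -1]
```
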

Asked in 7 Eleven


Q. What is Normalisation and Denormalisation? When do we use them? Give a real-time example that is implemented in your project.

Ans.

Normalisation is the process of organizing data in a database to reduce redundancy and improve data integrity. Denormalisation is the opposite process.

  • Normalisation is used to eliminate data redundancy and improve data integrity.

  • Denormalisation is used to improve query performance by reducing the number of joins required.

  • A real-time example of normalisation is breaking down a customer's information into separate tables such as customer details, order details, and payment details.

Asked in 7 Eleven


Q. What are the different types of schema you know in Data Warehousing?

Ans.

There are three types of schema in Data Warehousing: Star Schema, Snowflake Schema, and Fact Constellation Schema.

  • Star Schema: central fact table connected to dimension tables in a star shape

  • Snowflake Schema: extension of star schema with normalized dimension tables

  • Fact Constellation Schema: multiple fact tables connected to dimension tables in a complex structure

Q. How many stages will be created if a Spark job has 3 wide transformations and 2 narrow transformations?

Ans.

There will be 4 stages created in total for the spark job.

  • Wide transformations trigger a shuffle and create a new stage.

  • Narrow transformations do not trigger a shuffle and do not create a new stage.

  • In this case, 3 wide transformations will create 3 new stages and 2 narrow transformations will not create new stages.

  • Therefore, a total of 4 stages will be created.

Asked in EPAM Systems


Q. How would you migrate thousands of tables using Spark (Databricks) notebooks?

Ans.

Use Spark (Databricks) notebooks to migrate 1000s of tables efficiently.

  • Utilize Spark's parallel processing capabilities to handle large volumes of data

  • Leverage Databricks notebooks for interactive data exploration and transformation

  • Automate the migration process using scripts or workflows

  • Optimize performance by tuning Spark configurations and cluster settings


Q. What is the best approach to determine if a data frame is empty?

Ans.

For a pandas DataFrame, use the empty attribute (or len()); a PySpark DataFrame has no len(), so use isEmpty() or head(1).

  • In pandas, df.empty is True when the DataFrame has no rows (or no columns); len(df) == 0 checks the row count only.

  • In PySpark, use df.isEmpty() (Spark 3.3+) or len(df.head(1)) == 0 to avoid scanning the whole dataset.

  • Avoid df.count() == 0 on large Spark DataFrames, since it triggers a full scan of all partitions.

  • Example: if df.empty: print('Data frame is empty')
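
A runnable sketch of the check for a pandas DataFrame; the PySpark variant is shown only in comments since it needs a running Spark session.

```python
import pandas as pd

df = pd.DataFrame({"a": []})           # no rows, one column
non_empty = pd.DataFrame({"a": [1]})

# Idiomatic pandas check: `empty` is True when there are no rows or no
# columns; len(df) counts rows only.
print(df.empty, len(df) == 0)          # True True
print(non_empty.empty)                 # False

# PySpark sketch (not executed here):
#   df.isEmpty()                # Spark 3.3+
#   len(df.head(1)) == 0        # older Spark versions
# Avoid df.count() == 0 on large data: it scans every partition.
```
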

Asked in Infosys


Q. How do you calculate resource allocation based on the number of cores and memory, for example, 16 cores with 64 GB? What about overhead and driver memory?

Ans.

Calculating executor resources from the cores and memory available, accounting for overhead and driver memory.

  • Reserve roughly 1 core and 1 GB of memory per node for the OS and cluster daemons (YARN/Hadoop services).

  • A common rule of thumb is about 5 cores per executor for good I/O throughput, so executors per node = usable cores / 5.

  • Memory per executor = usable memory / executors per node, then subtract off-heap overhead (the larger of 384 MB or ~10% of executor memory).

  • Size driver memory separately, typically comparable to a single executor.

  • Example: For 16 cores with 64 GB memory, assuming 1 GB overhead and 2 GB driver memory, this yields about 3 executors of 5 cores and roughly 19 GB of heap each.
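
The sizing arithmetic can be made concrete. The 1-core/1-GB OS reservation and the 5-cores-per-executor figure are common rules of thumb, not fixed rules.

```python
def size_executors(node_cores=16, node_mem_gb=64,
                   os_cores=1, os_mem_gb=1, cores_per_executor=5):
    """Rule-of-thumb Spark executor sizing for one worker node."""
    usable_cores = node_cores - os_cores             # 15 cores left for Spark
    executors = usable_cores // cores_per_executor   # 3 executors per node
    mem_per_executor = (node_mem_gb - os_mem_gb) / executors   # 21.0 GB each
    overhead = max(0.384, 0.10 * mem_per_executor)   # max(384 MB, 10%) off-heap
    heap = mem_per_executor - overhead               # value for spark.executor.memory
    return executors, round(heap, 1)

print(size_executors())  # (3, 18.9)
```

In a real cluster one executor slot is also given up for the driver (or the driver runs on a separate node), and the exact overhead fraction is tunable via spark.executor.memoryOverhead.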

Asked in IBM


Q. Big data Hadoop architecture and HDFS commands to copy and list files in hdfs spark architecture and Transformation and Action question what happen when we submit spark program spark dataframe coding question s...

Ans.

Questions on big data, Hadoop, Spark, Scala, Git, project and Agile.

  • Hadoop architecture and HDFS commands for copying and listing files in HDFS

  • Spark architecture and Transformation and Action question

  • What happens when we submit a Spark program

  • Spark DataFrame coding question

  • Scala basic program on List

  • Git and Github

  • Project-related question

  • Agile-related questions

Asked in KFintech


Q. Explain Transaction Isolation, and what are the various types of Transaction isolation in RDBMS?

Ans.

Transaction isolation is a concept in databases that ensures transactions are executed independently of each other.

  • Transaction isolation levels determine the degree to which one transaction must be isolated from other transactions.

  • Types of transaction isolation levels include Read Uncommitted, Read Committed, Repeatable Read, and Serializable.

  • Each isolation level offers a different level of consistency and concurrency control.

  • For example, in the Read Uncommitted isolation level, a transaction can see uncommitted (dirty) data written by other transactions.

Asked in EPAM Systems


Q. Write PySpark pseudo code to join two DataFrames and replace null values with corresponding values from another DataFrame.

Ans.

Join two DataFrames in PySpark and replace null values with corresponding values from another DataFrame.

  • Use the join() method to combine the two DataFrames on the common key, e.g. df1.join(df2, 'key', 'left').

  • Replace nulls with coalesce() from pyspark.sql.functions, which returns the first non-null value among its column arguments.

  • Example: df1.join(df2, 'key', 'left').withColumn('value', coalesce(df1['value'], df2['value'])).

  • Note that fillna() only accepts literal values, not columns, so it cannot pull replacements from another DataFrame; also ensure the DataFrames are properly aligned on the join key to avoid data loss.

Asked in MOURI Tech


Q. How would you pass connection strings if a Lambda function is connecting to a database?

Ans.

Pass connection string as environment variable or use AWS Secrets Manager

  • Store connection string as environment variable in Lambda function configuration

  • Retrieve connection string from AWS Secrets Manager and use it in Lambda function

  • Use IAM role to grant Lambda function access to database

  • Encrypt connection string using AWS KMS for added security
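
A minimal sketch of the environment-variable approach; DB_CONN_STRING is a hypothetical variable name that would be set in the Lambda function's configuration, and the Secrets Manager alternative is shown in comments.

```python
import os

def get_connection_string() -> str:
    """Read the DB connection string from the Lambda environment.

    In a real deployment DB_CONN_STRING comes from the function's
    configuration (ideally encrypted with a KMS key).
    """
    conn = os.environ.get("DB_CONN_STRING")
    if conn is None:
        raise RuntimeError("DB_CONN_STRING is not configured")
    return conn

# Secrets Manager alternative (sketch; needs boto3 and IAM permission
# secretsmanager:GetSecretValue on the secret):
#   import boto3, json
#   resp = boto3.client("secretsmanager").get_secret_value(SecretId="db-conn")
#   conn = json.loads(resp["SecretString"])["connection_string"]

os.environ["DB_CONN_STRING"] = "postgresql://user:pass@host:5432/db"  # simulate config
print(get_connection_string())
```
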

Asked in TCS


Q. What is the difference between Tasks, and stages? About Spark UI?

Ans.

Tasks and stages are components of the execution plan in Spark UI.

  • Tasks are the smallest unit of work in Spark, representing a single operation on a partition of data.

  • Stages are groups of tasks that are executed together as part of a larger computation.

  • Tasks within a stage can be executed in parallel, while stages are executed sequentially.

  • Tasks are created based on the transformations and actions in the Spark application.

  • Stages are created based on the dependencies between RDDs, i.e. at shuffle boundaries.

Asked in LTIMindtree


Q. How does query acceleration speed up query processing?

Ans.

Query acceleration speeds up query processing by optimizing query execution and reducing the time taken to retrieve data.

  • Query acceleration uses techniques like indexing, partitioning, and caching to optimize query execution.

  • It reduces the time taken to retrieve data by minimizing disk I/O and utilizing in-memory processing.

  • Examples include using columnar storage formats like Parquet or optimizing join operations.


Q. Python Coding question : without python methods 1. to check if a list is sorted 2. sort the list , optimize the solution

Ans.

Check if a list is sorted and sort the list without using Python methods.

  • To check if a list is sorted, iterate through the list and compare each element with the next one. If any element is greater than the next one, the list is not sorted.

  • To sort the list without using Python methods, implement a sorting algorithm like bubble sort, selection sort, or insertion sort.

  • Example for checking if a list is sorted: ['a', 'b', 'c'] is sorted, ['c', 'b', 'a'] is not sorted.

  • Example for …
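
Both parts can be sketched without built-in helpers like sorted() or list.sort(); insertion sort is a reasonable "optimized" pick here because it degrades to O(n) on nearly sorted input.

```python
def is_sorted(lst):
    """True if lst is in non-decreasing order; no built-in helpers used."""
    for i in range(len(lst) - 1):
        if lst[i] > lst[i + 1]:
            return False
    return True

def insertion_sort(lst):
    """Return a sorted copy: O(n^2) worst case, O(n) if already nearly sorted."""
    out = list(lst)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:   # shift larger elements one slot right
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key                 # drop key into its place
    return out

print(is_sorted([1, 2, 2, 3]))        # True
print(is_sorted([3, 1, 2]))           # False
print(insertion_sort([5, 2, 4, 1]))   # [1, 2, 4, 5]
```
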

Asked in InfoObjects


Q. What is the difference between repartition and coalesce?

Ans.

Repartition increases or decreases the number of partitions in a DataFrame, while Coalesce only decreases the number of partitions.

  • Repartition can increase or decrease the number of partitions in a DataFrame, leading to a shuffle of data across the cluster.

  • Coalesce only decreases the number of partitions in a DataFrame without performing a full shuffle, making it more efficient than repartition.

  • Repartition is typically used when there is a need to increase the number of partitions or rebalance skewed data.

Asked in KFintech


Q. Given a sorted array of integers, return the sorted array of squared elements with the least possible time complexity.

Ans.

Given a sorted array of integers, return the sorted array of squared elements in least time complexity.

  • Create a result array of the same length as the input

  • Use two pointers at both ends of the array; the element with the larger absolute value produces the larger square

  • Fill the result array from back to front with the larger square, moving that pointer inward
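
A minimal two-pointer version in O(n) time:

```python
def sorted_squares(nums):
    """Squares of a sorted (possibly negative) array, sorted, in O(n)."""
    n = len(nums)
    result = [0] * n
    lo, hi = 0, n - 1
    for i in range(n - 1, -1, -1):        # fill from the largest square down
        if abs(nums[lo]) > abs(nums[hi]):
            result[i] = nums[lo] ** 2
            lo += 1
        else:
            result[i] = nums[hi] ** 2
            hi -= 1
    return result

print(sorted_squares([-4, -1, 0, 3, 10]))  # [0, 1, 9, 16, 100]
```

Sorting the squares directly would cost O(n log n); the two pointers exploit the fact that the largest squares sit at the ends of the input.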

Asked in IBM


Q. What optimization techniques did you use in your project?

Ans.

Optimization techniques used in project

  • Caching

  • Parallel processing

  • Compression

  • Indexing

  • Query optimization

Asked in TCS


Q. What optimization techniques were used in the project?

Ans.

Optimisation techniques used in the project include indexing, query optimization, caching, and parallel processing.

  • Indexing: Creating indexes on frequently queried columns to improve search performance.

  • Query optimization: Rewriting queries to make them more efficient and reduce execution time.

  • Caching: Storing frequently accessed data in memory to reduce the need for repeated database queries.

  • Parallel processing: Distributing tasks across multiple processors to speed up data processing.


Q. Two SQL Codes and Two Python codes like reverse a string ?

Ans.

Reverse a string using SQL and Python codes.

  • In SQL, use the REVERSE function to reverse a string.

  • In Python, use slicing with a step of -1 to reverse a string.
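
The Python half is a one-liner with slicing; the SQL half (SELECT REVERSE('data');) works in engines such as SQL Server and MySQL, which provide a REVERSE function.

```python
def reverse_string(s: str) -> str:
    """Reverse a string with slicing (step -1)."""
    return s[::-1]

print(reverse_string("data engineer"))  # 'reenigne atad'
```
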

Asked in GoDaddy


Q. Which AWS services have you used, and what AWS architecture did you implement for those services?

Ans.

AWS services used include S3, Redshift, Glue, EMR, and Lambda in a scalable and cost-effective architecture.

  • AWS S3 for storing large amounts of data

  • AWS Redshift for data warehousing and analytics

  • AWS Glue for ETL processes

  • AWS EMR for big data processing

  • AWS Lambda for serverless computing

Asked in EPAM Systems


Q. Explain the different types of SQL joins and the number of records returned by each.

Ans.

Understanding SQL joins is crucial for data retrieval and analysis in relational databases.

  • INNER JOIN: Returns records with matching values in both tables. Example: SELECT * FROM A INNER JOIN B ON A.id = B.id.

  • LEFT JOIN: Returns all records from the left table and matched records from the right table. Example: SELECT * FROM A LEFT JOIN B ON A.id = B.id.

  • RIGHT JOIN: Returns all records from the right table and matched records from the left table. Example: SELECT * FROM A RIGHT JOIN B ON A.id = B.id.
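
The row counts can be verified with sqlite3 on two tiny tables (RIGHT and FULL joins mirror LEFT and need SQLite 3.39+, so only INNER and LEFT are executed here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER);
CREATE TABLE B (id INTEGER);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (2), (3), (4);
""")

inner = conn.execute("SELECT * FROM A INNER JOIN B ON A.id = B.id").fetchall()
left = conn.execute("SELECT * FROM A LEFT JOIN B ON A.id = B.id").fetchall()

print(len(inner))  # 2 rows: only ids 2 and 3 match
print(len(left))   # 3 rows: every A row kept; unmatched B columns are NULL
```
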

Asked in 7 Eleven


Q. What is the difference between a Broadcast variable and an accumulator variable?

Ans.

Broadcast variables are read-only values cached on each worker node, while accumulator variables can only be added to by tasks and only read by the driver.

  • Broadcast variables give every node a copy of a large input dataset or a small lookup table without re-sending it with each task.

  • Accumulator variables keep a running total (counters, sums) across multiple tasks.

  • Broadcast variables are for read-only lookups inside tasks, while accumulators are effectively write-only from the tasks' point of view.


Q. 1. Different Types of integration runtime in adf 2. How to copy 100 files from one adls path to another in adf 3. Diff between DAG and Lineage , narrow and wide transformation in Spark 4. DBUtils questions. 5....

Ans.

The interview questions cover topics related to Azure Data Factory, Spark, and Python programming.

  • Integration runtimes in ADF include Azure, Self-hosted, and SSIS IRs.

  • To copy 100 files in ADF, use a Copy Data activity with a wildcard path in source and sink datasets.

  • DAG in Spark represents a directed acyclic graph of computation, while lineage tracks the data flow.

  • Narrow transformations in Spark operate on a single partition, while wide transformations shuffle data across partitions.

Asked in TCS


Q. What is the SQL query to group by employee ID in order to combine the first name and last name with a space?

Ans.

SQL query to group by employee ID and combine first name and last name with a space

  • Use the GROUP BY clause to group by employee ID

  • Use the CONCAT function to combine first name and last name with a space

  • Select employee ID, CONCAT(first_name, ' ', last_name) AS full_name
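
A runnable sketch with sqlite3, which uses the ANSI `||` concatenation operator where MySQL/SQL Server would use CONCAT; the table and column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, first_name TEXT, last_name TEXT);
INSERT INTO employees VALUES (1, 'Asha', 'Patel'), (2, 'Ravi', 'Kumar');
""")

-- dummy
```
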

Asked in Aays


Q. What has been your past experience with various Data Engineering tools and Business Intelligence (BI) tools?

Ans.

I have experience with various Data Engineering and BI tools such as SQL, Python, Tableau, and Apache Spark.

  • Proficient in SQL for data querying and manipulation

  • Experience with Python for data processing and analysis

  • Familiarity with Tableau for data visualization

  • Worked with Apache Spark for big data processing
