10+ Moringa Techsolv Interview Questions and Answers
Q1. How do you handle a changing schema from the source? What are the common issues faced in Hadoop, and how do you resolve them?
Handling a changing schema from the source in Hadoop
Use file formats that support schema evolution, such as Avro or Parquet, to handle schema changes
Implement a flexible ETL pipeline that can handle schema changes
Use tools like Apache NiFi to dynamically adjust schema during ingestion
Common issues include data loss, data corruption, and performance degradation
Resolve issues by implementing proper testing, monitoring, and backup strategies
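A minimal PySpark sketch of reading Parquet data whose schema has drifted (the path is hypothetical; mergeSchema reconciles differing column sets across files):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema_evolution_demo").getOrCreate()

# Merge column sets across Parquet files whose schemas changed over time
df = spark.read.option("mergeSchema", "true").parquet("/data/events/")
df.printSchema()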
Q2. Write PySpark code to read a CSV file and show the top 10 records.
PySpark code to read a CSV file and show the top 10 records.
Import the necessary libraries
Create a SparkSession
Read the CSV file using the SparkSession
Display the top 10 records using the show() method
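A minimal runnable sketch of these steps (the file path is hypothetical):

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("read_csv_demo").getOrCreate()

# Read the CSV file with a header row and inferred column types
df = spark.read.csv("/data/input.csv", header=True, inferSchema=True)

# Display the top 10 records
df.show(10)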
Q3. What optimization techniques are applied in PySpark code?
Optimization techniques in PySpark code include partitioning, caching, and using broadcast variables.
Partitioning data based on key columns to optimize join operations
Caching frequently accessed data in memory to avoid recomputation
Using broadcast variables to efficiently share small data across nodes
Using appropriate data types and avoiding unnecessary type conversions
Avoiding shuffling of data by using appropriate transformations and actions
Using appropriate data structures...
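A minimal sketch of two of these techniques, broadcast joins and caching (paths, table contents, and the join key are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("optimization_demo").getOrCreate()

large_df = spark.read.parquet("/data/transactions/")    # large fact table
small_df = spark.read.parquet("/data/country_codes/")   # small lookup table

# Broadcast the small lookup table so the join avoids shuffling the large table
joined = large_df.join(broadcast(small_df), on="country_code", how="left")

# Cache a frequently reused intermediate result to avoid recomputation
joined.cache()
joined.count()  # first action materializes the cache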
Q4. Write PySpark code to change a column name and divide one column by another.
PySpark code to change a column name and divide one column by another.
Use 'withColumnRenamed' method to change column name
Use 'withColumn' method to divide one column by another column
Example: df = df.withColumnRenamed('old_name', 'new_name'), then df = df.withColumn('ratio', col('col1') / col('col2')); see the sketch below
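A fuller runnable sketch (the column names and sample data are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("rename_divide_demo").getOrCreate()
df = spark.createDataFrame([(10, 2), (30, 5)], ["col1", "col2"])

# Change a column name
df = df.withColumnRenamed("col1", "numerator")

# Divide one column by another into a new column
df = df.withColumn("ratio", col("numerator") / col("col2"))
df.show()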
Q5. What are columnar storage, Parquet, and Delta? Why are they used?
Columnar storage is a data storage format that stores data in columns rather than rows, improving query performance.
Columnar storage stores data in a column-wise manner instead of row-wise.
It improves query performance by reducing the amount of data that needs to be read from disk.
Parquet is a columnar storage file format that is optimized for big data workloads.
It is used in Apache Spark and other big data processing frameworks.
Delta is an open-source storage layer that provides ACID transactions and schema enforcement on top of Parquet-based data lakes.
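A minimal sketch of writing the same DataFrame as Parquet and as Delta (paths are hypothetical, and the Delta write assumes the delta-spark package is available on the cluster):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage_demo").getOrCreate()
df = spark.range(100)  # placeholder data

# Parquet: columnar files with compression and predicate pushdown
df.write.mode("overwrite").parquet("/tmp/demo_parquet")

# Delta: Parquet files plus a transaction log that enables ACID operations
# (requires the delta-spark package configured on the cluster)
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")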
Q6. Given a dictionary, find the greatest number for the same key in Python.
Find the greatest number for the same key in a Python dictionary.
Use the max() function to find the maximum value associated with each key in the dictionary.
Iterate through the dictionary and apply max() to each key's values, as sketched below.
If the dictionary is nested, use recursion to iterate through all the keys.
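A minimal sketch, assuming the dictionary maps each key to a list of numbers (the sample data is hypothetical):

scores = {"a": [3, 7, 1], "b": [10, 2], "c": [5]}

# Greatest number for each key
greatest_per_key = {key: max(values) for key, values in scores.items()}
print(greatest_per_key)  # {'a': 7, 'b': 10, 'c': 5}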
Q7. RDDs vs DataFrames: which is better and why?
DataFrames are better than RDDs due to their optimized performance and ease of use.
DataFrames are optimized for better performance than RDDs.
DataFrames have a schema, making it easier to work with structured data.
DataFrames support SQL queries and can be used with Spark SQL.
RDDs are more low-level and require more manual optimization.
RDDs are useful for unstructured data or when fine-grained control is needed.
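A minimal sketch contrasting the two APIs on the same aggregation (the sample data is hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd_vs_df_demo").getOrCreate()
pairs = [("a", 1), ("b", 2), ("a", 3)]

# RDD API: low-level, manual control over each transformation
rdd_result = (spark.sparkContext.parallelize(pairs)
              .reduceByKey(lambda x, y: x + y)
              .collect())

# DataFrame API: declarative, optimized by the Catalyst query planner
df_result = (spark.createDataFrame(pairs, ["key", "value"])
             .groupBy("key")
             .sum("value")
             .collect())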
Q8. Write a function to check if a number is an Armstrong Number.
Function to check if a number is an Armstrong Number
An Armstrong Number is a number that is equal to the sum of its own digits raised to the power of the number of digits
To check if a number is an Armstrong Number, we need to calculate the sum of each digit raised to the power of the number of digits
If the sum is equal to the original number, then it is an Armstrong Number
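A minimal sketch of such a function (the function name is arbitrary):

def is_armstrong(n: int) -> bool:
    digits = str(n)
    power = len(digits)
    # Sum each digit raised to the power of the number of digits
    return n == sum(int(d) ** power for d in digits)

print(is_armstrong(153))  # True: 1**3 + 5**3 + 3**3 == 153
print(is_armstrong(10))   # False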
Q9. How do you connect SQL Server to Databricks?
To connect SQL Server to Databricks, use JDBC/ODBC drivers and configure the connection settings.
Install the appropriate JDBC/ODBC driver for SQL Server
Configure the connection settings in Databricks
Use the JDBC/ODBC driver to establish the connection
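A minimal sketch of a JDBC read in a Databricks notebook (server, database, table, and credentials are hypothetical; the Microsoft SQL Server JDBC driver must be available on the cluster):

# 'spark' is the SparkSession that Databricks notebooks provide by default
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.customers")
      .option("user", "my_user")
      .option("password", "my_password")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())
df.show(5)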
Q10. How do you copy data from on-premise to Azure cloud?
Data can be copied from on-premise to Azure cloud using various methods like Azure Data Factory, Azure Storage Explorer, Azure Data Migration Service, etc.
Use Azure Data Factory to create data pipelines for moving data from on-premise to Azure cloud
Utilize Azure Storage Explorer to manually copy data from on-premise to Azure Blob Storage
Leverage Azure Data Migration Service for migrating large volumes of data from on-premise databases to Azure SQL Database
Consider using Azure...
Q11. How to initiate SparkContext?
To initiate SparkContext, create a SparkConf object and pass it to the SparkContext constructor.
Create a SparkConf object with app name and master URL
Pass the SparkConf object to SparkContext constructor
Example: conf = SparkConf().setAppName('myApp').setMaster('local[*]'), then sc = SparkContext(conf=conf); see the sketch below
Stop SparkContext using sc.stop()
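A minimal runnable sketch (the app name and master URL are placeholders):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("myApp").setMaster("local[*]")
sc = SparkContext(conf=conf)

# ... run RDD operations here ...

sc.stop()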
Q12. Explain the project architecture in detail.
The project architecture involves the design and organization of data pipelines and systems for efficient data processing and storage.
The architecture includes components such as data sources, data processing frameworks, storage systems, and data delivery mechanisms.
It focuses on scalability, reliability, and performance to handle large volumes of data.
Example: A project architecture may involve using Apache Kafka for real-time data ingestion, Apache Spark for data processing...
Q13. What is integration runtime in ADF?
Integration runtime in ADF is the compute infrastructure used to run activities in Azure Data Factory pipelines.
Integration runtime is a managed compute infrastructure in Azure Data Factory.
It is used to run activities within pipelines, such as data movement or data transformation tasks.
Integration runtime can be auto-scaled based on workload requirements.
It supports various data integration scenarios, including batch processing and real-time data processing.
Examples of integration runtime types include the Azure integration runtime, the self-hosted integration runtime, and the Azure-SSIS integration runtime.
Q14. Optimisation techniques used
Optimisation techniques used in data engineering
Partitioning data to improve query performance
Using indexing to speed up data retrieval
Implementing caching mechanisms to reduce data access time
Optimizing data storage formats for efficient storage and processing
Parallel processing and distributed computing for faster data processing
Using compression techniques to reduce storage space and improve data transfer
Applying query optimization techniques like query rewriting and query...
Q15. Optimising technique that you have used
I have used partitioning and indexing to optimize query performance.
Implemented partitioning on large tables to improve query performance by limiting the data scanned
Created indexes on frequently queried columns to speed up data retrieval
Utilized clustering keys to physically organize data on disk for faster access
Q16. Spark optimization techniques
Spark optimization techniques involve partitioning, caching, and tuning resources for efficient data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data to avoid recomputation
Tuning resources like memory allocation and parallelism
Using broadcast variables for small lookup tables
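A minimal sketch of tuning resources at session creation (the values are illustrative, not recommendations):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuned_job")
         .config("spark.sql.shuffle.partitions", "200")               # shuffle parallelism
         .config("spark.executor.memory", "4g")                       # executor memory (cluster-dependent)
         .config("spark.sql.autoBroadcastJoinThreshold", "10485760")  # broadcast tables under ~10 MB
         .getOrCreate())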