PwC

3.3

based on 10.5k Reviews

PwC Data Engineer Interview Questions and Answers

Updated 20 Jan 2025

23 Interview questions

A Data Engineer was asked 5mo ago

Q. What is data skewness?

Ans.

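Data skewness is a measure of asymmetry in the distribution of data values. In distributed processing such as Spark, the term also describes records being spread unevenly across partitions, so that a few tasks process most of the data.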

Data skewness indicates the lack of symmetry in the data distribution.
Positive skewness means the tail on the right side of the distribution is longer or fatter.
Negative skewness means the tail on the left side of the distribution is longer or fatter.
Skewness value of 0 indicates a perfectly symmetrical distribution.
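
The sign convention above can be checked with a small calculation. The following is a minimal sketch, assuming the moment coefficient of skewness (third central moment divided by the variance raised to 1.5); the function name and sample lists are illustrative only.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]               # evenly spread around the mean
right_tailed = [1, 1, 2, 2, 3, 20]        # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]  # mirror image, so the tail is on the left

print(skewness(symmetric))         # zero for a symmetric sample
print(skewness(right_tailed) > 0)  # positive skew
print(skewness(left_tailed) < 0)   # negative skew
```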

A Data Engineer was asked 5mo ago

Q. What are transformations, and how many types of transformations exist?

Ans.

In Spark, transformations are lazily evaluated operations that produce a new RDD or DataFrame from an existing one. There are two main types: narrow and wide.

Transformations are operations performed on data to convert it from one form to another.
Narrow transformations are those where each input partition will contribute to only one output partition, e.g., map, filter.
Wide transformations are those where each input parti...
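
The difference between the two types can be illustrated without a cluster. This is a plain-Python sketch, not the Spark API: partitions are modelled as lists, and the helper names and the toy partitioner are assumptions made for the example.

```python
from collections import defaultdict

# Two input partitions of (key, value) pairs; in Spark these could sit on different executors.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]


def narrow_map(parts, fn):
    """Narrow: each output partition is computed from exactly one input partition."""
    return [[fn(record) for record in part] for part in parts]


def wide_group_by_key(parts, num_output_partitions):
    """Wide: records with the same key are moved (shuffled) into one output partition."""
    out = [defaultdict(list) for _ in range(num_output_partitions)]
    for part in parts:
        for key, value in part:
            # Deterministic toy partitioner (hypothetical); Spark uses its own hash partitioner.
            target = sum(key.encode()) % num_output_partitions
            out[target][key].append(value)
    return [dict(p) for p in out]


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
grouped = wide_group_by_key(partitions, 2)

print(doubled)  # partition boundaries are unchanged
print(grouped)  # values for key "a" came from both input partitions
```

In the grouped result, the values for key "a" come from both input partitions, which is why a wide transformation needs a shuffle.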

A Data Engineer was asked 5mo ago

Q. Explain out-of-memory (OOM) errors and driver overhead memory.

Ans.

OOM stands for Out Of Memory. Driver memory is the memory allocated to the driver in a Spark application, and driver overhead memory is the additional non-heap memory reserved for the driver process.

OOM occurs when a process needs more memory than it has been allocated, leading to crashes or failed tasks.
Driver memory (spark.driver.memory) is the heap allocated to the driver program, which coordinates tasks and manages the overall execution of the application. Driver overhead memory (spark.driver.memoryOverhead) is non-heap memory allocated on top of it in cluster mode.
Adjusting memory settings like exec...

A Data Engineer was asked 5mo ago

Q. Explain the Spark job process and its planning.

Ans.

Spark job process involves job submission, DAG creation, task scheduling, and task execution.

Spark job is submitted to the SparkContext by the user.
Spark creates a Directed Acyclic Graph (DAG) of the job's stages and tasks.
Tasks are scheduled by the Spark scheduler based on data locality and resource availability.
Tasks are executed on worker nodes in the cluster.
Output is collected and returned to the user.

A Data Engineer was asked 5mo ago

Q. What are the concepts of coalesce and repartition in data processing?

Ans.

Coalesce and repartition are concepts used in data processing to control the number of partitions in a dataset.

Coalesce is used to reduce the number of partitions in a dataset without shuffling the data, which can improve performance.
Repartition is used to increase or decrease the number of partitions in a dataset by shuffling the data across the cluster.
Coalesce is preferred over repartition when reducing partiti...
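
A rough way to picture the trade-off is to simulate both operations on plain lists. This is a sketch under simplifying assumptions, not Spark's actual algorithm: the functions below are hypothetical stand-ins, and real Spark chooses which partitions to merge based on locality.

```python
def coalesce(parts, n):
    """Merge whole existing partitions into n groups; individual records are not redistributed."""
    if n >= len(parts):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in parts]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        merged[i % n].extend(part)
    return merged


def repartition(parts, n):
    """Full shuffle: every record is redistributed round-robin across n partitions."""
    records = [r for part in parts for r in part]
    out = [[] for _ in range(n)]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]  # one large and three small partitions

coalesced_sizes = [len(p) for p in coalesce(parts, 2)]
repartitioned_sizes = [len(p) for p in repartition(parts, 2)]

print(coalesced_sizes)      # sizes stay uneven because partitions are merged as-is
print(repartitioned_sizes)  # sizes are balanced, at the cost of moving every record
```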

A Data Engineer was asked 5mo ago

Q. What is the SQL query to find the third highest salary from a given table?

Ans.

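Use a SQL query with ORDER BY and LIMIT to find the third highest salary from a table.

Use an ORDER BY clause to sort salaries in descending order
Use LIMIT 1 OFFSET 2 to skip the first two highest salaries
Add DISTINCT so that tied salaries count once; without it, two employees sharing the top salary make the query return the second highest value
Example: SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2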
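
How the LIMIT/OFFSET approach behaves with tied salaries can be checked locally. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 900), ("B", 900), ("C", 800), ("D", 700), ("E", 600)],
)

# Without DISTINCT, the tie at 900 occupies two rows, so OFFSET 2 lands on 800.
naive = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]

# With DISTINCT, each salary value counts once, so the third distinct value is 700.
third = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]

print(naive, third)
```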

A Data Engineer was asked 7mo ago

Q. Briefly explain SCD Type II.

Ans.

SCD Type II allows tracking historical changes in data by creating new records instead of overwriting existing ones.

Maintains a full history of changes to data over time.
Each change creates a new record with a start and end date.
Example: If a customer's address changes, a new record is created with the new address and a timestamp.
Useful in scenarios where historical accuracy is crucial, such as in financial or cus...
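
The address-change example can be sketched in a few lines. This is a minimal in-memory illustration, assuming hypothetical column names (start_date, end_date, is_current); a warehouse implementation would typically use a MERGE statement instead.

```python
from datetime import date


def apply_scd2_change(history, customer_id, new_address, change_date):
    """Close the current row for customer_id and append a new current row."""
    for row in history:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            if row["address"] == new_address:
                return history  # nothing changed, keep history as-is
            row["end_date"] = change_date  # expire the old version
            row["is_current"] = False
    history.append({
        "customer_id": customer_id,
        "address": new_address,
        "start_date": change_date,
        "end_date": None,  # open-ended: this is the current version
        "is_current": True,
    })
    return history


history = [{
    "customer_id": 1,
    "address": "Pune",
    "start_date": date(2023, 1, 1),
    "end_date": None,
    "is_current": True,
}]
apply_scd2_change(history, 1, "Mumbai", date(2024, 6, 1))

for row in history:
    print(row)
```

Both versions of the customer remain queryable: the old row is closed with an end date, and the new row is marked current.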

A Data Engineer was asked 9mo ago

Q. What is the difference between coalesce and repartition?

Ans.

In Spark, coalesce reduces the number of partitions without a full shuffle, while repartition performs a full shuffle and can either increase or decrease the number of partitions.

Coalesce merges existing partitions, so it moves less data but can leave partitions unevenly sized.
Repartition redistributes all records across the cluster, which costs more but produces evenly sized partitions.
Both are unrelated to the SQL COALESCE function, which returns the first non-null value among its arguments.

A Data Engineer was asked 9mo ago

Q. Write a SQL query using the LAG window function.

Ans.

The SQL LAG function retrieves data from a previous row in a result set, useful for comparisons.

LAG function syntax: LAG(column_name, offset, default) OVER (PARTITION BY column ORDER BY column).
Example: SELECT date, sales, LAG(sales, 1) OVER (ORDER BY date) AS previous_sales FROM sales_data;
Useful for calculating differences, trends, or changes over time.
Can be used in financial analysis to compare current and pre...
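
The example query can be run locally to see what LAG returns for the first row. This sketch uses Python's built-in sqlite3 module and assumes an SQLite build with window function support (3.25 or later); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("2024-01-01", 100), ("2024-01-02", 130), ("2024-01-03", 90)],
)

rows = conn.execute(
    """
    SELECT date,
           sales,
           LAG(sales, 1) OVER (ORDER BY date) AS previous_sales,
           sales - LAG(sales, 1) OVER (ORDER BY date) AS change
    FROM sales_data
    ORDER BY date
    """
).fetchall()

for row in rows:
    print(row)  # the first row has no previous row, so LAG yields NULL (None in Python)
```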

A Data Engineer was asked 10mo ago

Q. Are you willing to travel on site?

Ans.

Yes, I am ready to travel on site for data engineering projects.

I am willing to travel for client meetings, project kick-offs, and on-site troubleshooting.
I understand the importance of face-to-face interactions in project delivery.
I have previous experience traveling for work, such as attending conferences or training sessions.
I am flexible with my schedule and can accommodate last-minute travel if needed.

PwC Data Engineer Interview Experiences

20 interviews found

Data Engineer Interview Questions & Answers

vinay kn

posted on 20 Jan 2025

Interview experience

Good

Difficulty level

Moderate

Process Duration

Result

No response

I appeared for an interview in Dec 2024.

Round 1 - Technical

(6 Questions)

Q1. What is Spark? Explain its ecosystem.

Ans.

Spark is a fast and general-purpose cluster computing system for big data processing.

Spark provides APIs in Java, Scala, Python, and R for distributed data processing.
It includes components like Spark SQL for SQL and structured data processing, Spark Streaming for real-time data processing, MLlib for machine learning, and GraphX for graph processing.
Spark can run on top of Hadoop, Mesos, Kubernetes, or in standalone mo...

Answered by AI

Q2. What are transformations, and how many types of transformations exist?

Ans.

In Spark, transformations are lazily evaluated operations that produce a new RDD or DataFrame from an existing one. There are two main types: narrow and wide.

Transformations are operations performed on data to convert it from one form to another.
Narrow transformations are those where each input partition will contribute to only one output partition, e.g., map, filter.
Wide transformations are those where each input partition ...

Answered by AI

Q3. Explain the Spark job process and its planning.

Ans.

Spark job process involves job submission, DAG creation, task scheduling, and task execution.

Spark job is submitted to the SparkContext by the user.
Spark creates a Directed Acyclic Graph (DAG) of the job's stages and tasks.
Tasks are scheduled by the Spark scheduler based on data locality and resource availability.
Tasks are executed on worker nodes in the cluster.
Output is collected and returned to the user.

Answered by AI

Q4. What are the concepts of coalesce and repartition in data processing?

Ans.

Coalesce and repartition are concepts used in data processing to control the number of partitions in a dataset.

Coalesce is used to reduce the number of partitions in a dataset without shuffling the data, which can improve performance.
Repartition is used to increase or decrease the number of partitions in a dataset by shuffling the data across the cluster.
Coalesce is preferred over repartition when reducing partitions t...

Answered by AI

Q5. Explain out-of-memory (OOM) errors and driver overhead memory.

Ans.

OOM stands for Out Of Memory. Driver memory is the memory allocated to the driver in a Spark application, and driver overhead memory is the additional non-heap memory reserved for the driver process.

OOM occurs when a process needs more memory than it has been allocated, leading to crashes or failed tasks.
Driver memory (spark.driver.memory) is the heap allocated to the driver program, which coordinates tasks and manages the overall execution of the application. Driver overhead memory (spark.driver.memoryOverhead) is non-heap memory allocated on top of it in cluster mode.
Adjusting memory settings like executor ...

Answered by AI

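Q6. What is data skewness?

Ans.

Data skewness is a measure of asymmetry in the distribution of data values. In distributed processing such as Spark, the term also describes records being spread unevenly across partitions, so that a few tasks process most of the data.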

Data skewness indicates the lack of symmetry in the data distribution.
Positive skewness means the tail on the right side of the distribution is longer or fatter.
Negative skewness means the tail on the left side of the distribution is longer or fatter.
Skewness value of 0 indicates a perfectly symmetrical distribution.

Answered by AI

Data Engineer Interview Questions & Answers

Anonymous

posted on 3 Oct 2024

Interview experience

Excellent

Difficulty level

Moderate

Process Duration

Less than 2 weeks

Result

Selected

I applied via LinkedIn and was interviewed in Sep 2024. There were 2 interview rounds.

Round 1 - Technical

(5 Questions)

Q1. Spark architecture

Q2. Difference between coalesce and repartition

Ans.

In Spark, coalesce reduces the number of partitions without a full shuffle, while repartition performs a full shuffle and can either increase or decrease the number of partitions.

Coalesce merges existing partitions, so it moves less data but can leave partitions unevenly sized.
Repartition redistributes all records across the cluster, which costs more but produces evenly sized partitions.
Both are unrelated to the SQL COALESCE function, which returns the first non-null value among its arguments.

Answered by AI

Q3. SQL question using the LAG window function

Ans.

The SQL LAG function retrieves data from a previous row in a result set, useful for comparisons.

LAG function syntax: LAG(column_name, offset, default) OVER (PARTITION BY column ORDER BY column).
Example: SELECT date, sales, LAG(sales, 1) OVER (ORDER BY date) AS previous_sales FROM sales_data;
Useful for calculating differences, trends, or changes over time.
Can be used in financial analysis to compare current and previous...

Answered by AI

Q4. Same question in PySpark

Q5. Project and day to day activities

Round 2 - Behavioral

(2 Questions)

Q1. Scenario based questions

Q2. Gave details about work

Interview Preparation Tips

Topics to prepare for PwC Data Engineer interview:

Spark
SQL
Pyspark
Project
Soft Skills

Data Engineer Interview Questions & Answers

Anonymous

posted on 13 Jan 2025

Interview experience

Average

Difficulty level

Easy

Process Duration

Less than 2 weeks

Result

Not Selected

I applied via Naukri.com and was interviewed in Dec 2024. There was 1 interview round.

Round 1 - Technical

(2 Questions)

Q1. What is the SQL query to find the third highest salary from a given table?

Ans.

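Use a SQL query with ORDER BY and LIMIT to find the third highest salary from a table.

Use an ORDER BY clause to sort salaries in descending order
Use LIMIT 1 OFFSET 2 to skip the first two highest salaries
Add DISTINCT so that tied salaries count once; without it, two employees sharing the top salary make the query return the second highest value
Example: SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2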

Answered by AI

Q2. What is the difference between repartition and coalesce? What is the difference between persist and cache?

Ans.

repartition vs coalesce, persist vs cache

repartition can increase or decrease the number of partitions in a DataFrame and always performs a full shuffle, while coalesce only decreases the number of partitions and avoids a full shuffle
persist stores the DataFrame at a storage level you choose (memory, disk, or both), while cache is shorthand for persist with the default storage level: memory and disk for DataFrames, memory only for RDDs
repartition example: df.repartition(10)
coalesce example: df.coalesce(5)
...

Answered by AI

Data Engineer Interview Questions & Answers

Anonymous

posted on 2 Dec 2024

Interview experience

Excellent

Difficulty level

Easy

Process Duration

Less than 2 weeks

Result

Not Selected

I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.

Round 1 - Technical

(1 Question)

Q1. Briefly explain SCD Type II.

Ans.

SCD Type II allows tracking historical changes in data by creating new records instead of overwriting existing ones.

Maintains a full history of changes to data over time.
Each change creates a new record with a start and end date.
Example: If a customer's address changes, a new record is created with the new address and a timestamp.
Useful in scenarios where historical accuracy is crucial, such as in financial or customer...

Answered by AI

Data Engineer Interview Questions & Answers

Anonymous

posted on 19 Aug 2024

Interview experience

Good

Difficulty level

Moderate

Process Duration

Less than 2 weeks

Result

Not Selected

I applied via Naukri.com and was interviewed in Jul 2024. There were 4 interview rounds.

Round 1 - Aptitude Test

The aptitude test was okay, but the time given was short.

Round 2 - Technical

(2 Questions)

Q1. Project architecture

Q2. DataFrames in PySpark

Ans.

DataFrames in PySpark are distributed collections of data organized into named columns.

DataFrames are similar to tables in a relational database.
They can be created from various data sources like CSV, JSON, Parquet, etc.
DataFrames support SQL queries and transformations using PySpark functions.

Answered by AI

Round 3 - Technical

(2 Questions)

Q1. Python Question for 2nd highest salary

Ans.

Find the second highest salary from a list of employee salaries using Python.

Use a set to remove duplicates from the salary list.
Sort the unique salaries in descending order.
Access the second element in the sorted list to get the second highest salary.
Example: salaries = [3000, 2000, 3000, 4000]; unique_salaries = sorted(set(salaries), reverse=True); second_highest = unique_salaries[1].
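
The steps above can be wrapped in a small function. This is a minimal sketch; the function name and the choice to return None when fewer than two distinct salaries exist are assumptions for the example.

```python
def second_highest(salaries):
    """Return the second highest distinct salary, or None if there is no such value."""
    unique_salaries = sorted(set(salaries), reverse=True)  # drop duplicates, highest first
    if len(unique_salaries) < 2:
        return None  # avoids an IndexError on lists like [5000, 5000]
    return unique_salaries[1]


print(second_highest([3000, 2000, 3000, 4000]))
print(second_highest([5000, 5000]))
```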

Answered by AI

Q2. PySpark scenario-based question

Round 4 - HR

(2 Questions)

Q1. Ready to travel on site

Ans.

Yes, I am ready to travel on site for data engineering projects.

I am willing to travel for client meetings, project kick-offs, and on-site troubleshooting.
I understand the importance of face-to-face interactions in project delivery.
I have previous experience traveling for work, such as attending conferences or training sessions.
I am flexible with my schedule and can accommodate last-minute travel if needed.

Answered by AI

Q2. Salary negotiation

Data Engineer Interview Questions & Answers

Anonymous

posted on 3 Jul 2024

Interview experience

Good

Difficulty level

Process Duration

Result

Round 1 - Technical

(2 Questions)

Q1. SQL question on RANK, SUM() OVER and PARTITION BY

Q2. Roles and responsibilities

Data Engineer Interview Questions & Answers

Anonymous

posted on 2 Jul 2024

Interview experience

Average

Difficulty level

Moderate

Process Duration

Less than 2 weeks

Result

No response

I applied via AmbitionBox and was interviewed in Jan 2024. There was 1 interview round.

Round 1 - Technical

(1 Question)

Q1. Write code to print reverse of string.

Ans.

Code to print reverse of string

Use a loop to iterate through the characters of the string in reverse order
Append each character to a new string to build the reversed string
Return the reversed string
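
The loop-based approach described above might look like the sketch below. The function name is illustrative; characters are collected in a list and joined once, which avoids building a new string on every iteration.

```python
def reverse_string(text):
    """Reverse a string by walking its characters from last to first."""
    chars = []
    for i in range(len(text) - 1, -1, -1):  # indices len-1 down to 0
        chars.append(text[i])
    return "".join(chars)


print(reverse_string("spark"))
```

In everyday Python the slice `text[::-1]` gives the same result in one expression, but interviewers often ask for the explicit loop.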

Answered by AI

Data Engineer Interview Questions & Answers

Sourav Raj

posted on 10 Jul 2024

Interview experience

Excellent

Difficulty level

Moderate

Process Duration

Less than 2 weeks

Result

Selected

I applied via Referral and was interviewed before Jul 2023. There were 2 interview rounds.

Round 1 - Technical

(2 Questions)

Q1. SQL questions using ROW_NUMBER and RANK

Ans.

Understanding SQL's ROW_NUMBER() and RANK() functions for data ranking and ordering.

ROW_NUMBER() assigns a unique sequential integer to rows within a partition, starting at 1.
RANK() assigns a rank to each row within a partition, with gaps for ties (e.g., 1, 1, 3).
Example of ROW_NUMBER(): SELECT name, ROW_NUMBER() OVER (ORDER BY score DESC) AS rank FROM players;
Example of RANK(): SELECT name, RANK() OVER (ORDER BY score...
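
The difference between the two functions shows up only when there are ties. The sketch below runs both on a tiny table using Python's built-in sqlite3 module, assuming an SQLite build with window function support (3.25 or later); the player rows and the column aliases are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Asha", 90), ("Ben", 90), ("Chen", 80)],
)

rows = conn.execute(
    """
    SELECT name,
           score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK() OVER (ORDER BY score DESC) AS score_rank
    FROM players
    ORDER BY row_num
    """
).fetchall()

for row in rows:
    print(row)  # tied scores get distinct row numbers but share a rank
```

ROW_NUMBER() numbers the tied players 1 and 2, while RANK() gives both rank 1 and then skips to 3.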

Answered by AI

Q2. How to delete duplicate from a database

Ans.

To delete duplicates from a database, you can use SQL queries to identify and remove duplicate records.

Use the DISTINCT keyword in a SELECT query to retrieve unique records
Identify duplicate records using GROUP BY and HAVING clauses
Delete duplicate records using DELETE statement with subquery to keep only one instance
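
One way to combine these steps is sketched below with Python's built-in sqlite3 module. The table, the column names, and the rule of keeping the lowest id per email are assumptions for the example; other databases often use ROW_NUMBER() in a CTE for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"), (4, "a@example.com")],
)

# Step 1: identify duplicated values with GROUP BY and HAVING.
dupes = conn.execute(
    "SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Step 2: delete every row except the one with the lowest id for each email.
conn.execute(
    "DELETE FROM customers WHERE id NOT IN (SELECT MIN(id) FROM customers GROUP BY email)"
)

remaining = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
print(dupes)
print(remaining)
```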

Answered by AI

Round 2 - HR

(2 Questions)

Q1. Why do you want to join PwC?

Ans.

Joining PwC offers opportunities for growth, innovation, and collaboration in a leading global professional services firm.

Reputation: PwC is recognized as one of the Big Four accounting firms, providing a strong foundation for career development and networking.
Diverse Projects: Working at PwC allows engagement in a variety of projects across industries, enhancing skills and experience in data engineering.
Innovation Foc...

Answered by AI

Q2. What qualities do you bring to the job?

Ans.

I bring a blend of technical skills, problem-solving abilities, and strong communication to my role as a Data Engineer.

Strong analytical skills: I excel at analyzing complex datasets to derive actionable insights, as demonstrated in my previous project where I optimized data pipelines.
Proficiency in programming languages: I am skilled in Python and SQL, which I used to automate data processing tasks, reducing processin...

Answered by AI

Data Engineer Interview Questions & Answers

Anonymous

posted on 10 Apr 2024

Interview experience

Average

Difficulty level

Process Duration

Result

I applied via Job Portal

Round 1 - Technical

(1 Question)

Q1. PySpark-related questions.

Data Engineer Interview Questions & Answers

Anonymous

posted on 29 Feb 2024

Interview experience

Poor

Difficulty level

Process Duration

Result

Round 1 - Technical

(2 Questions)

Q1. Repartition vs coalesce

Ans.

Repartition is used to increase or decrease the number of partitions in a DataFrame, while coalesce is used to decrease the number of partitions without a full shuffle.

Repartition involves shuffling data across the network, which can be expensive in terms of performance and resources.
Coalesce is usually cheaper because it merges existing partitions instead of redistributing every record.
Example: Repartition(10...

Answered by AI

Q2. Copy Activity in ADF

Ans.

Copy Activity in ADF is used to move data between supported data stores

Copy Activity is a built-in activity in Azure Data Factory (ADF)
It can be used to move data between supported data stores such as Azure Blob Storage, SQL Database, etc.
It handles data movement with optional column mapping and format conversion; more complex transformations are done in other activities such as Mapping Data Flows
You can define source and sink datasets, mapping, and settings in Copy Activity
Example: Copying data from...

Answered by AI

PwC Interview FAQs

How many rounds are there in PwC Data Engineer interview?

PwC interview process usually has 1-2 rounds. The most common rounds in the PwC interview process are Technical, HR and One-on-one Round.

How to prepare for PwC Data Engineer interview?

Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at PwC. The most common topics and skills that interviewers at PwC expect are SQL, Python, Data Modeling, AWS and Leadership Development.

What are the top questions asked in PwC Data Engineer interview?

Some of the top questions asked at the PwC Data Engineer interview -

What is data flow? Difference with ADF pipeline and data f...
What is the difference between repartition and coalesce? What is the difference ...
What are the concepts of coalesce and repartition in data processi...

How long is the PwC Data Engineer interview process?

The duration of the PwC Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete.

3.9/5

based on 16 interview experiences

Difficulty level

Easy 18%

Moderate 73%

Hard 9%

Duration

Less than 2 weeks 80%

2-4 weeks 10%

More than 8 weeks 10%

PwC Data Engineer Salary

based on 376 salaries

₹6.7 L/yr - ₹20.3 L/yr

16% more than the average Data Engineer Salary in India

Data Engineer Jobs at PwC

Associate_Data Engineer_Data and Analytics_Advisory

Bangalore / Bengaluru

2-4 Yrs

₹ 8-16.5 LPA

Manager_Data Engineer-- Data and Analytics_Advisory

Bangalore / Bengaluru

4-7 Yrs

₹ 7.5-28 LPA

Associate - Python Data Engineer - GDC

Kolkata

2-5 Yrs

₹ 8-17 LPA

PwC Salaries in India

Senior Associate 19k salaries	₹12.7 L/yr - ₹25 L/yr
Associate 15.1k salaries	₹7.9 L/yr - ₹14.5 L/yr
Manager 7.6k salaries	₹22.1 L/yr - ₹40 L/yr
Senior Consultant 4.9k salaries	₹15.9 L/yr - ₹26.3 L/yr
Associate2 4.7k salaries	₹7.5 L/yr - ₹14 L/yr