PwC Data Engineer Interview Questions, Process, and Tips

Updated 20 Jan 2025

Top PwC Data Engineer Interview Questions and Answers

  • Q1. What is a data flow? How does an ADF pipeline differ from a data flow?
  • Q2. What is the difference between repartition and coalesce? What is the difference between persist and cache?
  • Q3. What are the concepts of coalesce and repartition in data processing?

PwC Data Engineer Interview Experiences

20 interviews found

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: not specified
Result: No response

I was interviewed in Dec 2024.

Round 1 - Technical 

(6 Questions)

  • Q1. What is Spark? Explain its ecosystem.
  • Ans. 

    Spark is a fast and general-purpose cluster computing system for big data processing.

    • Spark provides APIs in Java, Scala, Python, and R for distributed data processing.

    • It includes components like Spark SQL for SQL and structured data processing, Spark Streaming for real-time data processing, MLlib for machine learning, and GraphX for graph processing.

    • Spark can run on top of Hadoop, Mesos, Kubernetes, or in standalone mo...

  • Answered by AI
  • Q2. What are transformations, and how many types of transformations exist?
  • Ans. 

    Transformations are operations performed on data to convert it from one form to another. There are mainly two types of transformations: narrow and wide.

    • Narrow transformations are those where each input partition will contribute to only one output partition, e.g., map, filter.

    • Wide transformations are those where each input partition ...

  • Answered by AI
  • Q3. Explain the Spark job execution process and its planning
  • Ans. 

    The Spark job process involves job submission, DAG creation, task scheduling, and task execution.

    • The user submits a Spark job through the SparkContext.

    • Spark creates a Directed Acyclic Graph (DAG) of the job's stages and tasks.

    • Tasks are scheduled by the Spark scheduler based on data locality and resource availability.

    • Tasks are executed on worker nodes in the cluster.

    • Output is collected and returned to the user.

  • Answered by AI
  • Q4. What are the concepts of coalesce and repartition in data processing?
  • Ans. 

    Coalesce and repartition are concepts used in data processing to control the number of partitions in a dataset.

    • Coalesce is used to reduce the number of partitions in a dataset without shuffling the data, which can improve performance.

    • Repartition is used to increase or decrease the number of partitions in a dataset by shuffling the data across the cluster.

    • Coalesce is preferred over repartition when reducing partitions t...

  • Answered by AI
  • Q5. Explain OOM and driver overhead memory
  • Ans. 

    OOM stands for Out Of Memory. Driver memory is the heap allocated to the driver in a Spark application (spark.driver.memory); driver overhead memory is the additional non-heap memory reserved for the driver process (spark.driver.memoryOverhead).

    • OOM occurs when a process runs out of memory to allocate, leading to crashes or performance issues.

    • Driver memory in Spark is the memory allocated to the driver program, which coordinates tasks and manages the overall execution of the application.

    • Adjusting memory settings like executor ...

  • Answered by AI
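  • Q6. What is data skewness?
  • Ans. 

    Data skewness is a measure of asymmetry in the distribution of data values. In a Spark context, the term usually refers to data being unevenly distributed across partitions, so that a few tasks process most of the rows.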

    • Data skewness indicates the lack of symmetry in the data distribution.

    • Positive skewness means the tail on the right side of the distribution is longer or fatter.

    • Negative skewness means the tail on the left side of the distribution is longer or fatter.

    • Skewness value of 0 indicates a perfectly symmetrical distribution.

  • Answered by AI
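
The statistical definition of skewness given in the answer to Q6 can be checked numerically. The following is a minimal sketch, assuming the population (moment-based) skewness coefficient; the function name `skewness` and the sample lists are illustrative and not part of the original answer.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: treat as symmetric
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]       # no tail on either side
right_tailed = [1, 1, 1, 1, 10]   # long tail on the right
left_tailed = [1, 10, 10, 10, 10] # long tail on the left

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```

A symmetric sample yields 0, a long right tail yields a positive value, and a long left tail yields a negative value, matching the bullets above.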

Interview experience: 5 (Excellent)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Selected

I applied via LinkedIn and was interviewed in Sep 2024. There were 2 interview rounds.

Round 1 - Technical 

(5 Questions)

  • Q1. Spark architecture
  • Q2. Difference between coalesce and repartition
  • Ans. 

    In Spark, both coalesce and repartition change the number of partitions of a DataFrame or RDD. (SQL also has an unrelated COALESCE function, which returns the first non-null value among its arguments.)

    • coalesce reduces the number of partitions by merging existing ones and avoids a full shuffle.

    • repartition can increase or decrease the number of partitions and performs a full shuffle, which tends to produce more evenly sized partitions.

    • coalesce is usually the cheaper choice when only reducing the partition count; repartition is the choice when increasing it or rebalancing the data.

  • Answered by AI
  • Q3. SQL question using the LAG window function
  • Q4. Same question in PySpark
  • Q5. Project and day to day activities
Round 2 - Behavioral 

(2 Questions)

  • Q1. Scenario based questions
  • Q2. Gave details about work

Interview Preparation Tips

Topics to prepare for PwC Data Engineer interview:
  • Spark
  • SQL
  • PySpark
  • Project
  • Soft Skills

Data Engineer Interview Questions & Answers

Posted by Anonymous on 13 Jan 2025

Interview experience: 3 (Average)
Difficulty level: Easy
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Naukri.com and was interviewed in Dec 2024. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. What is the SQL query to find the third highest salary from a given table?
  • Ans. 

    Use a SQL query with ORDER BY and LIMIT to find the third highest salary in a table.

    • Use an ORDER BY clause to sort salaries in descending order

    • Use LIMIT 1 OFFSET 2 to skip the first two rows; add DISTINCT so that tied salaries are counted once

    • Example: SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2

  • Answered by AI
  • Q2. What is the difference between repartition and coalesce? What is the difference between persist and cache?
  • Ans. 

    repartition vs coalesce, persist vs cache

    • repartition is used to increase or decrease the number of partitions in a DataFrame and performs a full shuffle, while coalesce is used to decrease the number of partitions without a full shuffle

    • persist stores the DataFrame at a storage level you choose (memory, disk, or both) for faster access, while cache is shorthand for persist with the default storage level (memory only for RDDs, memory and disk for DataFrames)

    • repartition example: df.repartition(10)

    • coalesce example: df.coalesce(5)

    • ...

  • Answered by AI
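
For the third-highest-salary question (Q1 of this round), the following is a minimal sketch using Python's built-in sqlite3 module. The `employees` table, its columns, and the sample rows are assumptions made for illustration. It compares a plain `LIMIT 1 OFFSET 2` query with a `DISTINCT` variant, since the two disagree when salaries are tied.

```python
import sqlite3


def third_highest_salaries():
    """Return (salary in the third row, third distinct salary) for sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    # Hypothetical rows; note the tie at the top salary.
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("A", 900), ("B", 900), ("C", 800), ("D", 700), ("E", 600)],
    )
    # Skips two rows, so a tie at the top shifts the result.
    by_row = conn.execute(
        "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2"
    ).fetchone()[0]
    # Skips two distinct salary values instead.
    by_value = conn.execute(
        "SELECT DISTINCT salary FROM employees "
        "ORDER BY salary DESC LIMIT 1 OFFSET 2"
    ).fetchone()[0]
    conn.close()
    return by_row, by_value


print(third_highest_salaries())
```

Which result counts as "third highest" depends on whether ties should share a rank, so it is worth confirming the intended meaning with the interviewer.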

Interview experience: 5 (Excellent)
Difficulty level: Easy
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.

Round 1 - Technical 

(1 Question)

  • Q1. Briefly explain SCD Type II

Data Engineer Interview Questions & Answers

Posted by Anonymous on 19 Aug 2024

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Naukri.com and was interviewed in Jul 2024. There were 4 interview rounds.

Round 1 - Aptitude Test 

The aptitude test was okay, but the time allotted was short.

Round 2 - Technical 

(2 Questions)

  • Q1. Project architecture
  • Q2. DataFrames in PySpark
  • Ans. 

    DataFrames in PySpark are distributed collections of data organized into named columns.

    • DataFrames are similar to tables in a relational database.

    • They can be created from various data sources like CSV, JSON, Parquet, etc.

    • DataFrames support SQL queries and transformations using PySpark functions.

  • Answered by AI
Round 3 - Technical 

(2 Questions)

  • Q1. Python question on finding the second-highest salary
  • Q2. PySpark scenario-based question
Round 4 - HR 

(2 Questions)

  • Q1. Are you ready to travel on site?
  • Ans. 

    Yes, I am ready to travel on site for data engineering projects.

    • I am willing to travel for client meetings, project kick-offs, and on-site troubleshooting.

    • I understand the importance of face-to-face interactions in project delivery.

    • I have previous experience traveling for work, such as attending conferences or training sessions.

    • I am flexible with my schedule and can accommodate last-minute travel if needed.

  • Answered by AI
  • Q2. Salary negotiation

Interview experience: 4 (Good)
Difficulty level: not specified
Process duration: not specified
Result: not specified

Round 1 - Technical 

(2 Questions)

  • Q1. SQL question on RANK, SUM() OVER, and PARTITION BY
  • Q2. Roles and responsibilities

Interview experience: 3 (Average)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: No response

I applied via AmbitionBox and was interviewed in Jan 2024. There was 1 interview round.

Round 1 - Technical 

(1 Question)

  • Q1. Write code to print the reverse of a string.
  • Ans. 

    Code to print the reverse of a string

    • Use a loop to iterate through the characters of the string in reverse order

    • Append each character to a new string to build the reversed string

    • Return the reversed string

  • Answered by AI
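
The loop-based approach described in the answer can be sketched in Python as follows. The function name `reverse_string` and the sample input are illustrative. Characters are collected in a list and joined at the end, which avoids repeated string concatenation.

```python
def reverse_string(text: str) -> str:
    """Build the reversed string by walking the input from last index to first."""
    chars = []
    for i in range(len(text) - 1, -1, -1):
        chars.append(text[i])  # append each character in reverse order
    return "".join(chars)


print(reverse_string("engineer"))
```

In Python the slice `text[::-1]` gives the same result in one expression; interviewers often ask for the explicit loop to see the indexing logic.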

Data Engineer Interview Questions & Answers

Posted by Sourav Raj on 10 Jul 2024

Interview experience: 5 (Excellent)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Selected

I applied via Referral and was interviewed before Jul 2023. There were 2 interview rounds.

Round 1 - Technical 

(2 Questions)

  • Q1. SQL questions using ROW_NUMBER and RANK
  • Q2. How do you delete duplicates from a database?
  • Ans. 

    To delete duplicates from a database, you can use SQL queries to identify and remove duplicate records.

    • Use the DISTINCT keyword in a SELECT query to retrieve unique records

    • Identify duplicate records using GROUP BY and HAVING clauses

    • Delete duplicate records using DELETE statement with subquery to keep only one instance

  • Answered by AI
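
The GROUP BY / HAVING and DELETE-with-subquery steps from the answer to Q2 can be sketched with Python's built-in sqlite3 module. The `customers` table and its rows are hypothetical, and `rowid` is SQLite-specific; other databases typically need a primary key or a ROW_NUMBER()-based approach instead.

```python
import sqlite3


def delete_duplicates_demo():
    """Find duplicate rows, delete all but one copy, return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [
            ("Asha", "asha@example.com"),
            ("Ravi", "ravi@example.com"),
            ("Asha", "asha@example.com"),
            ("Ravi", "ravi@example.com"),
            ("Asha", "asha@example.com"),
        ],
    )
    # Step 1: identify duplicates with GROUP BY and HAVING.
    duplicates = conn.execute(
        "SELECT name, email, COUNT(*) FROM customers "
        "GROUP BY name, email HAVING COUNT(*) > 1 ORDER BY name"
    ).fetchall()
    # Step 2: keep the lowest rowid per group and delete the rest.
    conn.execute(
        "DELETE FROM customers WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM customers GROUP BY name, email)"
    )
    remaining = conn.execute(
        "SELECT name, email FROM customers ORDER BY name"
    ).fetchall()
    conn.close()
    return duplicates, remaining


print(delete_duplicates_demo())
```

Note that SELECT DISTINCT only hides duplicates in a query result; the DELETE statement is what removes them from the table.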
Round 2 - HR 

(2 Questions)

  • Q1. Why do you want to join PwC?
  • Q2. What qualities do you bring to the job?

Data Engineer Interview Questions & Answers

Posted by Anonymous on 10 Apr 2024

Interview experience: 3 (Average)
Difficulty level: not specified
Process duration: not specified
Result: not specified

I applied via Job Portal

Round 1 - Technical 

(1 Question)

  • Q1. PySpark-related questions.

Data Engineer Interview Questions & Answers

Posted by Anonymous on 29 Feb 2024

Interview experience: 2 (Poor)
Difficulty level: not specified
Process duration: not specified
Result: not specified

Round 1 - Technical 

(2 Questions)

  • Q1. Repartition vs coalesce
  • Ans. 

    Repartition is used to increase or decrease the number of partitions in a DataFrame, while coalesce is used to decrease the number of partitions without a full shuffle.

    • Repartition involves shuffling data across the network, which can be expensive in terms of performance and resources.

    • Coalesce is a more efficient operation when reducing partitions because it minimizes data movement by merging existing partitions instead of redistributing every row.

    • Example: Repartition(10...

  • Answered by AI
  • Q2. Copy Activity in ADF
  • Ans. 

    Copy Activity in ADF is used to move data between supported data stores

    • Copy Activity is a built-in activity in Azure Data Factory (ADF)

    • It can be used to move data between supported data stores such as Azure Blob Storage, SQL Database, etc.

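    • It is designed for data movement rather than transformation: it handles format conversion and column mapping between source and sink, while heavier transformations are done with other activities such as data flows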

    • You can define source and sink datasets, mapping, and settings in Copy Activity

    • Example: Copying data from...

  • Answered by AI
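
For the repartition versus coalesce question (Q1 of this round), the following is a toy model in plain Python, not Spark's implementation. Partitions are modelled as lists of rows; the functions `coalesce` and `repartition` here are illustrative stand-ins that mimic only the behavior described above: merging whole partitions versus redistributing every row.

```python
def coalesce(partitions, n):
    """Toy coalesce: merge whole partitions into at most n groups, never splitting one."""
    n = min(n, len(partitions))  # coalesce cannot increase the partition count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each input partition lands in exactly one output partition (no shuffle).
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Toy repartition: redistribute every row round-robin (a full shuffle)."""
    rows = [row for part in partitions for row in part]
    out = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


# Hypothetical, deliberately uneven input: one large and three tiny partitions.
parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]
print([len(p) for p in coalesce(parts, 2)])     # sizes after merging
print([len(p) for p in repartition(parts, 2)])  # sizes after a full shuffle
```

In this model the merged partitions stay uneven because no rows move between groups, while the shuffled ones come out balanced at the cost of touching every row. That trade-off is the usual reason to prefer coalesce for reducing partitions and repartition for rebalancing or increasing them.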

PwC Interview FAQs

How many rounds are there in the PwC Data Engineer interview?
The PwC interview process usually has 1-2 rounds. The most common rounds are Technical, HR, and One-on-one.

How to prepare for the PwC Data Engineer interview?
Go through your CV in detail and study all the technologies mentioned in it. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at PwC. The most common topics and skills that interviewers at PwC expect are SQL, Python, AWS, Leadership Development and Data Modeling.

What are the top questions asked in the PwC Data Engineer interview?

Some of the top questions asked at the PwC Data Engineer interview:

  1. What is data flow? Difference with ADF pipeline and data f...
  2. What is the difference between repartition and coalesce? What is the difference ...
  3. What are the concepts of coalesce and repartition in data processi...

How long is the PwC Data Engineer interview process?

The PwC Data Engineer interview process can vary in length, but it typically takes less than 2 weeks to complete.

PwC Data Engineer Interview Process

based on 16 interviews

2 Interview rounds

  • Technical Round
  • HR Round

PwC Data Engineer Salary

Based on 234 salaries: ₹4.7 L/yr - ₹18.2 L/yr (at par with the average Data Engineer salary in India)

PwC Data Engineer Reviews and Ratings

Overall rating: 3.1/5, based on 22 reviews

Rating in categories

  • Skill development: 3.3
  • Work-life balance: 2.5
  • Salary: 2.7
  • Job security: 2.7
  • Company culture: 3.0
  • Promotions: 2.7
  • Work satisfaction: 2.5
