Capgemini Data Engineer Interview Questions, Process, and Tips

Updated 14 Jan 2025

Top Capgemini Data Engineer Interview Questions and Answers

  • Q1. How will you join two large tables in PySpark?
  • Q2. Write a SQL query to get the names of students who scored marks > 45 in each subject from the Student table
  • Q3. How do you remove duplicates in a DataFrame using PySpark?

Capgemini Data Engineer Interview Experiences

33 interviews found

Data Engineer Interview Questions & Answers

user image Brijesh yadav

posted on 9 Jan 2025

Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(3 Questions)

  • Q1. What are the optimization techniques used in Apache Spark?
  • Q2. 2 SQL queries, 1 PySpark coding task, and 1 Python coding task.
  • Q3. 2-3 scenario-based questions on ADF and Databricks.

Data Engineer Interview Questions & Answers

user image nikhil yeole

posted on 14 Jan 2025

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(1 Question)

  • Q1. Basic questions on AWS, SQL, PySpark, and Python


Data Engineer Interview Questions & Answers

user image Anonymous

posted on 25 Jul 2024

Interview experience
3
Average
Difficulty level
Easy
Process Duration
Less than 2 weeks
Result
No response

I was interviewed in Jun 2024.

Round 1 - Technical 

(13 Questions)

  • Q1. How do you remove duplicates in a DataFrame using PySpark?
  • Ans. 

    Use dropDuplicates() function in pyspark to remove duplicates in a data frame.

    • Use dropDuplicates() function on the data frame to remove duplicates based on all columns.

    • Specify subset of columns to remove duplicates based on specific columns.

    • Use the distinct() function to remove duplicates and keep only distinct rows.

  • Answered by AI
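PySpark's dropDuplicates() needs a running Spark session, but the underlying idea — keep the first row seen for each key — can be sketched in plain Python (the rows and column names here are illustrative, not from the interview):

```python
def drop_duplicates(rows, subset=None):
    """Keep the first occurrence of each key, mimicking
    PySpark's DataFrame.dropDuplicates(subset)."""
    seen = set()
    out = []
    for row in rows:  # each row is a dict of column -> value
        key = tuple(row[c] for c in subset) if subset else tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},   # exact duplicate of the first row
    {"id": 1, "name": "b"},   # duplicate id only
]
print(len(drop_duplicates(rows)))                 # 2 — drops the exact duplicate
print(len(drop_duplicates(rows, subset=["id"])))  # 1 — keeps one row per id
```

In Spark the equivalent calls would be df.dropDuplicates() and df.dropDuplicates(['id']); distinct() behaves like the no-subset case.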
  • Q2. What is a broadcast join? How is it useful?
  • Ans. 

    Broadcast join is a type of join operation in distributed computing where one smaller dataset is broadcasted to all nodes for efficient processing.

    • Reduces data shuffling by sending smaller dataset to all nodes

    • Useful when one dataset is significantly smaller than the other

    • Improves performance by reducing network traffic and processing time

  • Answered by AI
  • Q3. What is Re-Partition and Coalesce? How are these used?
  • Ans. 

    Re-Partition and Coalesce are methods used to control the number of partitions in a dataset in Apache Spark.

    • Re-Partition is used to increase or decrease the number of partitions in a dataset by shuffling the data across the cluster.

    • Coalesce is used to decrease the number of partitions in a dataset without shuffling the data, which can improve performance.

    • Re-Partition is typically used when there is a need to increase p...

  • Answered by AI
  • Q4. Write Python code to Extract Pincode from Address Field in Dataframe using Pyspark?
  • Ans. 

    Extract Pincode from Address Field in Dataframe using Pyspark

    • Use pyspark.sql.functions regexp_extract() function to extract pincode from address field

    • Create a new column in the dataframe to store the extracted pincode

    • Specify the regular expression pattern for pincode extraction

    • Example: df.withColumn('pincode', regexp_extract(df['address'], r'\b\d{6}\b', 0)) — note the raw string, so \b stays a word boundary rather than a backspace character

  • Answered by AI
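The regex itself can be sanity-checked locally with the standard re module before wiring it into regexp_extract (the sample addresses are made up):

```python
import re

PINCODE_RE = r"\b\d{6}\b"  # a six-digit Indian PIN code standing alone as a token

def extract_pincode(address: str) -> str:
    """Return the first 6-digit pincode in the address, or '' if none."""
    m = re.search(PINCODE_RE, address)
    return m.group(0) if m else ""

print(extract_pincode("12 MG Road, Bengaluru 560001"))  # 560001
print(extract_pincode("no pincode here"))               # (empty string)

# In PySpark the same pattern would be used as:
#   df.withColumn("pincode", regexp_extract(df["address"], r"\b\d{6}\b", 0))
```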
  • Q5. Write a SQL query to get the names of students who scored marks > 45 in each subject from the Student table
  • Ans. 

    SQL query to retrieve student names with marks > 45 in each subject

    • Use GROUP BY and HAVING clauses to filter students with marks > 45 in each subject

    • Join Student table with Marks table on student_id to get marks for each student

    • Select student names from Student table based on the conditions

  • Answered by AI
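One common way to express "marks > 45 in every subject" is that the student's lowest mark exceeds 45. A runnable sketch using Python's built-in sqlite3, with a hypothetical single-table schema of one row per (name, subject, marks):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, subject TEXT, marks INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Asha", "Math", 90), ("Asha", "Science", 50),
     ("Ravi", "Math", 80), ("Ravi", "Science", 40)],
)

# marks > 45 in *each* subject <=> the student's minimum mark is > 45
query = """
    SELECT name
    FROM student
    GROUP BY name
    HAVING MIN(marks) > 45
"""
print([r[0] for r in conn.execute(query)])  # ['Asha']
```

If marks live in a separate Marks table, the same GROUP BY/HAVING applies after joining on student_id, as the answer above notes.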
  • Q6. How do you enable Hive support in Spark?
  • Ans. 

    Enable Hive support in Spark for seamless integration of Hive tables and queries.

    • Set 'spark.sql.catalogImplementation' to 'hive' in SparkConf

    • Include 'spark-hive' dependency in the Spark application

    • Ensure Hive configuration files are available in the classpath

    • Use HiveContext or enable Hive support in SparkSession

  • Answered by AI
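In code, the usual entry point is enableHiveSupport() on the session builder — a configuration sketch only, assuming a local Spark installation with Hive configured (the app name is illustrative):

```python
from pyspark.sql import SparkSession

# enableHiveSupport() sets spark.sql.catalogImplementation to "hive",
# so saveAsTable() and spark.sql() go through the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-demo")   # illustrative app name
    .enableHiveSupport()
    .getOrCreate()
)
spark.sql("SHOW TABLES").show()
```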
  • Q7. Explain Joins in spark using pyspark
  • Ans. 

    Joins in Spark using PySpark are used to combine data from two different DataFrames based on a common key.

    • Joins are performed using the join() function in PySpark.

    • Common types of joins include inner join, outer join, left join, and right join.

    • Example: df1.join(df2, df1.key == df2.key, 'inner')

  • Answered by AI
  • Q8. How will you join two large tables in PySpark?
  • Ans. 

    Use broadcast join or partition join in pyspark to join two large tables efficiently.

    • Use broadcast join for smaller table and partition join for larger table.

    • Broadcast join - broadcast the smaller table to all worker nodes.

    • Partition join - partition both tables on the join key and join them.

    • Example: df1.join(broadcast(df2), 'join_key')

    • Example: df1.join(df2, 'join_key').repartition('join_key')

  • Answered by AI
  • Q9. What is df.explain() in PySpark?
  • Ans. 

    df.explain() in pyspark is used to display the physical plan of the DataFrame operations.

    • df.explain() is used to show the execution plan of the DataFrame operations in pyspark.

    • It helps in understanding how the operations are being executed and optimized by Spark.

    • The output of df.explain() includes details like the logical and physical plans, optimizations applied, and stages of execution.

  • Answered by AI
  • Q10. Explain spark architecture
  • Ans. 

    Spark architecture is a distributed computing framework that consists of a driver program, cluster manager, and worker nodes.

    • Spark driver program coordinates the execution of tasks and maintains the overall state of the application.

    • Cluster manager allocates resources for the application and monitors its execution.

    • Worker nodes execute the tasks assigned by the driver program and store data in memory or disk.

    • Spark archit...

  • Answered by AI
  • Q11. What is the purpose of a lineage graph?
  • Ans. 

    Lineage graph is used to track the flow of data from source to destination, helping in understanding data dependencies and impact analysis.

    • Helps in understanding data dependencies and relationships

    • Tracks the flow of data from source to destination

    • Aids in impact analysis and troubleshooting

    • Useful for data governance and compliance

    • Can be visualized to easily comprehend complex data pipelines

  • Answered by AI
  • Q12. External Table vs Internal table
  • Ans. 

    External tables store data outside the database while internal tables store data within the database.

    • External tables reference data stored outside the database, such as in HDFS or S3, while internal tables store data within the database itself.

    • External tables are typically used for data that is not managed by the database system, while internal tables are used for data that is managed by the database system.

    • External ta...

  • Answered by AI
  • Q13. Assume the DataFrames DF1 (UserID, Name) and DF2 (UserID, PageID, Timestamp, Events). Write code to join the DataFrames, count the number of events per user, and filter users with 0 events
  • Ans. 

    Join DF's, count events, filter users with 0 events

    • Use join operation to combine DF1 and DF2 on UserID

    • Group by UserID and count the number of events

    • Filter out users with 0 events

  • Answered by AI
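The question asks for PySpark, but the same left-join-and-count logic can be shown runnably in SQL via Python's built-in sqlite3 (table and column names taken from the question; the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df1 (UserID INT, Name TEXT)")
conn.execute("CREATE TABLE df2 (UserID INT, PageID INT, Timestamp TEXT, Events INT)")
conn.executemany("INSERT INTO df1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO df2 VALUES (?, ?, ?, ?)",
                 [(1, 10, "t1", 1), (1, 11, "t2", 1), (2, 12, "t3", 1)])

# LEFT JOIN keeps users with no matching events; COUNT(df2.UserID)
# counts only matched rows, so unmatched users score 0.
query = """
    SELECT df1.Name, COUNT(df2.UserID) AS n_events
    FROM df1 LEFT JOIN df2 ON df1.UserID = df2.UserID
    GROUP BY df1.UserID, df1.Name
    HAVING COUNT(df2.UserID) = 0
"""
print(list(conn.execute(query)))  # [('c', 0)]

# PySpark equivalent (sketch):
#   df1.join(df2, "UserID", "left").groupBy("UserID", "Name") \
#      .agg(count(df2["UserID"]).alias("n_events")).filter("n_events = 0")
```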

Interview Preparation Tips

Interview preparation tips for other job seekers - Practice PySpark, Python, and SQL hands-on.

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

user image Narmatha Rengaraj

posted on 27 Nov 2024

Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
2-4 weeks
Result
Selected

I applied via campus placement at Government College of Engineering, Salem and was interviewed in Oct 2024. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. What are the core SQL concepts?
  • Q2. Python DataFrame concepts

Interview Preparation Tips

Interview preparation tips for other job seekers - Nothing


Data Engineer Interview Questions & Answers

user image Ziad Ouldbouya

posted on 5 Sep 2024

Interview experience
4
Good
Difficulty level
Moderate
Process Duration
2-4 weeks
Result
Selected

I applied via LinkedIn and was interviewed in Aug 2024. There were 3 interview rounds.

Round 1 - HR 

(2 Questions)

  • Q1. Describe yourself
  • Ans. 

    I am a detail-oriented data engineer with a passion for problem-solving and a strong background in programming and data analysis.

    • Experienced in designing and implementing data pipelines

    • Proficient in programming languages such as Python, SQL, and Java

    • Skilled in data modeling and database management

    • Strong analytical skills and ability to work with large datasets

    • Excellent communication and teamwork skills

  • Answered by AI
  • Q2. Summary of your education
  • Ans. 

    I have a Bachelor's degree in Computer Science and a Master's degree in Data Engineering.

    • Bachelor's degree in Computer Science

    • Master's degree in Data Engineering

  • Answered by AI
Round 2 - Coding Test 

Some medium-level DSA problems.

Round 3 - Technical 

(2 Questions)

  • Q1. Spark architecture, Hadoop ecosystem, Hive
  • Q2. Some SQL questions as well


Data Engineer Interview Questions & Answers

user image Anonymous

posted on 18 Oct 2024

Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. Find the second largest element in an array
  • Ans. 

    To find the second largest element in an array

    • Sort the array in descending order

    • Return the element at index 1

  • Answered by AI
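Note that sort-then-index returns a duplicate of the maximum when the top value is tied. A single-pass sketch that returns the second largest distinct value instead (not necessarily what the interviewer expected):

```python
def second_largest(nums):
    """Second largest distinct value, or None if there isn't one."""
    first = second = None
    for n in nums:
        if first is None or n > first:
            first, second = n, first      # new maximum; old max demotes
        elif n != first and (second is None or n > second):
            second = n                    # candidate for runner-up
    return second

print(second_largest([4, 9, 9, 1, 7]))  # 7, not the duplicate 9
print(second_largest([5]))              # None
```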
  • Q2. Drop duplicates

Interview Preparation Tips

Interview preparation tips for other job seekers - Questions covered Spark architecture, SQL, and Python.

Skills evaluated in this interview


Data Engineer Interview Questions & Answers

user image Anonymous

posted on 19 Aug 2024

Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Selected

I applied via Company Website and was interviewed in Jul 2024. There were 4 interview rounds.

Round 1 - Aptitude Test 

Numerical ability, reasoning, maths

Round 2 - Coding Test 

Python, data structures, C, C++, Java

Round 3 - Technical 

(2 Questions)

  • Q1. Python to write a factorial
  • Ans. 

    Python code to calculate factorial of a number

    • Use a recursive function to calculate the factorial

    • Base case: if n is 0 or 1, return 1

    • Recursive case: return n * factorial(n-1)

    • Example: def factorial(n): return 1 if n == 0 or n == 1 else n * factorial(n-1)

  • Answered by AI
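The recursive one-liner above works; an iterative version sidesteps Python's recursion limit for large inputs:

```python
def factorial(n: int) -> int:
    """Iterative factorial; no recursion-depth limit for large n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))  # 120
print(factorial(0))  # 1 (base case)
```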
  • Q2. DDL and DML commands
Round 4 - HR 

(2 Questions)

  • Q1. Tell me about yourself
  • Q2. Why IT after electrical engineering?
  • Ans. 

    Combining my electrical engineering background with IT skills allows me to work on cutting-edge technologies and solve complex problems.

    • Interest in technology and data analysis sparked during electrical engineering studies

    • Realized the potential of combining electrical engineering knowledge with IT for innovative solutions

    • Opportunities in data engineering field align with my career goals

  • Answered by AI

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

user image Gaurav Gujjar

posted on 8 Sep 2024

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Optimization technique in pyspark
  • Ans. 

    One optimization technique in PySpark is using partitioning to distribute data evenly across nodes.

    • Use partitioning to distribute data evenly across nodes

    • Avoid shuffling data unnecessarily

    • Cache intermediate results to avoid recomputation

  • Answered by AI
  • Q2. How to handle data skewness ?
  • Ans. 

    Data skewness can be handled by partitioning data, using sampling techniques, optimizing queries, and using parallel processing.

    • Partitioning data based on key values to distribute workload evenly

    • Using sampling techniques to estimate skewed data distribution

    • Optimizing queries by using appropriate indexes and query optimization techniques

    • Using parallel processing to distribute workload across multiple nodes

  • Answered by AI
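One common remedy not spelled out above is key salting: append a small random suffix to a hot key so its rows spread over several partitions instead of one. The idea in plain Python (the key name, salt count, and partition count are all illustrative):

```python
import random
import zlib

def salted_partition(key: str, n_salts: int = 4, n_partitions: int = 8) -> int:
    """Route a row to a partition, spreading each key over n_salts slots."""
    base = zlib.crc32(key.encode()) % n_partitions
    salt = random.randrange(n_salts)          # the random suffix
    return (base + salt) % n_partitions

# Without salting every "hot_user" row lands in one partition;
# with salting the rows spread across n_salts partitions.
partitions = {salted_partition("hot_user") for _ in range(1000)}
print(len(partitions))  # 4 (all salts observed after 1000 draws)
```

In a real join, both sides must be salted consistently: the skewed side gets a random salt, the other side is exploded with every possible salt value.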

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

user image Anonymous

posted on 28 Aug 2024

Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Write a query to find the 2nd highest salary.
  • Ans. 

    Query to find the 2nd highest salary in a database table.

    • Use the ORDER BY clause to sort salaries in descending order.

    • Use the LIMIT clause with OFFSET 1 to retrieve the second row.

    • Consider handling cases where there may be ties for the highest salary.

  • Answered by AI
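Putting those bullets together, a runnable version via Python's built-in sqlite3, using DISTINCT to guard against ties for the top salary (the table and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INT)")  # hypothetical table
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("a", 100), ("b", 300), ("c", 300), ("d", 200)])

# DISTINCT collapses the tied 300s, so OFFSET 1 lands on the true runner-up.
query = """
    SELECT DISTINCT salary
    FROM employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
"""
print(conn.execute(query).fetchone()[0])  # 200
```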
  • Q2. Write a code to count frequency of elements in a list.
  • Ans. 

    Code to count frequency of elements in a list of strings.

    • Use a dictionary to store the frequency of each element in the list.

    • Iterate through the list and update the count in the dictionary.

    • Return the dictionary with element frequencies.

  • Answered by AI
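The dictionary approach described above, runnable, with a cross-check against the stdlib's Counter:

```python
from collections import Counter

def frequency(items):
    """Count occurrences of each element in a list."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

words = ["a", "b", "a", "c", "b", "a"]
print(frequency(words))                    # {'a': 3, 'b': 2, 'c': 1}
print(Counter(words) == frequency(words))  # True — stdlib gives the same result
```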

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

user image prachi gujar

posted on 14 Jun 2024

Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
-

I applied via Naukri.com and was interviewed in May 2024. There were 3 interview rounds.

Round 1 - One-on-one 

(2 Questions)

  • Q1. What are the Components of Data factory pipeline ?
  • Ans. 

    Components of Data factory pipeline include datasets, activities, linked services, triggers, and pipelines.

    • Datasets: Define the data structure and location for input and output data.

    • Activities: Define the actions to be performed on the data such as data movement, data transformation, or data processing.

    • Linked Services: Define the connections to external data sources or destinations.

    • Triggers: Define the conditions that ...

  • Answered by AI
  • Q2. What activity is used for creating an email notification?
  • Ans. 

    The activity used for creating email notification is sending an email.

    • Use SMTP (Simple Mail Transfer Protocol) to send emails

    • Set up an email server or use a third-party email service provider

    • Include the recipient's email address, subject, and message content

    • Can be automated using tools like Python's smtplib library or email marketing platforms like Mailchimp

  • Answered by AI
Round 2 - Coding Test 

English test covering reading, writing, listening, and speaking skills

Round 3 - HR 

(2 Questions)

  • Q1. Tell me about yourself ?
  • Q2. Career background and experience?

Skills evaluated in this interview

Capgemini Interview FAQs

How many rounds are there in Capgemini Data Engineer interview?
The Capgemini interview process usually has 1-2 rounds. The most common rounds are Technical, One-on-one, and HR.
How to prepare for Capgemini Data Engineer interview?
Go through your CV in detail and study all the technologies mentioned in it. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at Capgemini. The most common topics and skills that interviewers at Capgemini expect are Python, Spark, AWS, Scala, and SQL.
What are the top questions asked in Capgemini Data Engineer interview?

Some of the top questions asked at the Capgemini Data Engineer interview -

  1. How will you join two large tables in PySpark?
  2. Write a SQL query to get the names of students who scored marks > 45 in each subject from the Student table
  3. How do you remove duplicates in a DataFrame using PySpark?
How long is the Capgemini Data Engineer interview process?

The duration of the Capgemini Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete.


Capgemini Data Engineer Interview Process

based on 40 interviews

2 Interview rounds

  • Technical Round - 1
  • Technical Round - 2
