IBM
Data skewness in Spark can be handled by partitioning, bucketing, or using salting techniques.
Partitioning the data based on a key column can distribute the data evenly across the nodes.
Bucketing can group the data into buckets based on a key column, which can improve join performance.
Salting involves adding a random prefix to the key column, which can distribute the data evenly.
Using broadcast joins for small tables can avoid shuffling the large, skewed side across the network (see the salting sketch below, which also uses a broadcast hint).
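A minimal PySpark sketch of the salting idea described above; the DataFrame names, file paths, join key (user_id), and salt range are hypothetical, not taken from the original answer.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("salting-demo").getOrCreate()

# Hypothetical DataFrames: 'events' is large and skewed on user_id,
# 'users' is the smaller dimension table being joined in.
events = spark.read.parquet("/data/events")   # assumed path
users = spark.read.parquet("/data/users")     # assumed path

SALT_BUCKETS = 8

# Add a random salt to the skewed side so one hot key is split
# across SALT_BUCKETS partitions instead of landing on one executor.
events_salted = events.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# Explode the small side so every (user_id, salt) combination exists.
salts = spark.range(SALT_BUCKETS).withColumn("salt", F.col("id").cast("int")).drop("id")
users_salted = users.crossJoin(salts)

# Broadcast the small side to avoid shuffling the large one at all.
joined = events_salted.join(F.broadcast(users_salted), on=["user_id", "salt"])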
Partitioning is dividing data into smaller chunks based on a column value. Bucketing is dividing data into a fixed number of buckets based on a hash function.
Partitioning is used for organizing data for efficient querying and processing.
Bucketing is used for evenly distributing data across nodes in a cluster.
Partitioning is done based on a column value, such as date or region.
Bucketing is done based on a hash function, such as hashing a customer ID into a fixed number of buckets (see the write sketch below).
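A rough PySpark sketch contrasting partitioning and bucketing on write; the table name, path, and columns (region, customer_id) are assumed for illustration only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-bucket-demo").getOrCreate()

# Hypothetical sales DataFrame with 'region' and 'customer_id' columns.
sales = spark.read.parquet("/data/sales")  # assumed path

# Partitioning: one directory per distinct region value.
# Bucketing: rows hashed on customer_id into 16 fixed buckets per partition.
(sales.write
    .partitionBy("region")
    .bucketBy(16, "customer_id")
    .sortBy("customer_id")
    .mode("overwrite")
    .saveAsTable("sales_bucketed"))   # bucketBy requires saveAsTable

Partition pruning then skips whole directories for queries filtered on region, while bucketing lets joins on customer_id avoid a full shuffle.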
Cache is temporary storage used to speed up access to frequently accessed data. Persistent storage is permanent storage used to store data even after power loss.
Cache is faster but smaller than persistent storage
Cache is volatile and data is lost when power is lost
Persistent storage is non-volatile and data is retained even after power loss
Examples of cache include CPU cache, browser cache, and CDN cache
Examples of persistent storage include hard disks, SSDs, and databases
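A small Python sketch of the contrast, assuming functools.lru_cache as the in-memory cache and a JSON file as the persistent store; the function and file names are made up.

import json
from functools import lru_cache

@lru_cache(maxsize=128)            # in-memory cache: fast, but lost when the process exits
def expensive_lookup(key: str) -> int:
    print(f"computing {key} ...")  # only printed on a cache miss
    return sum(ord(c) for c in key)

expensive_lookup("spark")   # computed
expensive_lookup("spark")   # served from the cache, no recomputation

# Persistent storage: slower to access, but survives restarts and power loss.
with open("results.json", "w") as f:
    json.dump({"spark": expensive_lookup("spark")}, f)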
To read JSON data using Spark, use the SparkSession.read.json() method.
Create a SparkSession object
Use the read.json() method to read the JSON data
Specify the path to the JSON file or directory containing JSON files
The resulting DataFrame can be manipulated using Spark's DataFrame API
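A minimal PySpark sketch of the steps above; the file path and the commented column names are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-json-demo").getOrCreate()

# One JSON object per line is the default; use the multiLine option if a
# single record spans multiple lines.
df = spark.read.json("/data/events.json")   # file or directory of JSON files
# df = spark.read.option("multiLine", True).json("/data/events.json")

df.printSchema()
df.show(5)
# The DataFrame can then be manipulated as usual, e.g.
# df.select("user_id", "event_type")  (hypothetical column names)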
To create a Kafka topic with replication factor 2, use the command line tool or Kafka API.
Use the command line tool 'kafka-topics.sh' with the '--replication-factor' flag set to 2.
Alternatively, use the Kafka API to create a topic with a replication factor of 2.
Ensure that the number of brokers in the Kafka cluster is greater than or equal to the replication factor.
Consider setting the 'min.insync.replicas' configuration so that producers using acks=all get the durability guarantee they expect.
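A hedged sketch using the confluent-kafka Python admin client, with the kafka-topics.sh command from the answer shown as a comment; the broker address and topic name are assumptions.

# Equivalent CLI (as mentioned above):
#   kafka-topics.sh --create --topic orders --partitions 3 \
#       --replication-factor 2 --bootstrap-server localhost:9092
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed broker address

topic = NewTopic(
    "orders",                  # hypothetical topic name
    num_partitions=3,
    replication_factor=2,
    config={"min.insync.replicas": "2"},
)

# create_topics is asynchronous; each value in the returned dict is a future.
for name, future in admin.create_topics([topic]).items():
    future.result()            # raises if topic creation failed
    print(f"created topic {name}")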
I can join within two weeks of receiving an offer.
I can start within two weeks of receiving an offer.
I need to give notice at my current job before starting.
I have some personal commitments that I need to wrap up before joining.
DataStage is an ETL tool used for extracting, transforming, and loading data from various sources to a target destination.
DataStage is a popular ETL tool developed by IBM.
It allows users to design and run jobs that move and transform data.
DataStage supports various data sources such as databases, flat files, and cloud services.
It provides a graphical interface for designing data integration jobs.
DataStage jobs can be scheduled, run, and monitored using the DataStage Director client.
RCP in DataStage stands for Runtime Column Propagation.
RCP is a feature in IBM DataStage that lets columns which are not explicitly defined in the job design be carried through stages automatically at runtime.
It makes jobs more flexible and reusable, since every stage does not have to declare the full column metadata.
RCP can be enabled or disabled at the project, job, or individual stage (output link) level.
Example: with RCP enabled, extra columns arriving from a source are propagated through intermediate stages to the target without being listed in each stage's definitions.
Snowflake is a cloud-based data warehousing platform that separates storage and compute, providing scalability and flexibility.
Snowflake uses a unique architecture called multi-cluster, shared data architecture.
It separates storage and compute, allowing users to scale each independently.
Compute is provided by virtual warehouses, which can be scaled up, down, or paused independently based on workload.
Snowflake uses a central, shared storage layer for persisted data that all virtual warehouses can access.
I am a data engineer with a strong background in programming and data analysis.
Experienced in designing and implementing data pipelines
Proficient in programming languages like Python, SQL, and Java
Skilled in data modeling and database management
Familiar with big data technologies such as Hadoop and Spark
Developed a data pipeline to process and analyze customer feedback data
Used Apache Spark for data processing
Implemented machine learning models for sentiment analysis
Visualized insights using Tableau for stakeholders
Collaborated with cross-functional teams to improve customer experience
row_number assigns unique sequential integers to rows, while dense_rank assigns ranks to rows with no gaps between ranks.
row_number function assigns a unique sequential integer to each row in the result set
dense_rank function assigns ranks to rows with no gaps between ranks
row_number gives tied rows different sequential numbers, while dense_rank gives tied rows the same rank
Example: row_number - 1, 2, 3, 4; dense_rank - 1, 2, 2, 3
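A small PySpark illustration of the difference, using a made-up DataFrame with a tie on the score column.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("rank-demo").getOrCreate()

# Hypothetical scores with a tie on 85.
df = spark.createDataFrame(
    [("a", 90), ("b", 85), ("c", 85), ("d", 80)], ["name", "score"]
)

w = Window.orderBy(F.col("score").desc())

df.withColumn("row_number", F.row_number().over(w)) \
  .withColumn("dense_rank", F.dense_rank().over(w)) \
  .show()
# row_number: 1, 2, 3, 4   (tied rows still get distinct numbers)
# dense_rank: 1, 2, 2, 3   (tied rows share a rank, with no gaps afterwards)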
Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
Advantages: SQL-like query language for querying large datasets, optimized for OLAP workloads, supports partitioning and bucketing for efficient queries.
Disadvantages: Slower performance compared to traditional databases for OLTP workloads, limited support for complex queries and transactions.
Example: Hive can run SQL-like (HiveQL) queries over large datasets stored in HDFS, as sketched below.
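A rough sketch of that kind of usage via Spark's Hive support; the table and column names are hypothetical.

from pyspark.sql import SparkSession

# enableHiveSupport lets Spark create and query Hive (metastore) tables.
spark = (SparkSession.builder
         .appName("hive-demo")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (url STRING, status INT)
    PARTITIONED BY (log_date STRING)
    STORED AS PARQUET
""")

# A typical OLAP-style aggregation that benefits from partition pruning.
spark.sql("""
    SELECT log_date, status, COUNT(*) AS hits
    FROM web_logs
    WHERE log_date = '2024-04-01'
    GROUP BY log_date, status
""").show()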
I applied via Referral and was interviewed in Apr 2024. There was 1 interview round.
I have over 5 years of experience in IT, with a focus on data engineering and database management.
Worked on designing and implementing data pipelines to extract, transform, and load data from various sources
Managed and optimized databases for performance and scalability
Collaborated with cross-functional teams to develop data-driven solutions
Experience with tools like SQL, Python, Hadoop, and Spark
Participated in data m...
I applied via Naukri.com and was interviewed in Jul 2024. There was 1 interview round.
Broadcast variable is a read-only variable that is cached on each machine in a cluster instead of being shipped with tasks.
Broadcast variables are used to efficiently distribute large read-only datasets to worker nodes in Spark applications.
They are cached in memory on each machine and can be reused across multiple stages of a job.
Broadcast variables help in reducing the amount of data that needs to be transferred over the network during task execution (see the sketch below).
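A minimal PySpark sketch of a broadcast variable, assuming a small, made-up country-code lookup table.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-demo").getOrCreate()
sc = spark.sparkContext

# Hypothetical small lookup table shipped once to every executor.
country_codes = {"IN": "India", "US": "United States", "DE": "Germany"}
bc_codes = sc.broadcast(country_codes)

rdd = sc.parallelize(["IN", "US", "IN", "DE"])

# Each task reads the cached copy via .value instead of receiving the
# dictionary with every task closure.
full_names = rdd.map(lambda code: bc_codes.value.get(code, "unknown")).collect()
print(full_names)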
A 1-hour coding test with one coding question and one SQL question. The coding question was of average difficulty and easy to solve; the SQL question was very easy.
The IBM Data Engineer interview process typically takes less than 2 weeks and involves about 3 interview rounds (based on 40 interviews).
Designation | Salaries reported | Salary range
Application Developer | 11.7k salaries | ₹5.9 L/yr - ₹26.5 L/yr
Software Engineer | 5.5k salaries | ₹5.4 L/yr - ₹22.6 L/yr
Advisory System Analyst | 5.2k salaries | ₹9.4 L/yr - ₹26 L/yr
Senior Software Engineer | 4.8k salaries | ₹8 L/yr - ₹30 L/yr
Senior Systems Engineer | 4.5k salaries | ₹5.6 L/yr - ₹20 L/yr
Oracle
TCS
Cognizant
Accenture