Wipro
I was approached by the company and interviewed in May 2024. There was 1 interview round.
Spark is a distributed computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Spark is built around the concept of Resilient Distributed Datasets (RDDs) which are immutable distributed collections of objects.
It supports various programming languages like Java, Scala, Python, and R.
Spark provides high-level APIs like Spark SQL for structured data processing.
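A minimal PySpark sketch of these points, assuming a local pyspark installation; it shows a low-level RDD transformation next to the higher-level Spark SQL API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-basics").master("local[*]").getOrCreate()

# Low-level API: an RDD is an immutable, partitioned collection of objects.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)   # transformation (lazy)
print(squares.collect())             # action triggers execution

# High-level API: Spark SQL over a DataFrame.
df = spark.createDataFrame([("alice", 30), ("bob", 25)], ["name", "age"])
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 26").show()

spark.stop()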
Optimizing Spark jobs involves tuning configurations, partitioning data, caching, and using efficient transformations.
Tune Spark configurations for memory, cores, and parallelism
Partition data to distribute workload evenly
Cache intermediate results to avoid recomputation
Use efficient transformations like map, filter, and reduce
Avoid shuffling data unnecessarily
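A hedged PySpark sketch of the tuning steps listed above; the configuration values, input path, and column names are illustrative assumptions, not recommendations:

from pyspark.sql import SparkSession

# Tune memory, cores, and shuffle parallelism (values are illustrative).
spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.executor.memory", "4g")
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

df = spark.read.parquet("/data/events")      # hypothetical input path

# Partition data to spread the workload evenly before wide operations.
df = df.repartition(200, "customer_id")      # hypothetical key column

# Cache an intermediate result that is reused several times.
filtered = df.filter(df.amount > 0).cache()
filtered.count()                             # materialize the cache

# Prefer a single deliberate shuffle (the groupBy) over repeated ones.
summary = filtered.groupBy("customer_id").sum("amount")
summary.write.mode("overwrite").parquet("/data/summary")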
SQL query to find the second highest salary of employees in each department
Use a subquery to rank the salaries within each department
Filter the results to only include the second highest salary for each department
Join the result with the employee table to get additional information if needed
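One hedged way to write this, using a DENSE_RANK window function in a subquery and run here through Spark SQL; the employees view and its columns (emp_id, name, department_id, salary) are assumed names:

# Assumes a SparkSession `spark` and a registered `employees` view.
second_highest = spark.sql("""
    SELECT emp_id, name, department_id, salary
    FROM (
        SELECT e.*,
               DENSE_RANK() OVER (PARTITION BY department_id
                                  ORDER BY salary DESC) AS rnk
        FROM employees e
    ) ranked
    WHERE rnk = 2
""")
second_highest.show()

Because the window runs over full employee rows, the extra join back to the employee table is only needed if the subquery selects just the key columns.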
SQL query to find users who purchased 3 consecutive months in a year
Use a self join on the table to compare purchase months for each user
Group the distinct purchase months by user and year and look for runs of 3 consecutive months (a window-function sketch follows below)
Example (two self joins): SELECT DISTINCT p1.user_id FROM purchases p1 JOIN purchases p2 ON p2.user_id = p1.user_id AND p2.month = p1.month + 1 AND YEAR(p2.purchase_date) = YEAR(p1.purchase_date) JOIN purchases p3 ON p3.user_id = p1.user_id AND p3.month = p1.month + 2 AND YEAR(p3.purchase_date) = YEAR(p1.purchase_date)
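An alternative, hedged sketch using window functions (the gaps-and-islands pattern): within a year, consecutive months keep a constant value of month minus row number, so runs of 3 or more can be counted. Assumes a SparkSession `spark` and a purchases view with user_id and purchase_date:

consecutive = spark.sql("""
    WITH months AS (
        SELECT DISTINCT user_id,
               YEAR(purchase_date)  AS yr,
               MONTH(purchase_date) AS mth
        FROM purchases
    ),
    grouped AS (
        SELECT user_id, yr, mth,
               mth - ROW_NUMBER() OVER (PARTITION BY user_id, yr
                                        ORDER BY mth) AS grp
        FROM months
    )
    SELECT DISTINCT user_id
    FROM grouped
    GROUP BY user_id, yr, grp
    HAVING COUNT(*) >= 3
""")
consecutive.show()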
Kafka is used as a message broker to ingest data into Spark Streaming for real-time processing.
Kafka acts as a buffer between data producers and Spark Streaming to handle high throughput of data
Spark Streaming can consume data from Kafka topics in micro-batches for real-time processing
Kafka provides fault-tolerance and scalability for streaming data processing in Spark
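A hedged sketch of the Kafka-to-Spark side of this, using the Structured Streaming Kafka source (which processes data in micro-batches by default and uses checkpointing for fault tolerance). It assumes the Spark Kafka connector package is on the classpath; the broker address, topic, and paths are made up:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read a Kafka topic as a streaming DataFrame.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "purchases")                   # hypothetical topic
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the payload to a string.
parsed = events.select(col("value").cast("string").alias("payload"))

# Write each micro-batch out; the checkpoint enables recovery on failure.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/stream-out")                  # hypothetical path
         .option("checkpointLocation", "/data/checkpoints")
         .start())
query.awaitTermination()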
I was approached by the company and interviewed in Nov 2024. There was 1 interview round.
posted on 2 Aug 2024
Currently working on developing a real-time data processing pipeline for a financial services company.
Designing and implementing data ingestion processes using Apache Kafka
Building data processing workflows with Apache Spark
Optimizing data storage and retrieval with Apache Hadoop
Collaborating with data scientists to integrate machine learning models into the pipeline
Group data by column 'A', calculate mean of column 'B' and sum values in column 'C' for each group.
Use groupby() function in pandas to group data by column 'A'
Apply mean() function on column 'B' and sum() function on column 'C' for each group
Example: df.groupby('A').agg({'B':'mean', 'C':'sum'})
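A small runnable version of that example with made-up data:

import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [10, 20, 30, 50],
    "C": [1, 2, 3, 4],
})

result = df.groupby("A").agg({"B": "mean", "C": "sum"})
print(result)
# Group 'x' -> B=15.0, C=3 ; group 'y' -> B=40.0, C=7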
deepcopy() creates a new object with completely independent copies of nested objects, while copy() creates a shallow copy.
deepcopy() creates a new object and recursively copies all nested objects, while copy() creates a shallow copy of the top-level object only.
Use deepcopy() when you need to create a deep copy of an object with nested structures, to avoid any references to the original object.
Use copy() when a shallow copy is enough and it is acceptable for nested objects to be shared with the original.
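A short illustration of the difference using only the standard library:

import copy

original = {"config": {"retries": 3}}

shallow = copy.copy(original)        # top-level copy; nested dict is shared
deep = copy.deepcopy(original)       # fully independent copy

original["config"]["retries"] = 5
print(shallow["config"]["retries"])  # 5 -> shallow copy sees the change
print(deep["config"]["retries"])     # 3 -> deep copy is unaffected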
Python decorators are functions that modify the behavior of other functions. They are commonly used for adding functionality to existing functions without modifying their code.
Decorators are defined using the @ symbol followed by the decorator function name.
They can be used to measure the execution time of a function by wrapping the function with a timer decorator.
Example: a timer decorator wraps the function, records the start and end times around the call, and prints the elapsed time (a completed sketch follows below).
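A completed, runnable sketch of that timer decorator; the function being timed (slow_sum) is a made-up example:

import time
import functools

def timer(func):
    @functools.wraps(func)              # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

@timer
def slow_sum(n):                        # hypothetical function to time
    return sum(range(n))

slow_sum(1_000_000)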
I applied via a recruitment consultant and was interviewed in Nov 2024. There were 2 interview rounds.
Various data warehousing techniques include dimensional modeling, star schema, snowflake schema, and data vault.
Dimensional modeling involves organizing data into facts and dimensions to facilitate easy querying and analysis.
Star schema is a type of dimensional modeling where a central fact table is connected to multiple dimension tables.
Snowflake schema is an extension of the star schema where dimension tables are normalized into multiple related tables.
My analytics work has helped the organization make data-driven decisions, improve operational efficiency, and identify new opportunities for growth.
Developed data models and algorithms to optimize business processes
Generated insights from large datasets to drive strategic decision-making
Identified trends and patterns to improve customer experience and retention
Implemented data governance policies to ensure data quality
I would respond in various situations by remaining calm, assessing the situation, and providing a thoughtful and strategic solution.
Remain calm and composed
Assess the situation thoroughly
Provide a thoughtful and strategic solution
Communicate effectively with all parties involved
Both career and team are important, but ultimately career growth should be prioritized.
Career growth is essential for personal development and achieving professional goals.
A strong team can support career growth by providing mentorship, collaboration, and opportunities for learning.
Balancing career and team dynamics is key to long-term success in any role.
I applied via Naukri.com and was interviewed in Jun 2024. There were 3 interview rounds.
I have used HUDI and Iceberg in my previous project for managing large-scale data lakes efficiently.
Implemented HUDI for incremental data ingestion and managing large datasets in real-time
Utilized Iceberg for efficient table management and data versioning
Integrated HUDI and Iceberg with Apache Spark for processing and querying data
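A hedged sketch of what such an integration can look like in PySpark; the paths, table names, and the specific Hudi write options shown are assumptions and depend on the Hudi/Iceberg versions and catalog configuration in use:

from pyspark.sql import SparkSession

# Assumes a SparkSession already configured with the Hudi and Iceberg
# runtime jars and an Iceberg catalog named `lake` (all hypothetical).
spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()
df = spark.read.parquet("/raw/orders")          # hypothetical input

# Upsert-style ingestion into a Hudi table.
(df.write.format("hudi")
   .option("hoodie.table.name", "orders_hudi")
   .option("hoodie.datasource.write.recordkey.field", "order_id")
   .option("hoodie.datasource.write.precombine.field", "updated_at")
   .mode("append")
   .save("/lake/orders_hudi"))

# Versioned table management with Iceberg.
df.writeTo("lake.db.orders_iceberg").using("iceberg").createOrReplace()
spark.sql("SELECT COUNT(*) FROM lake.db.orders_iceberg").show()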
posted on 28 Oct 2024
DBA stands for Database Administrator; the role involves managing and maintaining databases to ensure data integrity and security.
DBA is responsible for installing, configuring, and upgrading database software.
They monitor database performance and troubleshoot issues.
DBA designs and implements backup and recovery strategies to prevent data loss.
They also manage user access and security permissions within the database.
Maintaining the database involves regular monitoring, performance tuning, applying patches, and ensuring backups are taken regularly.
Regularly monitor database performance and usage
Perform routine maintenance tasks such as applying patches and updates
Take regular backups to ensure data integrity and disaster recovery
Implement security measures to protect the database from unauthorized access
Optimize database performance through regular tuning
Join is used to combine rows from two or more tables based on a related column, while lookup is used to retrieve data from a reference table based on a matching key.
Join combines rows from multiple tables based on a related column
Lookup retrieves data from a reference table based on a matching key
Join can result in duplicate rows if there are multiple matches, while lookup returns only the first matching row
Join is used when data from multiple tables is needed together in the output.
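A small pandas illustration of the contrast (the answer itself is tool-agnostic); the tables and columns are made up:

import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust_id": [10, 20, 10]})
customers = pd.DataFrame({"cust_id": [10, 20], "name": ["Asha", "Ravi"]})

# Join: combine rows from both tables on the related column.
joined = orders.merge(customers, on="cust_id", how="left")

# Lookup: retrieve a single attribute from a reference table by key.
lookup = customers.set_index("cust_id")["name"]
orders["name"] = orders["cust_id"].map(lookup)

print(joined)
print(orders)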
Fact table contains quantitative data and measures, while dimension table contains descriptive attributes.
Fact table contains numerical data that can be aggregated (e.g. sales revenue, quantity sold)
Dimension table contains descriptive attributes for analysis (e.g. product name, customer details)
Fact table is typically normalized, while dimension table is denormalized for faster queries
Fact table is usually much larger in size than dimension tables
Use sed command to display the line before a specific pattern
Use 'sed -n '/pattern/{g;1!p;};h' file.txt' to display the line before the pattern
Replace 'pattern' with the specific pattern you are looking for
This works because h saves each line to sed's hold space; when a line matches the pattern, g swaps in the previously saved line and 1!p prints it (the 1! guard skips a match on the very first line)
I applied via a recruitment consultant and was interviewed in May 2024. There were 2 interview rounds.
SQL scripts to write; I was also asked to design a data model of my choice in the telecom domain.
Wipro salaries by role:
Project Engineer | 32.7k salaries | ₹1.8 L/yr - ₹8.3 L/yr
Senior Software Engineer | 23.1k salaries | ₹5.8 L/yr - ₹22.5 L/yr
Senior Associate | 21.3k salaries | ₹0.8 L/yr - ₹5.5 L/yr
Senior Project Engineer | 20.5k salaries | ₹5 L/yr - ₹19.5 L/yr
Technical Lead | 18.6k salaries | ₹8.2 L/yr - ₹36.5 L/yr