Deloitte Interview Questions and Answers

Updated 31 Dec 2024

Q1. What is the difference between the reduceByKey and groupByKey transformations in Apache Spark?

Ans.

reduceByKey aggregates the values for each key using a combining function, while groupByKey only groups the values for each key; any aggregation happens afterwards.

  • reduceByKey is a transformation that combines the values of each key with an associative and commutative function (the neutral 'zero value' belongs to the related foldByKey and aggregateByKey transformations, not to reduceByKey).

  • groupByKey is a transformation that groups the data by key and returns a dataset of (key, iterable-of-values) pairs.

  • reduceByKey is more efficient for aggregation because it combines values on each partition before the shuffle (a map-side combine), while groupByKey shuffles every record before grouping.

  • reduceByKey is typically used for per-key aggregations such as sums and counts, as in the sketch below.
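A minimal PySpark sketch contrasting the two; the session setup and the tiny word-count data are illustrative only:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("reduce-vs-group").getOrCreate()
sc = spark.sparkContext

pairs = sc.parallelize([("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)])

# reduceByKey combines values per key on each partition first (map-side
# combine), so only partial sums cross the network during the shuffle.
sums_reduce = pairs.reduceByKey(lambda x, y: x + y).collect()

# groupByKey shuffles every (key, value) pair and groups afterwards;
# any aggregation happens only after all the data has moved.
sums_group = pairs.groupByKey().mapValues(sum).collect()

print(sorted(sums_reduce))  # [('a', 3), ('b', 2)]
print(sorted(sums_group))   # same result, but more data shuffled

spark.stop()
```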


Q2. What is the difference between RDD (Resilient Distributed Datasets) and DataFrame in Apache Spark?

Ans.

RDD is a low-level abstraction representing a distributed collection of objects, while DataFrame is a higher-level abstraction representing a distributed collection of data organized into named columns.

  • RDD is more suitable for unstructured data and low-level transformations, while DataFrame is more suitable for structured data and high-level abstractions.

  • DataFrames provide optimizations like query optimization and code generation, while RDDs do not.

  • DataFrames support SQL queries via Spark SQL, while RDDs require explicit functional transformations; the sketch below shows both.
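A minimal sketch of the same filter expressed both ways; the data and session setup are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("rdd-vs-df").getOrCreate()

rows = [("alice", 34), ("bob", 45)]

# RDD: a distributed collection of opaque objects; the filter is an
# arbitrary Python function, invisible to Spark's optimizer.
rdd = spark.sparkContext.parallelize(rows)
print(rdd.filter(lambda r: r[1] > 40).collect())   # [('bob', 45)]

# DataFrame: named columns; the same filter goes through the Catalyst
# optimizer and can be pushed down or reordered.
df = spark.createDataFrame(rows, ["name", "age"])
print(df.filter(df.age > 40).collect())            # [Row(name='bob', age=45)]

# DataFrames also answer SQL directly:
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 40").show()

spark.stop()
```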


Q3. What is PySpark, and can you explain its features and uses?

Ans.

PySpark is a Python API for Apache Spark, used for big data processing and analytics.

  • PySpark is a Python API for Apache Spark, a fast and general-purpose cluster computing system.

  • It allows for easy integration with Python libraries and provides high-level APIs in Python.

  • PySpark can be used for processing large datasets, machine learning, real-time data streaming, and more.

  • It supports various data sources such as HDFS, Apache Hive, JSON, Parquet, and more.

  • PySpark is widely used for data engineering, ETL pipelines, and large-scale analytics; a short sketch follows.
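A minimal sketch of the typical PySpark workflow; the input file path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

# spark.read handles the supported sources uniformly: JSON here, but
# Parquet, Hive tables, and HDFS paths work the same way.
events = spark.read.json("events.json")  # hypothetical input file

# High-level, Python-native transformations over distributed data.
daily = (events
         .groupBy("date")
         .agg(F.count("*").alias("events"))
         .orderBy("date"))
daily.show()

spark.stop()
```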


Q4. What are the different modes of execution in Apache Spark?

Ans.

The different modes of execution in Apache Spark include local mode, standalone mode, YARN mode, and Mesos mode.

  • Local mode: Spark runs on a single machine, with the driver and executors in one JVM (threads stand in for cluster workers).

  • Standalone mode: Spark runs on a cluster managed by a standalone cluster manager.

  • YARN mode: Spark runs on a Hadoop cluster using YARN as the resource manager.

  • Mesos mode: Spark runs on a Mesos cluster with Mesos as the resource manager (see the sketch below).
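A minimal sketch of how the mode is selected in practice: it comes down to the master URL passed to the session builder (or to spark-submit's --master flag). The cluster hostnames below are hypothetical.

```python
from pyspark.sql import SparkSession

# Local mode: driver and executors in one JVM, using all local cores.
spark = SparkSession.builder.master("local[*]").getOrCreate()
spark.stop()

# Standalone mode: point at Spark's own cluster manager.
# SparkSession.builder.master("spark://master-host:7077").getOrCreate()

# YARN mode: resource management delegated to a Hadoop/YARN cluster.
# SparkSession.builder.master("yarn").getOrCreate()

# Mesos mode: resource management delegated to a Mesos cluster.
# SparkSession.builder.master("mesos://mesos-host:5050").getOrCreate()
```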


Q5. What is the architecture of Apache Spark?

Ans.

Apache Spark architecture includes a cluster manager, worker nodes, and driver program.

  • Apache Spark architecture consists of a cluster manager, which allocates resources and schedules tasks.

  • Worker nodes execute tasks and store data in memory or disk.

  • Driver program coordinates tasks and communicates with the cluster manager.

  • Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object.

  • Data is processed in parallel across the worker nodes, with one task per partition (see the sketch below).
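A minimal sketch that maps the pieces onto code: the script below is the driver program, its SparkContext talks to the cluster manager (local mode here, so everything stays in one JVM), and tasks run one per partition on the workers.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("arch-demo").getOrCreate()
sc = spark.sparkContext  # the driver's handle to the cluster

# Four partitions -> four tasks, executed in parallel on the workers.
rdd = sc.parallelize(range(8), numSlices=4)
print(rdd.map(lambda x: x * x).sum())  # 140

print(sc.master)              # which cluster manager / mode we attached to
print(sc.defaultParallelism)  # how many tasks run concurrently by default

spark.stop()
```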


Q6. What is the difference between PySpark and Python?

Ans.

PySpark is a Python API for Apache Spark, while Python is a general-purpose programming language.

  • PySpark is specifically designed for big data processing using Spark, while Python is a versatile programming language used for various applications.

  • PySpark allows for distributed computing and parallel processing across a cluster, while plain Python code runs in a single process unless you add concurrency yourself.

  • PySpark provides libraries and tools for working with large datasets efficiently, while Python may require additional libraries (such as pandas) and is still bound to one machine's memory; the sketch below shows the contrast.
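A minimal sketch of the same computation in plain Python and in PySpark; the session setup is illustrative:

```python
from pyspark.sql import SparkSession

values = list(range(1_000_000))

# Plain Python: sequential, everything in one process's memory.
total_py = sum(v * v for v in values)

# PySpark: the data is split into partitions and the map/sum runs in
# parallel on executors; only the final number returns to the driver.
spark = SparkSession.builder.master("local[*]").getOrCreate()
total_spark = spark.sparkContext.parallelize(values).map(lambda v: v * v).sum()

assert total_py == total_spark
spark.stop()
```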


Q7. What is the WITH clause in SQL?

Ans.

WITH clause in SQL is used to create temporary named result sets that can be referenced within the main query.

  • WITH clause is used to improve the readability and maintainability of complex SQL queries.

  • It allows creating subqueries or common table expressions (CTEs) that can be referenced multiple times.

  • The result sets created with the WITH clause can be used for recursive queries (WITH RECURSIVE), data transformation, or simplifying complex queries.

  • It helps in breaking down complex queries into smaller, more readable building blocks, as in the sketch below.
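A minimal sketch of a CTE, run through Spark SQL to stay in PySpark; the sales data, table name, and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

spark.createDataFrame(
    [("alice", "emea", 120), ("bob", "emea", 80), ("carol", "apac", 200)],
    ["rep", "region", "amount"],
).createOrReplaceTempView("sales")

# WITH names an intermediate result set (region_totals) that the main
# query can then reference as if it were a table.
spark.sql("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 100
""").show()

spark.stop()
```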
