RDD stands for Resilient Distributed Dataset and is the fundamental data structure of Apache Spark.
RDD is a distributed collection of objects that can be operated on in parallel.
DataFrames and Datasets are higher-level abstractions built on top of RDDs.
RDDs are more low-level and offer more control over data processing compared to DataFrames and Datasets.
Partitioning is the process of dividing data into smaller chunks for better organization and processing in distributed systems.
Partitioning helps in distributing data across multiple nodes for parallel processing.
Coalesce reduces the number of partitions by merging existing ones and avoids a full shuffle, while repartition performs a full shuffle and can either increase or decrease the number of partitions.
Example: coalesce(5) will merge partitions into 5 pa...
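The difference between the two operations can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: the functions hash_partition, coalesce and repartition below are hypothetical stand-ins that work on lists of lists, with round-robin redistribution standing in for a full shuffle.

```python
def hash_partition(records, num_partitions):
    """Toy hash partitioning: place each record by hash(record) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[hash(record) % num_partitions].append(record)
    return partitions


def coalesce(partitions, n):
    """Merge whole neighbouring partitions into n groups.

    Records are never redistributed individually, and the partition
    count cannot grow (asking for more partitions is a no-op).
    """
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Redistribute every record round-robin into n new partitions.

    This models a full shuffle, so n may be larger or smaller than
    the current partition count.
    """
    out = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            out[position % n].append(record)
            position += 1
    return out
```

In this model, coalesce keeps records from the same source partition together, whereas repartition moves records individually and tends to give more evenly sized partitions at the cost of moving all the data.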
Spark is a distributed computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Spark has a master-slave architecture with a driver program that communicates with a cluster manager to distribute work across worker nodes.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various programming l...
DAG stands for Directed Acyclic Graph. It is a finite directed graph with no cycles.
DAG is a collection of nodes connected by edges where each edge goes from one node to another, but no cycles are allowed.
In the context of Spark, a DAG represents the sequence of transformations that need to be applied to the input data to get the final output.
When a Spark job is submitted, Spark creates a DAG of the transformations spe...
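As a rough illustration of the idea, the sketch below builds a small dependency graph and derives a valid execution order from it using Python's standard graphlib module (Python 3.9+). The step names in lineage are hypothetical, and this is not how Spark's scheduler is implemented; it only shows why the absence of cycles makes an ordering possible.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical lineage: each step maps to the set of steps it depends on.
lineage = {
    "read": set(),
    "filter": {"read"},
    "map": {"filter"},
    "reduceByKey": {"map"},
    "collect": {"reduceByKey"},
}


def execution_order(dependencies):
    """Return the steps in an order where every dependency runs first."""
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies):
    """Return True if the dependency graph contains no cycle."""
    try:
        execution_order(dependencies)
        return True
    except CycleError:
        return False
```

Calling execution_order(lineage) yields the steps from "read" through "collect", while a graph in which two steps depend on each other is rejected by is_acyclic.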
I applied via Recruitment Consultant and was interviewed before Nov 2020. There was 1 interview round.
I applied via Recruitment Consultant and was interviewed in Jul 2023. There were 2 interview rounds.
Python coding basics, such as arrays
I applied via Campus Placement and was interviewed before May 2023. There were 2 interview rounds.
General Python and big data questions
3 coding questions and manager round
I applied via Recruitment Consultant and was interviewed before May 2023. There was 1 interview round.
I applied via Referral and was interviewed in Dec 2023. There were 2 interview rounds.
CoderPad, HackerRank, and LeetCode medium-level questions
Concurrency is the ability of a program to make progress on multiple tasks over overlapping time periods, while inheritance is a mechanism in object-oriented programming where a class inherits properties and behaviors from another class.
Concurrency lets multiple tasks advance during the same period (truly simultaneous execution is parallelism), which can improve throughput and responsiveness.
Inheritance allows a new class to inherit properties and behaviors from an existing class, promoting code reusability.
C...
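The two concepts are unrelated, which a short Python sketch can make concrete. The class names Animal and Dog and the functions square and run_concurrently are hypothetical, chosen only for illustration; the thread pool is one of several ways to run tasks concurrently in Python.

```python
from concurrent.futures import ThreadPoolExecutor


class Animal:
    """Base class: defines state and behaviour that subclasses inherit."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"


class Dog(Animal):
    """Inherits __init__ from Animal and overrides speak."""

    def speak(self):
        return f"{self.name} barks"


def square(n):
    return n * n


def run_concurrently(values):
    """Submit one task per value to a thread pool.

    pool.map returns results in input order, even though the tasks
    themselves may be interleaved.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))
```

Dog reuses the constructor written once in Animal (inheritance), while run_concurrently lets several calls to square be in progress during the same period (concurrency).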
I applied via LinkedIn and was interviewed in Feb 2024. There was 1 interview round.
Program to check balanced brackets in a given string
Use a stack to keep track of opening brackets
Iterate through the string and push opening brackets onto the stack
When a closing bracket is encountered, pop from the stack and check if it matches the corresponding opening bracket
If the stack is empty at the end and every closing bracket matched, the string is balanced
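The steps above can be sketched in Python as follows. The function name is_balanced is hypothetical, and the sketch assumes that only (), [] and {} count as brackets and that all other characters are ignored.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}  # closing bracket -> matching opener
    openers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer with nothing open, or with the wrong opener on top, fails.
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Any opener left on the stack was never closed.
    return not stack
```

For example, is_balanced("{[()]}") is True, while is_balanced("([)]") and is_balanced("((") are both False. The check runs in a single pass over the string.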
| Senior Applied Data Scientist | 127 salaries | ₹10.9 L/yr - ₹20 L/yr |
| Applied Data Scientist | 84 salaries | ₹9.5 L/yr - ₹15.5 L/yr |
| Lead Applied Data Scientist | 82 salaries | ₹17 L/yr - ₹28.5 L/yr |
| Senior Engineer | 59 salaries | ₹10 L/yr - ₹34 L/yr |
| Lead Engineer | 49 salaries | ₹16 L/yr - ₹53.6 L/yr |
EXL Service
Access Healthcare
S&P Global
Acuity Knowledge Partners