I applied via Naukri.com and was interviewed in Oct 2022. There were 3 interview rounds.
The coding test was in Python and comprised two questions, one with 8 test cases and the other with 7; to clear the round we had to pass at least 50% of the test cases.
Internal tables store data in a Hive-managed warehouse while external tables store data outside of Hive.
Internal tables are managed by Hive and are stored in a Hive warehouse directory
External tables are not managed by Hive and can be stored in any location accessible by Hive
Dropping an internal table also drops the data while dropping an external table only drops the metadata
Internal tables can be faster to query because Hive fully manages their storage layout
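To make the contrast concrete, here is a minimal sketch using PySpark's spark.sql with Hive support enabled; the table names and the /data/sales path are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes a cluster where Hive support is available.
spark = (SparkSession.builder
         .appName("hive-tables")
         .enableHiveSupport()
         .getOrCreate())

# Internal (managed) table: data files live in the Hive warehouse directory.
spark.sql("CREATE TABLE IF NOT EXISTS sales_managed (id INT, amount DOUBLE)")

# External table: Hive only registers metadata; data stays at the given path.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (id INT, amount DOUBLE)
    LOCATION '/data/sales'
""")

spark.sql("DROP TABLE sales_managed")   # deletes metadata AND warehouse files
spark.sql("DROP TABLE sales_external")  # deletes metadata only; files survive
```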
Hadoop Architecture is a distributed computing framework that allows for the processing of large data sets.
Hadoop consists of two main components: Hadoop Distributed File System (HDFS) and MapReduce.
HDFS is responsible for storing data across multiple nodes in a cluster.
MapReduce is responsible for processing the data stored in HDFS by dividing it into smaller chunks and processing them in parallel.
Hadoop also includes YARN, which handles resource management and job scheduling across the cluster.
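To illustrate how MapReduce processes chunks in parallel, here is the classic word-count pair of scripts as they would be written for Hadoop Streaming; this is a sketch, and the file names mapper.py and reducer.py are my own:

```python
#!/usr/bin/env python3
# mapper.py -- runs on each input split and emits (word, 1) pairs
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop sorts mapper output by key, so equal words arrive together
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, int(value)

if current_word is not None:
    print(f"{current_word}\t{count}")
```

The two scripts would be wired together by the Hadoop Streaming job, with HDFS supplying the input splits and storing the output.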
I applied via Naukri.com and was interviewed in Jan 2018. There were 4 interview rounds.
I applied via Campus Placement and was interviewed in Aug 2024. There were 3 interview rounds.
Logical reasoning and quantitative aptitude.
2 coding questions: one was easy and the other was medium to tough.
I applied via Naukri.com and was interviewed in Sep 2024. There were 4 interview rounds.
Basic aptitude questions
Data structures and algorithms
I applied via Walk-in and was interviewed in Apr 2024. There were 3 interview rounds.
Lazy evaluation in Spark delays the execution of transformations until an action is called.
Lazy evaluation allows Spark to optimize the execution plan by combining multiple transformations into a single stage.
Transformations are not executed immediately, but are stored as a directed acyclic graph (DAG) of operations.
Actions trigger the execution of the DAG and produce results.
Example: map() and filter() are transformations, while collect() and count() are actions.
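A small PySpark sketch of this behaviour, assuming a local SparkSession:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("lazy-eval").getOrCreate()
rdd = spark.sparkContext.parallelize(range(1, 1_000_001))

# Transformations: nothing executes here, Spark only extends the DAG.
squares = rdd.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Action: the whole pipeline runs now, with map() and filter()
# fused into a single stage.
print(evens.count())  # 500000
```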
MapReduce is a programming model and processing technique for parallel and distributed computing.
MapReduce is used to process large datasets in parallel across a distributed cluster of computers.
It consists of two main functions - Map function for processing key/value pairs and Reduce function for aggregating the results.
Popularly used in big data frameworks like Hadoop for tasks such as sorting, searching, and aggregating data.
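The shape of the model can be shown with a toy in-memory simulation in plain Python; the max-temperature-per-year example here is purely illustrative:

```python
from collections import defaultdict

# Toy records: (year, temperature) readings.
records = [("2023", 31), ("2023", 35), ("2024", 29), ("2024", 33)]

# Map phase: each record becomes a (key, value) pair.
mapped = [(year, temp) for year, temp in records]

# Shuffle phase: group values by key (the framework does this in real MapReduce).
groups = defaultdict(list)
for year, temp in mapped:
    groups[year].append(temp)

# Reduce phase: aggregate each group -- here, the maximum temperature per year.
result = {year: max(temps) for year, temps in groups.items()}
print(result)  # {'2023': 35, '2024': 33}
```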
Skewness is a measure of asymmetry in a distribution. Skewed tables are tables with imbalanced data distribution.
Skewness is a statistical measure that describes the asymmetry of the data distribution around the mean.
Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side.
Skewed tables in data engineering refer to tables with imbalanced data distribution across partitions, which can leave a few tasks doing most of the work.
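As a sketch, skew can be spotted in a PySpark table with a simple per-key count, and a hot key can be "salted" to spread its rows; the customer_id column and the toy data are made up:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("skew-check").getOrCreate()

# Toy data: key "A" dominates, so the table is skewed on customer_id.
rows = [("A",)] * 90 + [("B",)] * 7 + [("C",)] * 3
df = spark.createDataFrame(rows, ["customer_id"])

# A per-key row count makes the imbalance visible.
df.groupBy("customer_id").count().orderBy(F.desc("count")).show()

# One common mitigation is salting: split a hot key into N sub-keys
# so its rows are spread over more partitions.
N = 10
salted = df.withColumn(
    "salted_key",
    F.concat_ws("_", F.col("customer_id"), F.floor(F.rand() * N).cast("string")),
)
```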
Spark is a distributed computing framework designed for big data processing.
Spark is built around the concept of Resilient Distributed Datasets (RDDs) which allow for fault-tolerant parallel processing of data.
It provides high-level APIs in Java, Scala, Python, and R for ease of use.
Spark can run on top of Hadoop, Mesos, Kubernetes, or in standalone mode.
It includes modules for SQL, streaming, machine learning, and graph processing.
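A minimal sketch of both API levels, run in local standalone mode:

```python
from pyspark.sql import SparkSession

# Local standalone mode; on a cluster the master would be YARN,
# Mesos, or Kubernetes instead.
spark = SparkSession.builder.master("local[*]").appName("spark-demo").getOrCreate()

# RDD API: fault-tolerant parallel processing of a plain Python collection.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * 2).collect())  # [2, 4, 6, 8]

# High-level DataFrame/SQL API on the same engine.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.createOrReplaceTempView("t")
spark.sql("SELECT id FROM t WHERE label = 'b'").show()
```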
I applied via Referral and was interviewed in Oct 2024. There was 1 interview round.
Basic aptitude test with questions on distance, age, etc.
I applied via Naukri.com and was interviewed in Mar 2024. There were 3 interview rounds.
Error handling in PySpark involves using try-except blocks and logging to handle exceptions and errors.
Use try-except blocks to catch and handle exceptions in PySpark code
Utilize logging to record errors and exceptions for debugging purposes
Consider using the .option('mode', 'PERMISSIVE') method to handle corrupt records in data processing
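Putting those three points together, a hedged sketch; the input path /data/events.json is hypothetical:

```python
import logging

from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl")

spark = SparkSession.builder.appName("error-handling").getOrCreate()

try:
    # PERMISSIVE mode keeps malformed rows instead of failing the read,
    # routing them into a designated corrupt-record column.
    df = (spark.read
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .json("/data/events.json"))  # hypothetical input path
    df.cache()  # needed before querying the corrupt-record column on its own
    df.filter(df["_corrupt_record"].isNotNull()).show()  # inspect the bad rows
except Exception as exc:
    # Log and re-raise so the scheduler still sees the failure.
    logger.error("Job failed: %s", exc)
    raise
```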
I was asked Python, SQL, and coding questions.
Case study on how you would estimate the total footfall at an airport.
I applied via LinkedIn and was interviewed in Mar 2024. There were 2 interview rounds.
Coding questions on SQL, Python, and Spark.
Implement a function to pair elements of an array based on a given sum.
Iterate through the array and, for each element, check whether its complement (the given sum minus the element) has already been seen.
Use a hash set to store elements already visited to avoid duplicate pairs.
Return an array of arrays containing the pairs that sum up to the given value.
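A straightforward Python implementation of that approach; the function and variable names are my own:

```python
def pair_sum(nums, target):
    """Return the distinct pairs of values in nums that sum to target."""
    seen = set()    # values encountered so far
    pairs = set()   # normalized (low, high) pairs, to avoid duplicates
    for x in nums:
        complement = target - x
        if complement in seen:
            pairs.add((min(x, complement), max(x, complement)))
        seen.add(x)
    return [list(p) for p in pairs]

print(pair_sum([1, 5, 7, -1, 5, 3], 6))  # [[1, 5], [-1, 7]] (order may vary)
```

Using a set of normalized (low, high) tuples keeps the scan O(n) while making duplicate pairs impossible by construction.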
Role | Salaries reported | Salary range
Assistant Vice President | 4.6k salaries | ₹17 L/yr - ₹47.5 L/yr
Assistant Manager | 3.3k salaries | ₹6 L/yr - ₹20 L/yr
Officer | 2.8k salaries | ₹10 L/yr - ₹35 L/yr
Vice President | 2.5k salaries | ₹24 L/yr - ₹70 L/yr
Manager | 2.3k salaries | ₹9.4 L/yr - ₹37 L/yr