I applied via Naukri.com and was interviewed in Oct 2023. There were 2 interview rounds.
ETL pipeline on cloud involves extracting data from various sources, transforming it, and loading it into a cloud-based data warehouse.
Use cloud-based ETL tools like AWS Glue, Google Cloud Dataflow, or Azure Data Factory to extract, transform, and load data.
Design the pipeline to handle large volumes of data efficiently and securely.
Utilize serverless computing and auto-scaling capabilities of cloud platforms to optimize performance and cost.
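The extract-transform-load steps above can be sketched locally; this is a minimal illustration using Python's `csv` and `sqlite3` as stand-ins for cloud object storage and a cloud warehouse (all names and data are made up):

```python
import csv
import io
import sqlite3

# Extract: read raw records (here an in-memory CSV; in the cloud this would be
# S3/GCS/Blob storage read by Glue, Dataflow, or Data Factory).
raw = io.StringIO("id,amount\n1,10.5\n2,abc\n3,4.0\n")
rows = list(csv.DictReader(raw))

# Transform: cast types and drop records that fail validation.
clean = []
for r in rows:
    try:
        clean.append((int(r["id"]), float(r["amount"])))
    except ValueError:
        pass  # a real pipeline would route bad rows to a dead-letter location

# Load: write into a warehouse table (sqlite3 stands in for Redshift/BigQuery/Synapse).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

The same shape scales up: the extract and load endpoints change to cloud services, while the transform logic is what the managed ETL tool distributes.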
Data modelling techniques involve creating a visual representation of data relationships and structures.
Identifying entities and their relationships
Creating entity-relationship diagrams
Normalizing data to reduce redundancy
Using tools like ERwin, Visio, or Lucidchart
Implementing data models in databases
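A small runnable sketch of the points above: two entities with a one-to-many relationship, normalized so the customer name is stored once rather than repeated in every order (schema and data are illustrative):

```python
import sqlite3

# Entities: Customer and Order, related one-to-many. Normalizing moves the
# repeated customer attributes out of the orders table to reduce redundancy.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
db.execute("INSERT INTO customers VALUES (1, 'Asha')")
db.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(10, 250.0), (11, 99.0)])

# The relationship in the ER diagram becomes a JOIN on the foreign key.
row = db.execute("""
    SELECT c.name, COUNT(*)
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name
""").fetchone()
```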
Data Warehouse implementation involves designing, building, and maintaining a centralized repository for storing and analyzing data.
Designing the data warehouse schema to meet business requirements
Extracting, transforming, and loading data from various sources into the warehouse
Implementing data quality processes to ensure accuracy and consistency
Creating data models and reports for analysis and decision-making
Optimizing queries and storage for performance
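A common warehouse schema design is the star schema; here is a minimal sketch with one fact table keyed to two dimension tables, again using sqlite3 as a stand-in for a warehouse engine (table and column names are illustrative):

```python
import sqlite3

# Star schema: fact_sales holds measures; dim_product and dim_date hold
# descriptive attributes, joined via surrogate keys.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE fact_sales  (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
""")
db.execute("INSERT INTO dim_product VALUES (1, 'widget')")
db.execute("INSERT INTO dim_date VALUES (1, '2024-01-01')")
db.executemany("INSERT INTO fact_sales VALUES (?, 1, 1, ?)", [(1, 5.0), (2, 7.5)])

# Reports aggregate the fact table and slice by dimension attributes.
report = db.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name
""").fetchone()
```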
I applied via Campus Placement and was interviewed in Sep 2022. There were 3 interview rounds.
Two questions on HackerRank, easy to medium difficulty.
I applied via Campus Placement and was interviewed in Sep 2024. There was 1 interview round.
SQL query using aggregate functions to perform calculations on a dataset
Use aggregate functions like SUM, AVG, COUNT, MIN, MAX to perform calculations on a dataset
Group data using GROUP BY clause to apply aggregate functions on specific groups
Filter data using HAVING clause after applying aggregate functions
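The three points above fit in one query; a runnable sketch using sqlite3 (the table and data are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (dept TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("toys", 10), ("toys", 30), ("books", 5), ("books", 2), ("food", 100)],
)

# GROUP BY applies the aggregates per department; HAVING filters the
# aggregated groups (WHERE cannot, because it runs before aggregation).
rows = db.execute("""
    SELECT dept, COUNT(*), SUM(amount), AVG(amount)
    FROM orders
    GROUP BY dept
    HAVING SUM(amount) > 20
    ORDER BY dept
""").fetchall()
```

The `books` group (total 7) is dropped by the HAVING clause even though its individual rows would pass a WHERE filter.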
I applied via Naukri.com and was interviewed in Sep 2024. There were 4 interview rounds.
Basic aptitude questions
Data structure and algorithms
I applied via Walk-in and was interviewed in Apr 2024. There were 3 interview rounds.
Lazy evaluation in Spark delays the execution of transformations until an action is called.
Lazy evaluation allows Spark to optimize the execution plan by combining multiple transformations into a single stage.
Transformations are not executed immediately, but are stored as a directed acyclic graph (DAG) of operations.
Actions trigger the execution of the DAG and produce results.
Example: map() and filter() are transformations; an action like collect() or count() triggers their execution.
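Since running a Spark cluster is out of scope here, the same behaviour can be illustrated with Python generators, which are also lazily evaluated; this is only an analogy for the DAG-then-action pattern:

```python
log = []

def numbers():
    for n in range(1, 6):
        log.append(f"read {n}")  # side effect reveals when work actually happens
        yield n

# "Transformations": composing the pipeline runs nothing yet, like Spark
# recording map()/filter() into a DAG.
pipeline = (n * 10 for n in numbers() if n % 2 == 0)
assert log == []  # no element has been read so far

# "Action": consuming the pipeline triggers execution, like collect()/count().
result = list(pipeline)
```

In Spark the payoff is that the scheduler sees the whole DAG before running anything, so it can fuse the map and filter into a single pass over the data.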
MapReduce is a programming model and processing technique for parallel and distributed computing.
MapReduce is used to process large datasets in parallel across a distributed cluster of computers.
It consists of two main functions - Map function for processing key/value pairs and Reduce function for aggregating the results.
Popularly used in big data processing frameworks like Hadoop for tasks like data sorting and searching.
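The canonical example is word count; a single-machine sketch of the map, shuffle, and reduce phases (in Hadoop the shuffle happens across the cluster):

```python
from collections import defaultdict
from itertools import chain

docs = ["big data big cluster", "data pipeline"]

# Map: each document emits (word, 1) key/value pairs.
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)

# Shuffle: group the pairs by key.
groups = defaultdict(list)
for word, one in mapped:
    groups[word].append(one)

# Reduce: aggregate the values for each key.
counts = {word: sum(ones) for word, ones in groups.items()}
```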
Skewness is a measure of asymmetry in a distribution. Skewed tables are tables with imbalanced data distribution.
Skewness is a statistical measure that describes the asymmetry of the data distribution around the mean.
Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side.
Skewed tables in data engineering refer to tables with imbalanced data distribution, which can overload some partitions and slow down distributed jobs.
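The statistical definition above can be computed directly; a sketch of Pearson's moment coefficient of skewness, E[(x − mean)³] / std³ (sample data is made up):

```python
def skewness(xs):
    """Pearson's moment coefficient of skewness for a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n   # population variance
    m3 = sum((x - mean) ** 3 for x in xs) / n    # third central moment
    return m3 / var ** 1.5

right_tailed = [1, 1, 1, 2, 9]  # long tail on the right -> positive skew
left_tailed = [9, 9, 9, 8, 1]   # long tail on the left -> negative skew
```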
Spark is a distributed computing framework designed for big data processing.
Spark is built around the concept of Resilient Distributed Datasets (RDDs) which allow for fault-tolerant parallel processing of data.
It provides high-level APIs in Java, Scala, Python, and R for ease of use.
Spark can run on top of Hadoop, Mesos, Kubernetes, or in standalone mode.
It includes modules for SQL, streaming, machine learning, and graph processing.
I applied via Naukri.com and was interviewed in Mar 2024. There were 3 interview rounds.
Error handling in PySpark involves using try-except blocks and logging to handle exceptions and errors.
Use try-except blocks to catch and handle exceptions in PySpark code
Utilize logging to record errors and exceptions for debugging purposes
Consider the DataFrameReader option mode=PERMISSIVE (e.g. spark.read.option('mode', 'PERMISSIVE')) to tolerate corrupt records when reading data.
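A plain-Python sketch of the try-except-plus-logging pattern; in PySpark the same structure wraps driver-side actions, while read-time options like PERMISSIVE mode handle corrupt records inside Spark itself (the function and data here are illustrative):

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("pipeline")

def parse_amount(raw):
    """Parse one field; log and skip bad records instead of failing the job."""
    try:
        return float(raw)
    except ValueError:
        logger.error("bad record: %r", raw)
        return None

records = ["10.5", "oops", "3"]
parsed = [v for v in (parse_amount(r) for r in records) if v is not None]
```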
posted on 16 Oct 2024
I applied via LinkedIn and was interviewed in Mar 2024. There were 2 interview rounds.
Coding questions on sql python and spark
Implement a function to pair elements of an array based on a given sum.
For each element, check whether (target sum - current element) has already been seen.
Use a hash set of visited elements for O(1) complement lookups in a single pass.
Return an array of arrays containing the pairs that sum up to the given value.
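A sketch of that approach; note that with repeated input values a pair can be reported more than once unless you deduplicate further (function name and signature are illustrative):

```python
def pair_sum(nums, target):
    """Return pairs [a, b] with a + b == target, found in one pass."""
    seen = set()
    pairs = []
    for n in nums:
        complement = target - n
        if complement in seen:
            pairs.append([complement, n])
        seen.add(n)
    return pairs
```

The hash set makes each complement lookup O(1), so the whole scan is O(n) time and O(n) space, versus O(n²) for checking every pair.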
Engineer | 168 salaries | ₹7.3 L/yr - ₹30.3 L/yr
Analyst | 90 salaries | ₹3 L/yr - ₹13 L/yr
Senior Analyst | 78 salaries | ₹3.8 L/yr - ₹7.6 L/yr
Software Engineer | 55 salaries | ₹11.2 L/yr - ₹32.5 L/yr
Data Engineer | 37 salaries | ₹8.6 L/yr - ₹30 L/yr
HSBC Group
JPMorgan Chase & Co.
Barclays Global Service Centre
Standard Chartered