I applied via Naukri.com and was interviewed in Jun 2024. There was 1 interview round.
Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Apache Spark is designed for speed and ease of use in processing large amounts of data.
Spark has been advertised as running programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk; actual speedups depend on the workload.
Spark provides high-level APIs in Java, Scala, Python, and R, and an opt...
Core components of Spark include Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
Spark Core: foundation of the Spark platform, provides basic functionality for distributed data processing
Spark SQL: module for working with structured data using SQL and DataFrame API
Spark Streaming: extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams
MLlib...
Implement fault tolerance by using checkpointing, replication, and monitoring mechanisms.
Enable checkpointing in Spark Streaming to periodically save the state of the computation to reliable storage such as HDFS or S3.
Use replication in Kafka to ensure that data is not lost in case of node failures.
Monitor the health of the Kafka and Spark clusters using tools like Prometheus and Grafana to detect and address issues pro
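To make the checkpointing idea concrete, here is a toy sketch in plain Python (it is not Spark's API). It keeps running counts over a stream, saves the state together with the stream offset to a JSON file every few events, and after a simulated crash resumes from the last checkpoint instead of starting over. The file name and the `process` function are hypothetical; in Spark Structured Streaming the equivalent is setting the `checkpointLocation` option on the streaming write.

```python
import json
import os
import tempfile


def process(events, checkpoint_path, every=2, crash_after=None):
    """Count events per key, checkpointing state every `every` events.

    Resumes from an existing checkpoint; `crash_after` simulates a failure at that index.
    """
    state = {"offset": 0, "counts": {}}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # restore offset and counts together
    for i in range(state["offset"], len(events)):
        if crash_after is not None and i == crash_after:
            raise RuntimeError("simulated failure")
        key = events[i]
        state["counts"][key] = state["counts"].get(key, 0) + 1
        state["offset"] = i + 1
        if state["offset"] % every == 0:
            # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, checkpoint_path)
    return state["counts"]


events = ["a", "b", "a", "c", "a"]
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")  # hypothetical checkpoint location
try:
    process(events, path, crash_after=3)  # fails after checkpointing offset 2
except RuntimeError:
    pass
print(process(events, path))  # {'a': 3, 'b': 1, 'c': 1}
```

Because the offset and the counts are saved in one atomic write, the events replayed after the restart are not double-counted.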
Hive is a data warehousing infrastructure built on top of Hadoop for querying and analyzing large datasets.
Hive uses a language called HiveQL which is similar to SQL for querying data stored in Hadoop.
It organizes data into tables, partitions, and buckets to optimize queries and improve performance.
Hive metastore stores metadata about tables, columns, partitions, and their locations.
Hive queries are conver...
Vectorization means expressing computations as operations on whole arrays (or batches of rows) rather than one element at a time, so the CPU or GPU can process the data efficiently.
Vectorization enables data-parallel execution (for example, SIMD instructions), improving computational efficiency.
It involves performing operations on entire arrays or matrices at once, rather than on individual elements.
Examples include using libraries like NumPy in Python to perform vectorized operations on arrays.
Vectorizati...
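A small NumPy sketch of the difference: the loop multiplies one pair of values per Python iteration, while the vectorized version hands the whole arrays to NumPy, which runs the element-wise product and the sum in compiled loops. The `prices` and `quantities` arrays are made-up illustration data.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Loop version: one multiplication per Python-level iteration.
loop_total = 0.0
for p, q in zip(prices, quantities):
    loop_total += p * q

# Vectorized version: element-wise product and sum over entire arrays at once.
vec_total = float(np.sum(prices * quantities))

print(float(loop_total), vec_total)  # 110.0 110.0
```

On three elements the speed difference is invisible; it becomes significant on arrays with millions of elements.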
Partition in Hive is a way to organize data in a table into multiple directories based on the values of one or more columns.
Partitions help in improving query performance by allowing Hive to only read the relevant data directories.
Partitions are defined when creating a table in Hive using the PARTITIONED BY clause.
Example: CREATE TABLE table_name (column1 INT, column2 STRING) PARTITIONED BY (column3 STRING);
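Hive stores each partition as a subdirectory named `column=value` under the table's location, and the partition column's value lives in that directory name rather than in the data files. The plain-Python model below (not Hive itself; the helper names are hypothetical) groups rows into that layout and then shows partition pruning: a query such as `SELECT * FROM table_name WHERE column3 = '2024-06';` only needs to read one directory.

```python
from collections import defaultdict


def partition_rows(table_name, rows, partition_col):
    """Group rows into Hive-style partition directories: table_name/partition_col=value."""
    layout = defaultdict(list)
    for row in rows:
        value = row[partition_col]
        # The partition value is encoded in the directory name, so it is dropped from the row data.
        data = {k: v for k, v in row.items() if k != partition_col}
        layout[f"{table_name}/{partition_col}={value}"].append(data)
    return dict(layout)


def read_with_pruning(layout, table_name, partition_col, value):
    """Read only the directory matching WHERE partition_col = value."""
    return layout.get(f"{table_name}/{partition_col}={value}", [])


rows = [
    {"column1": 1, "column2": "x", "column3": "2024-06"},
    {"column1": 2, "column2": "y", "column3": "2024-07"},
    {"column1": 3, "column2": "z", "column3": "2024-06"},
]
layout = partition_rows("table_name", rows, "column3")
print(sorted(layout))  # ['table_name/column3=2024-06', 'table_name/column3=2024-07']
print(read_with_pruning(layout, "table_name", "column3", "2024-06"))
# [{'column1': 1, 'column2': 'x'}, {'column1': 3, 'column2': 'z'}]
```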
Functions in SQL are built-in (or user-defined) routines that manipulate data or perform calculations within a database.
Functions in SQL can be used to perform operations on data, such as mathematical calculations, string manipulation, date/time functions, and more.
Examples of SQL functions include SUM(), AVG(), CONCAT(), UPPER(), LOWER(), DATE_FORMAT() (MySQL), and many others.
Functions can be used in SELECT statements, WHERE ...
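A runnable sketch using Python's built-in `sqlite3` module, with a hypothetical `orders` table. SQLite dialect differences are worth noting: `||` does string concatenation (the role of CONCAT() in other databases) and `strftime()` plays the role of MySQL's DATE_FORMAT(). The query uses string, aggregate, and date functions in the SELECT list and a function inside the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, ordered_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 120.0, "2024-06-01"),
        ("bob", 80.0, "2024-06-15"),
        ("Alice", 40.0, "2024-07-02"),
        ("bob", 10.0, "2023-12-31"),
    ],
)

query = """
SELECT UPPER(LOWER(customer))             AS customer,     -- string functions
       'orders: ' || COUNT(*)             AS label,        -- || concatenates strings
       SUM(amount)                        AS total,        -- aggregate functions
       AVG(amount)                        AS avg_amount,
       strftime('%Y-%m', MIN(ordered_at)) AS first_month   -- date formatting
FROM orders
WHERE strftime('%Y', ordered_at) = '2024'                  -- function inside WHERE
GROUP BY LOWER(customer)
ORDER BY customer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('ALICE', 'orders: 2', 160.0, 80.0, '2024-06')
# ('BOB', 'orders: 1', 80.0, 80.0, '2024-06')
```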
RANK(), DENSE_RANK(), and ROW_NUMBER() are window functions used in SQL to assign a rank to each row based on a specified order.
RANK() gives tied rows the same rank and then skips ranks, leaving gaps (e.g., 1, 2, 2, 4).
DENSE_RANK() gives tied rows the same rank without leaving gaps (e.g., 1, 2, 2, 3).
ROW_NUMBER() assigns a unique sequential integer to each row, even when rows tie on the ordering column.
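The difference is easiest to see on tied values. This sketch runs the three functions side by side with Python's `sqlite3` module (SQLite has supported window functions since version 3.25); the `scores` table is hypothetical. `name` is added as a tiebreaker for ROW_NUMBER() because without one the numbering among tied rows is not deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 85), ("c", 85), ("d", 70)],
)

query = """
SELECT name, score,
       RANK()       OVER (ORDER BY score DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
FROM scores
ORDER BY score DESC, name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('a', 90, 1, 1, 1)
# ('b', 85, 2, 2, 2)
# ('c', 85, 2, 2, 3)
# ('d', 70, 4, 3, 4)
```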
I applied via Naukri.com and was interviewed in Feb 2024. There was 1 interview round.
explode function is used in Apache Spark to split a column containing arrays into multiple rows.
Creates a new row for each element in the array
Syntax: explode(col: Column): Column
Example (PySpark, after from pyspark.sql.functions import explode, col): df.select(explode(col('array_column')))
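The row-level behaviour of explode can be modelled in plain Python (this is an illustration, not PySpark). Each input row produces one output row per array element, with the other columns copied along, roughly what `df.withColumn('array_column', explode(col('array_column')))` does. Rows whose array is null or empty produce no output, which is where Spark's `explode_outer` differs. The bare `df.select(explode(...))` above returns only the exploded values, in a column named `col` by default.

```python
def explode(rows, column):
    """Plain-Python model of explode: one output row per element of rows[i][column].

    Rows whose array is None (null) or empty produce no output rows.
    """
    out = []
    for row in rows:
        for element in row.get(column) or []:
            new_row = dict(row)
            new_row[column] = element  # the array is replaced by a single element
            out.append(new_row)
    return out


rows = [
    {"id": 1, "array_column": ["a", "b"]},
    {"id": 2, "array_column": []},
    {"id": 3, "array_column": None},
    {"id": 4, "array_column": ["c"]},
]
exploded = explode(rows, "array_column")
for r in exploded:
    print(r)
# {'id': 1, 'array_column': 'a'}
# {'id': 1, 'array_column': 'b'}
# {'id': 4, 'array_column': 'c'}
```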
I applied via Naukri.com and was interviewed in Dec 2024. There were 4 interview rounds.
A set of questions on English and aptitude, all at an easy level
SQL basics and some query questions
I applied via Naukri.com and was interviewed in Sep 2024. There were 3 interview rounds.
Some multiple-choice questions, 2 SQL questions, and 2 Python questions were asked
Developed a real-time data processing system for analyzing customer behavior
Used Apache Kafka for streaming data ingestion
Implemented data pipelines using Apache Spark for processing and analysis
Utilized Elasticsearch for storing and querying large volumes of data
Developed custom machine learning models for predictive analytics
I have used partitioning and indexing to optimize query performance.
Implemented partitioning on large tables to improve query performance by limiting the data scanned
Created indexes on frequently queried columns to speed up data retrieval
Utilized clustering keys to physically organize data on disk for faster access
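The indexing part can be checked directly with SQLite's `EXPLAIN QUERY PLAN` (SQLite has no table partitioning or clustering keys, so only indexing is shown). The `rides` table and `idx_rides_driver` index are hypothetical. Before the index, the lookup by `driver_id` is a full table scan; afterwards SQLite searches the index. The exact wording of the plan text varies between SQLite versions, so the printed lines are examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id INTEGER PRIMARY KEY, driver_id INTEGER, fare REAL)")
conn.executemany(
    "INSERT INTO rides (driver_id, fare) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT fare FROM rides WHERE driver_id = 42"
before = plan(query)
print(before)  # e.g. ['SCAN rides']  (full table scan)

conn.execute("CREATE INDEX idx_rides_driver ON rides (driver_id)")
after = plan(query)
print(after)   # e.g. ['SEARCH rides USING INDEX idx_rides_driver (driver_id=?)']
```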
I applied via Company Website and was interviewed in Aug 2024. There were 2 interview rounds.
Uber data model design for efficient storage and retrieval of ride-related information.
Create tables for users, drivers, rides, payments, and ratings
Include attributes like user_id, driver_id, ride_id, payment_id, rating_id, timestamp, location, fare, etc.
Establish relationships between tables using foreign keys
Implement indexing for faster query performance
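A minimal sketch of that design as SQLite DDL, run through Python's `sqlite3`. The table names follow the answer above; column names beyond the listed IDs and fare (such as `started_at`, `pickup_loc`, `stars`) are hypothetical choices. Foreign keys link rides to users and drivers, and payments and ratings to rides; composite indexes support lookups like a user's or driver's ride history ordered by time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript(
    """
    CREATE TABLE users   (user_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE rides (
        ride_id     INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL REFERENCES users(user_id),
        driver_id   INTEGER NOT NULL REFERENCES drivers(driver_id),
        started_at  TEXT NOT NULL,
        pickup_loc  TEXT,
        dropoff_loc TEXT,
        fare        REAL
    );
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        ride_id    INTEGER NOT NULL REFERENCES rides(ride_id),
        amount     REAL NOT NULL,
        method     TEXT
    );
    CREATE TABLE ratings (
        rating_id INTEGER PRIMARY KEY,
        ride_id   INTEGER NOT NULL REFERENCES rides(ride_id),
        stars     INTEGER CHECK (stars BETWEEN 1 AND 5)
    );
    -- Indexes for "ride history" lookups by user or driver, ordered by time.
    CREATE INDEX idx_rides_user   ON rides (user_id, started_at);
    CREATE INDEX idx_rides_driver ON rides (driver_id, started_at);
    """
)

conn.execute("INSERT INTO users VALUES (1, 'Asha')")
conn.execute("INSERT INTO drivers VALUES (10, 'Ravi')")
conn.execute("INSERT INTO rides VALUES (100, 1, 10, '2024-08-01T09:00', 'A', 'B', 250.0)")
conn.execute("INSERT INTO ratings VALUES (1000, 100, 5)")

# A ride referencing a non-existent driver is rejected by the foreign key.
rejected = False
try:
    conn.execute("INSERT INTO rides VALUES (101, 1, 99, '2024-08-02T09:00', 'B', 'C', 120.0)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

history = conn.execute(
    """
    SELECT u.name, d.name, r.fare, rt.stars
    FROM rides r
    JOIN users u         ON u.user_id = r.user_id
    JOIN drivers d       ON d.driver_id = r.driver_id
    LEFT JOIN ratings rt ON rt.ride_id = r.ride_id
    WHERE r.user_id = 1
    ORDER BY r.started_at
    """
).fetchall()
print(history)  # [('Asha', 'Ravi', 250.0, 5)]
```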
I applied via Newspaper Ad and was interviewed in Aug 2024. There were 3 interview rounds.
There were three sections: 1) Aptitude Test, 2) SQL, 3) DSA
DSA stands for Data Structures and Algorithms. Sorting is the process of arranging data in a particular order. Array is a data structure that stores elements of the same data type in contiguous memory locations, while linked list is a data structure that stores elements in nodes with pointers to the next node.
DSA stands for Data Structures and Algorithms
Sorting is the process of arranging data in a particular order
Arra...
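A short Python sketch of the array versus linked list trade-off, plus sorting. Python's `list` is a dynamic array of references, so it only approximates the contiguous, same-type array described above, but it shows the same costs: indexing is O(1) and inserting at the front shifts every element. The hypothetical `Node`/`LinkedList` classes show the pointer-based alternative, where inserting at the head is O(1) but reaching an element means following pointers.

```python
class Node:
    """One linked-list node: a value plus a pointer to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): no elements move, only the head pointer changes.
        self.head = Node(value, self.head)

    def to_list(self):
        # O(n): walk the chain of next pointers from the head.
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


arr = [5, 2, 9]    # dynamic array
arr.insert(0, 7)   # O(n): existing elements shift right by one
print(arr[2])      # 2  (O(1) indexing)

ll = LinkedList()
for v in [9, 2, 5, 7]:
    ll.push_front(v)
print(ll.to_list())  # [7, 5, 2, 9]

print(sorted(arr))   # [2, 5, 7, 9]  (sorting: arranging data in ascending order)
```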
I have experience working on various data analysis projects, including market research, customer segmentation, and predictive modeling.
Developed predictive models to forecast customer behavior and optimize marketing strategies
Conducted market research to identify trends and opportunities for growth
Performed customer segmentation analysis to target specific demographics with personalized marketing campaigns
ALL() ignores all filters on the specified table or columns, while ALLSELECTED() removes only the filters coming from the visual's own rows and columns.
ALL() removes all filters from the specified column or table.
ALLSELECTED() removes the filters on the specified column or table that come from inside the visual, but keeps filters applied from outside it, such as slicers and page or report filters.
Example: ALL('Table') would remove all filters on the 'Table' in the query context.
Example: ALLSELECTED('Column') would remove filte...
COUNT() counts only numeric values, while COUNTA() counts all non-empty cells.
COUNT() counts only cells with numerical values.
COUNTA() counts all non-empty cells, including text and errors.
Example: COUNT(A1:A5) will count only cells with numbers, while COUNTA(A1:A5) will count all non-empty cells.
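The rule can be modelled in a few lines of Python (a simplified model, not Excel itself). A range is a list of cell values with `None` standing for a blank cell, and error values are represented as plain strings for simplicity. The hypothetical A1:A5 contents include a number stored as text, which COUNT skips when it appears in a referenced range; logical values in a range are also skipped by COUNT.

```python
def excel_count(cells):
    """Model of COUNT over a range: counts numeric cells only (booleans excluded)."""
    return sum(1 for v in cells if isinstance(v, (int, float)) and not isinstance(v, bool))


def excel_counta(cells):
    """Model of COUNTA: counts every non-empty cell; None stands for a blank cell."""
    return sum(1 for v in cells if v is not None)


# Hypothetical A1:A5: a number, text, a blank, an error value, a number stored as text.
a1_a5 = [42, "apple", None, "#DIV/0!", "7"]
print(excel_count(a1_a5), excel_counta(a1_a5))  # 1 4
```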
I was approached by the company and was interviewed in Aug 2024. There was 1 interview round.
Maximum substring and reverse a string
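A possible Python answer. String reversal is unambiguous; "maximum substring" is not, so this sketch assumes it means the common "longest substring without repeating characters" problem, solved with a sliding window in O(n). Both function names are hypothetical.

```python
def reverse_string(s: str) -> str:
    # A slice with step -1 walks the string backwards.
    return s[::-1]


def longest_unique_substring(s: str) -> str:
    """Longest substring with no repeated characters (sliding window, O(n))."""
    last_seen = {}   # character -> index where it last appeared
    start = 0        # left edge of the current window
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1  # move the window past the previous occurrence
        last_seen[ch] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]


print(reverse_string("interview"))           # weivretni
print(longest_unique_substring("abcabcbb"))  # abc
```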
I applied via campus placement at Lady Shri Ram College for Women, Delhi
Basic English, Quants and Statistics
Easy, relevant to pandemic
Salaries by role (number of salaries reported, annual range):
Senior Associate: 14.7k salaries, ₹8 L/yr - ₹30 L/yr
Associate: 12.7k salaries, ₹4.6 L/yr - ₹16 L/yr
Manager: 6.6k salaries, ₹13.5 L/yr - ₹50 L/yr
Senior Consultant: 4.4k salaries, ₹9 L/yr - ₹32 L/yr
Associate2: 4.1k salaries, ₹4.5 L/yr - ₹16.6 L/yr