Merilytics
The aptitude test consists of programming-language and aptitude questions.
My favorite programming language is Python because of its readability, versatility, and extensive libraries.
Python is known for its readability, making it easier to write and maintain code.
Python is versatile and can be used for web development, data analysis, machine learning, and more.
Python has a vast collection of libraries like NumPy, Pandas, and Matplotlib that make data manipulation and visualization easier.
SQL commands are instructions used to interact with databases to perform tasks such as querying, updating, and managing data.
Common SQL commands include SELECT, INSERT, UPDATE, and DELETE.
Examples of SQL commands include SELECT * FROM table_name; INSERT INTO table_name (column1, column2) VALUES (value1, value2); UPDATE table_name SET column1 = value1 WHERE condition; DELETE FRO...
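The four commands above can be sketched end to end with Python's built-in sqlite3 module; the `users` table and its columns here are illustrative, not from the original answer.

```python
import sqlite3

# Minimal sketch of the four core SQL commands against a hypothetical
# `users` table, using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT INTO table_name (column1, column2) VALUES (value1, value2)
cur.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Asha", "Mumbai"))
cur.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Ravi", "Delhi"))

# UPDATE table_name SET column1 = value1 WHERE condition
cur.execute("UPDATE users SET city = 'Pune' WHERE name = 'Ravi'")

# SELECT * FROM table_name
rows = cur.execute("SELECT name, city FROM users ORDER BY id").fetchall()

# DELETE FROM table_name WHERE condition
cur.execute("DELETE FROM users WHERE name = 'Ravi'")
remaining = cur.execute("SELECT name FROM users").fetchall()
```

Using parameter placeholders (`?`) rather than string formatting is the idiomatic way to pass values into these commands.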
Left join returns all records from the left table and the matched records from the right table, while right join returns all records from the right table and the matched records from the left table.
Left join keeps all records from the left table, even if there are no matches in the right table.
Right join keeps all records from the right table, even if there are no matches in the left table.
Example: If we have a table o...
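The left/right join distinction can be demonstrated with two small hypothetical tables (customers and orders) in sqlite3; since SQLite only gained RIGHT JOIN in version 3.39, the right join is emulated by swapping the table order in a LEFT JOIN.

```python
import sqlite3

# Hypothetical customers/orders tables to show join behaviour.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders VALUES (1, 'laptop'), (3, 'phone');
""")

# LEFT JOIN: every customer appears, even Ravi, who has no order.
left = cur.execute("""
    SELECT c.name, o.item
    FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

# Right join (emulated): every order appears, even the one whose
# customer_id (3) has no matching customer row.
right = cur.execute("""
    SELECT c.name, o.item
    FROM orders o LEFT JOIN customers c ON c.id = o.customer_id
    ORDER BY o.item
""").fetchall()
```

Unmatched rows come back with NULL (Python `None`) in the columns from the other table.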
I applied via campus placement at Indian Institute of Technology (IIT), Bhubaneswar and was interviewed in Nov 2024. There were 3 interview rounds.
2 coding problems: 1) NSE
This was another coding round: 1) boundary traversal of a binary tree and 2) Rotten Oranges.
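The Rotten Oranges problem is usually solved with a multi-source BFS: all initially rotten oranges seed the queue, and the answer is the number of minutes until no fresh orange remains. A sketch of that standard approach:

```python
from collections import deque

def oranges_rotting(grid):
    """Return minutes until every fresh orange (1) rots, spreading from
    the initially rotten ones (2) in four directions; -1 if some fresh
    orange can never be reached. 0 marks an empty cell."""
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c, 0))   # multi-source: all rotten at t=0
            elif grid[r][c] == 1:
                fresh += 1

    minutes = 0
    while queue:
        r, c, t = queue.popleft()
        minutes = max(minutes, t)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                grid[nr][nc] = 2          # rot it and enqueue
                fresh -= 1
                queue.append((nr, nc, t + 1))
    return minutes if fresh == 0 else -1
```

The BFS visits each cell at most once, so the running time is O(rows × cols).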
RDD stands for Resilient Distributed Dataset and is the fundamental data structure of Apache Spark.
RDD is a distributed collection of objects that can be operated on in parallel.
DataFrames and Datasets are higher-level abstractions built on top of RDDs.
RDDs are more low-level and offer more control over data processing compared to DataFrames and Datasets.
Partitioning is the process of dividing data into smaller chunks for better organization and processing in distributed systems.
Partitioning helps in distributing data across multiple nodes for parallel processing.
Coalesce is used to reduce the number of partitions without shuffling data, while repartition is used to increase the number of partitions by shuffling data.
Example: coalesce(5) will merge partitions into 5 pa...
Spark is a distributed computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Spark has a master-slave architecture with a driver program that communicates with a cluster manager to distribute work across worker nodes.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various programming l...
DAG stands for Directed Acyclic Graph. It is a finite directed graph with no cycles.
DAG is a collection of nodes connected by edges where each edge goes from one node to another, but no cycles are allowed.
In the context of Spark, a DAG represents the sequence of transformations that need to be applied to the input data to get the final output.
When a Spark job is submitted, Spark creates a DAG of the transformations spe...
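The idea that transformations only build up a DAG, while an action walks it and executes, can be mimicked in a toy pure-Python sketch; this is an analogy for illustration, not real Spark, and the `Node` class and its methods are invented here.

```python
class Node:
    """Toy lineage DAG: each transformation records a (parent, fn) edge
    and returns a new node; nothing executes until an action runs."""
    def __init__(self, data=None, parent=None, fn=None):
        self.data, self.parent, self.fn = data, parent, fn

    def map(self, f):       # transformation: just adds a node to the DAG
        return Node(parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, p):    # transformation: just adds a node to the DAG
        return Node(parent=self, fn=lambda xs: [x for x in xs if p(x)])

    def collect(self):      # action: walks the DAG back to the source
        if self.parent is None:
            return self.data
        return self.fn(self.parent.collect())

source = Node(data=[1, 2, 3, 4, 5])
result = source.map(lambda x: x * 10).filter(lambda x: x > 20)  # DAG only
values = result.collect()   # execution is triggered here
```

Because the full graph is known before execution, an engine like Spark can analyse and optimise it (for example, pipelining narrow transformations) before running anything.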
posted on 13 Mar 2024
I applied via campus placement at Zagdu Singh Charitable Trust's Thakur Institute of Management Studies & Research, Mumbai and was interviewed in Feb 2024. There were 3 interview rounds.
The aptitude test was divided into 3 parts; rather than typical quant and logical questions, it was more application-based.
I applied via Approached by Company and was interviewed in Oct 2023. There were 3 interview rounds.
I applied via Walk-in and was interviewed in Dec 2022. There were 3 interview rounds.
Presence of mind, a logical approach, a problem-solving attitude, and flexibility.
I was interviewed in Sep 2016.
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SCD stands for Slowly Changing Dimension in Data Warehousing.
SCD is a technique used in data warehousing to track changes to dimension data over time.
There are different types of SCDs - Type 1, Type 2, and Type 3.
Type 1 SCD overwrites old data with new data, Type 2 creates new records for changes, and Type 3 maintains both old and new values in separate columns.
Example: In a customer dimension table, if a customer chan...
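The three SCD types can be sketched in plain Python with a hypothetical customer record that moves from Mumbai to Pune; the field names here are illustrative.

```python
from datetime import date

# Type 1: overwrite the old value in place -- history is lost.
type1 = {"customer_id": 7, "city": "Mumbai"}
type1["city"] = "Pune"

# Type 2: close the old row and insert a new current row -- full history.
type2 = [
    {"customer_id": 7, "city": "Mumbai",
     "valid_from": date(2020, 1, 1), "valid_to": date(2024, 6, 1),
     "is_current": False},
    {"customer_id": 7, "city": "Pune",
     "valid_from": date(2024, 6, 1), "valid_to": None,
     "is_current": True},
]

# Type 3: keep the previous value in a separate column -- one step of history.
type3 = {"customer_id": 7, "city": "Pune", "previous_city": "Mumbai"}
```

Type 2 is the most common choice in practice because queries can reconstruct the dimension as of any date from the validity columns.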
inferSchema in PySpark is used to automatically infer the schema of a file when reading it.
inferSchema is a parameter in PySpark that can be set to true when reading a file to automatically infer the schema from the data.
It is useful when the schema of the file is not known beforehand.
Example: df = spark.read.csv('file.csv', header=True, inferSchema=True)
Rank may leave gaps after ties, while dense rank assigns consecutive ranks without gaps.
The RANK function gives tied rows the same rank and then skips the following rank positions.
The DENSE_RANK function also gives tied rows the same rank but never skips a rank.
Example: for the values 10, 20, 20, 30, RANK yields 1, 2, 2, 4 while DENSE_RANK yields 1, 2, 2, 3.
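The gap behaviour is easy to see with SQLite's window functions (available since SQLite 3.25, which ships with modern Python); the `scores` table here is hypothetical.

```python
import sqlite3

# Compare RANK and DENSE_RANK on tied scores.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
INSERT INTO scores VALUES ('a', 10), ('b', 20), ('c', 20), ('d', 30);
""")
rows = cur.execute("""
    SELECT score,
           RANK()       OVER (ORDER BY score) AS rnk,
           DENSE_RANK() OVER (ORDER BY score) AS drnk
    FROM scores ORDER BY score
""").fetchall()
# RANK jumps from 2 to 4 after the tie; DENSE_RANK continues with 3.
```

The same two functions exist with identical semantics in Spark SQL and most relational databases.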
Optimizing techniques in Spark involve partitioning, caching, and tuning resources for efficient data processing.
Use partitioning to distribute data evenly across nodes for parallel processing
Cache frequently accessed data in memory to avoid recomputation
Tune resources such as memory allocation and parallelism settings for optimal performance
Repartition is used to increase the number of partitions in a DataFrame, while coalesce is used to decrease the number of partitions.
Repartition involves shuffling data across the network, which can be expensive in terms of performance and resources.
Coalesce is a more efficient operation as it minimizes data movement by only merging existing partitions.
Repartition is typically used when there is a need for more paralle...
Normalization in databases is the process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization is used to eliminate redundant data and ensure data integrity.
It involves breaking down a table into smaller tables and defining relationships between them.
There are different normal forms such as 1NF, 2NF, 3NF, and BCNF.
Normalization helps in reducing data redundancy and improving qu...
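Breaking one table into smaller related tables can be sketched with sqlite3; the denormalized and normalized table names below are illustrative.

```python
import sqlite3

# A denormalized orders table repeats customer details on every row;
# splitting it into customers + orders stores each customer once.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- Denormalized: customer name/city repeated per order.
CREATE TABLE orders_flat (order_id INTEGER, name TEXT, city TEXT, item TEXT);
INSERT INTO orders_flat VALUES
  (1, 'Asha', 'Mumbai', 'laptop'),
  (2, 'Asha', 'Mumbai', 'phone'),
  (3, 'Ravi', 'Delhi',  'desk');

-- Normalized: customer stored once, orders reference it by key.
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Asha', 'Mumbai'), (2, 'Ravi', 'Delhi');
INSERT INTO orders VALUES (1, 1, 'laptop'), (2, 1, 'phone'), (3, 2, 'desk');
""")

# A join reconstructs the original flat view without the redundancy.
joined = cur.execute("""
    SELECT o.order_id, c.name, c.city, o.item
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()
flat = cur.execute("SELECT * FROM orders_flat ORDER BY order_id").fetchall()
```

With the normalized design, a change to Asha's city is a single-row update instead of an update to every one of her orders.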
A transformation lazily defines a new dataset from an existing one, while an action performs a computation on the data and triggers execution.
Transformations change the data structure without executing any computation.
Actions perform a computation on the data and trigger execution of the pipeline.
Examples of transformations in Spark include map, filter, and flatMap.
Examples of actions in Spark include count, collect, and saveAsTextFile.
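The laziness can be mimicked with plain-Python generator expressions (an analogy, not Spark itself): building the pipeline runs nothing, and only consuming it executes the work.

```python
# Record every time the "map" step actually runs.
events = []

def traced_double(x):
    events.append("map")
    return x * 2

data = range(1, 6)
mapped = (traced_double(x) for x in data)   # "transformation": lazy
filtered = (x for x in mapped if x > 4)     # "transformation": lazy

pending = list(events)        # snapshot before any "action" runs
result = list(filtered)       # "action": triggers the whole pipeline
```

Until `list(filtered)` is called, `events` stays empty, mirroring how a Spark job only runs when an action like `collect` is invoked.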
I applied via Referral and was interviewed before May 2023. There were 2 interview rounds.
I applied via Campus Placement
General Aptitude test and 3 Medium level coding questions
Sort an array of 0s and 1s.
Use counting or a two-pointer approach.
Counting: count the number of 0s and 1s, then reconstruct the array.
Example: input [0, 1, 0, 1, 1], output [0, 0, 1, 1, 1].
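The two-pointer approach mentioned above can be sketched as a single in-place pass:

```python
def sort_binary(arr):
    """Sort a list containing only 0s and 1s in place with two pointers:
    advance past 0s from the left and 1s from the right, swapping any
    misplaced pair. One pass, O(n) time, O(1) extra space."""
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        if arr[lo] == 0:
            lo += 1
        elif arr[hi] == 1:
            hi -= 1
        else:  # arr[lo] == 1 and arr[hi] == 0: swap them into place
            arr[lo], arr[hi] = arr[hi], arr[lo]
            lo += 1
            hi -= 1
    return arr
```

The counting variant is equally simple: `[0] * arr.count(0) + [1] * arr.count(1)`, at the cost of building a new list.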
Matrix traversal in spiral manner involves visiting each element of a matrix in a spiral order.
Start by traversing the outermost layer of the matrix from top left to top right, then top right to bottom right, bottom right to bottom left, and finally bottom left to top left.
Continue this process for the inner layers until all elements are visited.
Keep track of the boundaries of the matrix to know when to switch directions.
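The boundary-tracking idea above can be sketched with four shrinking limits:

```python
def spiral_order(matrix):
    """Traverse a matrix in spiral order by walking the current outer
    layer (top row, right column, bottom row, left column) and then
    shrinking the four boundaries inward."""
    if not matrix:
        return []
    top, bottom = 0, len(matrix) - 1
    left, right = 0, len(matrix[0]) - 1
    out = []
    while top <= bottom and left <= right:
        for c in range(left, right + 1):          # top row, left -> right
            out.append(matrix[top][c])
        top += 1
        for r in range(top, bottom + 1):          # right column, top -> bottom
            out.append(matrix[r][right])
        right -= 1
        if top <= bottom:
            for c in range(right, left - 1, -1):  # bottom row, right -> left
                out.append(matrix[bottom][c])
            bottom -= 1
        if left <= right:
            for r in range(bottom, top - 1, -1):  # left column, bottom -> top
                out.append(matrix[r][left])
            left += 1
    return out
```

The two `if` guards prevent double-visiting the middle row or column of non-square matrices.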
| Role | Salaries reported | Salary range |
| --- | --- | --- |
| Senior Business Analyst | 161 | ₹6.6 L/yr - ₹11 L/yr |
| Business Associate | 130 | ₹8.1 L/yr - ₹15 L/yr |
| Business Analyst | 89 | ₹5 L/yr - ₹10 L/yr |
| Senior Technical Associate | 89 | ₹5 L/yr - ₹10 L/yr |
| Senior Analyst | 77 | ₹4 L/yr - ₹13 L/yr |
Fractal Analytics
Mu Sigma
Tiger Analytics
LatentView Analytics