I applied via LinkedIn and was interviewed before Feb 2023. There were 2 interview rounds.
I applied via Campus Placement and was interviewed in Dec 2024. There were 4 interview rounds.
Two coding questions related to matrices and heaps.
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
The aptitude test session assesses mathematical and logical reasoning abilities.
Vlookup is a function in Excel used to search for a value in a table and return a corresponding value from another column.
Vlookup stands for 'Vertical Lookup'
It is commonly used in Excel to search for a value in the leftmost column of a table and return a value in the same row from a specified column
Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Example: =VLOOKUP(A2, B2:D10, 3, FALSE) searches for the value of A2 in the first column of B2:D10 (column B) and returns the value from the third column of the range (column D); FALSE forces an exact match. A pandas analogue is sketched below.
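A rough pandas analogue of this lookup, assuming a made-up product table (all column names and data here are purely illustrative):

```python
import pandas as pd

# Lookup table: the leftmost column plays the role of VLOOKUP's search column.
products = pd.DataFrame({
    "product_id": [101, 102, 103],
    "category": ["A", "B", "A"],
    "price": [9.99, 14.50, 7.25],
})

# Values to look up, like the lookup_value cells in Excel.
orders = pd.DataFrame({"product_id": [102, 101]})

# A left merge behaves like VLOOKUP with range_lookup = FALSE (exact match).
result = orders.merge(products[["product_id", "price"]], on="product_id", how="left")
print(result)
```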
My day in my previous organization involved analyzing large datasets, creating reports, and presenting findings to stakeholders.
Reviewing and cleaning large datasets to ensure accuracy
Creating visualizations and reports to communicate insights
Collaborating with team members to identify trends and patterns
Presenting findings to stakeholders in meetings or presentations
I possess strong technical skills in data analysis, including proficiency in programming languages, statistical analysis, and data visualization tools.
Proficient in programming languages such as Python, R, SQL
Skilled in statistical analysis and data modeling techniques
Experience with data visualization tools like Tableau, Power BI
Knowledge of machine learning algorithms and techniques
A Pivot Table is a data summarization tool used in spreadsheet programs to analyze, summarize, and present data in a tabular format.
Pivot tables allow users to reorganize and summarize selected columns and rows of data to obtain desired insights.
Users can easily group and filter data, perform calculations, and create visualizations using pivot tables.
Pivot tables are commonly used in Excel and other spreadsheet programs such as Google Sheets; a pandas sketch of the same idea follows.
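A minimal pandas sketch using invented sales data; pd.pivot_table plays the role of a spreadsheet pivot with a SUM aggregation:

```python
import pandas as pd

# Invented sales data.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["Pen", "Book", "Pen", "Book"],
    "sales": [100, 250, 80, 300],
})

# Total sales by region (rows) and product (columns), roughly what a
# spreadsheet pivot table with a SUM aggregation produces.
pivot = pd.pivot_table(df, values="sales", index="region",
                       columns="product", aggfunc="sum")
print(pivot)
```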
To find the highest-paid employee in each department, group employees by department and then select the employee with the highest salary in each group; a pandas sketch follows the steps below.
Group employees by department
Find the employee with the highest salary in each group
Retrieve the employee's name, salary, and department name
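A minimal pandas sketch of these three steps, assuming an employees table with name, salary, and department columns (all data here is invented):

```python
import pandas as pd

# Invented employee data.
employees = pd.DataFrame({
    "name": ["Asha", "Ben", "Chitra", "Dev"],
    "salary": [90000, 75000, 120000, 110000],
    "department": ["Sales", "Sales", "Engineering", "Engineering"],
})

# Steps 1-2: group by department and find the index of the max salary per group.
top_idx = employees.groupby("department")["salary"].idxmax()

# Step 3: retrieve name, salary, and department for those rows.
print(employees.loc[top_idx, ["name", "salary", "department"]])
```

In SQL the same result is usually obtained with a window function such as ROW_NUMBER() partitioned by department and ordered by salary descending.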
Apache Spark architecture includes a cluster manager, worker nodes, and driver program.
Apache Spark architecture consists of a cluster manager, which allocates resources and schedules tasks.
Worker nodes execute tasks and store data in memory or disk.
Driver program coordinates tasks and communicates with the cluster manager.
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in the driver program (see the sketch below).
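A minimal PySpark sketch of where the driver program fits in; local mode is assumed so the example runs on a single machine:

```python
from pyspark.sql import SparkSession

# The driver program starts here: creating a SparkSession (which wraps the
# SparkContext) registers the application with the cluster manager.
spark = (SparkSession.builder
         .appName("architecture-demo")
         .master("local[2]")  # local mode; a cluster manager URL would go here instead
         .getOrCreate())

# Work submitted through the session is split into tasks that the cluster
# manager schedules onto executors running on worker nodes.
print(spark.sparkContext.parallelize(range(10)).sum())  # 45

spark.stop()
```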
reduceByKey aggregates the values of each key, while groupByKey simply groups values by key.
reduceByKey is a transformation that combines the values of each key using an associative, commutative function.
groupByKey is a transformation that collects all values sharing a key into a grouped dataset.
reduceByKey is more efficient for aggregation because it combines values within each partition before shuffling, while groupByKey shuffles every value across the network first; the sketch below compares the two.
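A small PySpark comparison of the two transformations (local mode, invented key-value pairs):

```python
from pyspark import SparkContext

sc = SparkContext("local[2]", "reduce-vs-group")
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)])

# reduceByKey: values are combined within each partition first (map-side
# combine), so less data crosses the network during the shuffle.
print(pairs.reduceByKey(lambda x, y: x + y).collect())  # [('a', 4), ('b', 6)], order may vary

# groupByKey: every value is shuffled, then grouped; any aggregation
# happens only after the shuffle.
print(pairs.groupByKey().mapValues(sum).collect())      # same totals, order may vary

sc.stop()
```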
RDD is a low-level abstraction representing a distributed collection of objects, while DataFrame is a higher-level abstraction representing a distributed collection of data organized into named columns.
RDD is more suitable for unstructured data and low-level transformations, while DataFrame is more suitable for structured data and high-level abstractions.
DataFrames provide optimizations such as query planning and code generation through the Catalyst optimizer and Tungsten execution engine; a short sketch of the difference follows.
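A short PySpark sketch moving from an RDD to a DataFrame (the names and ages are invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("rdd-vs-df").getOrCreate()

# RDD: a low-level distributed collection of arbitrary Python objects.
rdd = spark.sparkContext.parallelize([("Asha", 30), ("Ben", 25)])

# DataFrame: the same data organized into named columns, which lets the
# Catalyst optimizer plan the query instead of running opaque functions.
df = rdd.toDF(["name", "age"])
df.filter(df.age > 26).show()

spark.stop()
```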
The different modes of execution in Apache Spark include local mode, standalone mode, YARN mode, and Mesos mode; each is selected via the master URL, as sketched after this list.
Local mode: Spark runs on a single machine with one executor.
Standalone mode: Spark runs on a cluster managed by a standalone cluster manager.
YARN mode: Spark runs on a Hadoop cluster using YARN as the resource manager.
Mesos mode: Spark runs on a Mesos cluster with Mesos as the resource manager.
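A hedged sketch of how the mode is selected; only the local URL is actually exercised here, the cluster URLs are shown in comments:

```python
from pyspark.sql import SparkSession

# The execution mode is chosen via the master URL, for example:
#   "local[4]"           - local mode, 4 threads on one machine
#   "spark://host:7077"  - standalone cluster manager
#   "yarn"               - YARN on a Hadoop cluster
#   "mesos://host:5050"  - Mesos cluster manager
spark = (SparkSession.builder
         .appName("mode-demo")
         .master("local[4]")  # swap in one of the cluster URLs above as needed
         .getOrCreate())

print(spark.sparkContext.master)
spark.stop()
```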
I applied via AmbitionBox and was interviewed in Nov 2024. There were 4 interview rounds.
I utilize tools such as Excel, Python, SQL, and Tableau for data analysis.
Excel for basic data manipulation and visualization
Python for advanced data analysis and machine learning
SQL for querying databases
Tableau for creating interactive visualizations
A coding round focused on data-analysis problems.
A written paper of coding and logical-reasoning questions.
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
I applied via Job Fair and was interviewed in Nov 2024. There were 2 interview rounds.
I was interviewed in Dec 2024.
I applied via Company Website and was interviewed in Sep 2024. There were 2 interview rounds.
Platform - HackerRank
Duration - 2 hours
Topics - Spark and SQL
Common file formats used in data storage include CSV, JSON, Parquet, Avro, and ORC; Parquet generally offers the best compression.
CSV (Comma-Separated Values) - simple and widely used, but not efficient for large datasets
JSON (JavaScript Object Notation) - human-readable and easy to parse, but can be inefficient for storage
Parquet - columnar storage format that is highly efficient for compression and query performance
Avro - efficient row-based binary format with support for schema evolution; ORC - columnar format with strong compression, commonly used with Hive (a short pandas sketch follows this list)
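A small pandas sketch writing the same frame to three of these formats; pyarrow (or fastparquet) is assumed to be installed for Parquet support:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# The same data in three formats. Parquet (columnar, compressed) is typically
# the smallest and fastest to query; CSV is the most portable but least efficient.
df.to_csv("data.csv", index=False)
df.to_json("data.json", orient="records")
df.to_parquet("data.parquet")  # needs pyarrow or fastparquet installed

print(pd.read_parquet("data.parquet"))
```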
Python program to find the most repeating substring in a list of words; a sketch implementing the steps below follows the list.
Iterate through each word in the list
Generate all possible substrings for each word
Count the occurrences of each substring using a dictionary
Find the substring with the highest count
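A sketch implementing these steps; it counts every substring, including single characters, which is an assumption about what "most repeating" means here:

```python
from collections import Counter

def most_repeating_substring(words):
    """Return the substring that occurs most often across all words."""
    counts = Counter()
    for word in words:                        # iterate through each word
        n = len(word)
        for i in range(n):                    # generate every substring
            for j in range(i + 1, n + 1):
                counts[word[i:j]] += 1        # count occurrences in a dict
    # the substring with the highest count (ties broken arbitrarily)
    return counts.most_common(1)[0][0] if counts else None

print(most_repeating_substring(["banana", "bandana"]))  # 'a'
```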
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
Enhanced optimization in AWS Glue improves job performance by automatically adjusting resources based on workload
Enhanced optimization in AWS Glue automatically adjusts resources like DPUs based on workload
It helps improve job performance by optimizing resource allocation
Users can enable enhanced optimization in the AWS Glue job settings; a hedged boto3 sketch follows.
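A hedged boto3 sketch of defining a Glue job with such settings; the job name, IAM role, script path, and the "--enable-auto-scaling" default argument are assumptions for illustration, not verified configuration:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# All names and paths below are placeholders, and the "--enable-auto-scaling"
# default argument is our assumption for how the resource-adjusting behaviour
# described above is switched on.
glue.create_job(
    Name="example-etl-job",
    Role="arn:aws:iam::123456789012:role/ExampleGlueRole",
    Command={"Name": "glueetl",
             "ScriptLocation": "s3://example-bucket/scripts/etl.py"},
    GlueVersion="3.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,  # acts as an upper bound when auto-scaling is on
    DefaultArguments={"--enable-auto-scaling": "true"},
)
```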
Optimizing querying in Amazon Redshift involves proper table design, distribution keys, sort keys, and query optimization techniques.
Use appropriate distribution keys to evenly distribute data across nodes for parallel processing.
Utilize sort keys to physically order data on disk, reducing the need for sorting during queries.
Avoid using SELECT * and instead specify only the columns needed to reduce data transfer.
Use ANALYZE to keep table statistics current and VACUUM to maintain sort order; a hedged sketch of a table designed along these lines follows.
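A hedged sketch of such a table design; Redshift speaks the PostgreSQL wire protocol, so psycopg2 is used here, and all connection details, table names, and columns are placeholders:

```python
import psycopg2

# Placeholder connection details; Redshift speaks the PostgreSQL protocol.
conn = psycopg2.connect(host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="admin", password="example-password")

ddl = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(10, 2)
)
DISTKEY (customer_id)  -- co-locate rows joined on customer_id across nodes
SORTKEY (sale_date);   -- order rows on disk so date-range filters scan less
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)
    # Select only the needed columns rather than SELECT *.
    cur.execute("SELECT sale_id, amount FROM sales WHERE sale_date >= %s",
                ("2024-01-01",))
```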
Designation | Salaries reported | Salary range
Front end Developer | 8 | ₹7 L/yr - ₹9.2 L/yr
Software Engineer | 5 | ₹6 L/yr - ₹13.6 L/yr
Quality Analyst | 5 | ₹3.3 L/yr - ₹5 L/yr
Data Annotation Engineer | 5 | ₹3 L/yr - ₹4 L/yr
Product Manager | 4 | ₹17.5 L/yr - ₹17.5 L/yr
Fractal Analytics
Mu Sigma
Algonomy
Tiger Analytics