SQL joins combine rows from two or more tables based on a related column between them, so a single query can retrieve data from multiple tables.
Common types of SQL Joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column;
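The difference between join types is easiest to see on a small dataset. The following is a minimal sketch using Python's built-in sqlite3 module; the employees and departments tables and their contents are hypothetical, made up purely for illustration. Only INNER JOIN and LEFT JOIN are shown, since RIGHT JOIN and FULL JOIN are not available in older SQLite versions.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meera', NULL);
    INSERT INTO departments VALUES (10, 'Analytics'), (20, 'Media');
""")

# INNER JOIN keeps only rows that have a match in both tables.
inner_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees AS e
    INNER JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN keeps every row from the left table; unmatched columns are NULL.
left_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.id
""").fetchall()

print(inner_rows)  # Meera is absent: she has no matching department
print(left_rows)   # Meera is present with dept_name None
conn.close()
```

The employee without a department drops out of the INNER JOIN result but is kept by the LEFT JOIN with a NULL (None in Python) department name.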
Python numpy provides various functions for numerical operations on arrays.
numpy.sum() - calculates the sum of array elements
numpy.mean() - calculates the mean of array elements
numpy.max() - returns the maximum value in an array
numpy.min() - returns the minimum value in an array
numpy.std() - calculates the standard deviation of array elements
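As a quick illustration of the functions listed above, here is a small sketch on a made-up array (the values are arbitrary sample data). One detail worth knowing for interviews: numpy.std() computes the population standard deviation by default (ddof=0); pass ddof=1 for the sample standard deviation.

```python
import numpy as np

# Arbitrary sample data chosen so the results are easy to check by hand.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

total = np.sum(values)        # sum of all elements
average = np.mean(values)     # arithmetic mean
largest = np.max(values)      # maximum value
smallest = np.min(values)     # minimum value
spread = np.std(values)       # population standard deviation (ddof=0)
sample_spread = np.std(values, ddof=1)  # sample standard deviation

print(total, average, largest, smallest, spread, sample_spread)
```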
I applied via Naukri.com and was interviewed in May 2024. There was 1 interview round.
I applied via Naukri.com and was interviewed in May 2024. There were 2 interview rounds.
An Excel file was given; I used SUMIFS, COUNTIFS, and INDEX/MATCH formulas.
I applied via LinkedIn and was interviewed before Aug 2023. There were 2 interview rounds.
Union combines and removes duplicates, Union All combines without removing duplicates.
Union combines result sets and removes duplicates
Union All combines result sets without removing duplicates
Union is typically slower than Union All because it has to do extra work to remove duplicates
Example: SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
Example: SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2;
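The two queries above can be compared side by side. This is a minimal sketch using Python's built-in sqlite3 module, with table1 and table2 filled with made-up overlapping values; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER);
    CREATE TABLE table2 (column1 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4);
""")

# UNION: rows from both tables, with duplicates removed.
union_rows = [r[0] for r in conn.execute(
    "SELECT column1 FROM table1 UNION SELECT column1 FROM table2 ORDER BY column1"
)]

# UNION ALL: rows from both tables, duplicates kept.
union_all_rows = [r[0] for r in conn.execute(
    "SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2 ORDER BY column1"
)]

print(union_rows)      # 4 distinct values
print(union_all_rows)  # all 6 rows, including the repeated 2 and 3
conn.close()
```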
To show top 5 in pandas, use the nlargest() function.
Use the nlargest() function with the 'n' parameter set to 5 to get the top 5 values in a pandas DataFrame.
For example: df['column_name'].nlargest(5) will return the top 5 values in the specified column.
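A short sketch of both forms of nlargest(), on a hypothetical DataFrame (the campaign and clicks columns are invented for illustration). Calling it on a column returns only the values; calling it on the DataFrame with a column name returns the full rows.

```python
import pandas as pd

# Hypothetical data: 7 campaigns with their click counts.
df = pd.DataFrame({
    "campaign": ["A", "B", "C", "D", "E", "F", "G"],
    "clicks": [120, 340, 90, 410, 275, 60, 305],
})

# Top 5 values of a single column (returns a Series, largest first).
top5_values = df["clicks"].nlargest(5)

# Top 5 rows of the DataFrame, ranked by the 'clicks' column.
top5_rows = df.nlargest(5, "clicks")

print(top5_values)
print(top5_rows)
```

An equivalent alternative is df.sort_values("clicks", ascending=False).head(5), which can be easier to read when sorting by several columns.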
A scatter plot is a good representation for 3 numerical columns.
Plot two columns on the x and y axes and encode the third as point color or size, or use a 3D scatter plot.
Scatter plots are effective for visualizing correlations and patterns in data.
Each point on the plot represents a data point with values from all 3 columns.
I applied via LinkedIn and was interviewed in Jul 2024. There was 1 interview round.
SQL window functions like RANK and DENSE_RANK assign a rank to rows within a partition.
RANK gives tied rows the same rank and skips the ranks that follow, leaving gaps (for example 1, 1, 3).
DENSE_RANK also gives tied rows the same rank, but without any gaps (for example 1, 1, 2).
Both functions are used with the OVER() clause in SQL to define the partition and order of rows.
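The behavior of the two functions on ties can be checked with a small example. This is a sketch using Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the scores table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (dept TEXT, name TEXT, score INTEGER);
    INSERT INTO scores VALUES
        ('A', 'x', 90), ('A', 'y', 90), ('A', 'z', 80),
        ('B', 'p', 70), ('B', 'q', 60);
""")

# Rank rows within each department, highest score first.
ranked = conn.execute("""
    SELECT dept, name, score,
           RANK()       OVER (PARTITION BY dept ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY dept, score DESC, name
""").fetchall()

for row in ranked:
    print(row)
conn.close()
```

In department A, the two rows tied at 90 share rank 1 under both functions; the next row gets rank 3 from RANK (a gap) and rank 2 from DENSE_RANK (no gap).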
I applied via Referral and was interviewed in Nov 2021. There was 1 interview round.
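In Spark, coalesce reduces the number of partitions while avoiding a full shuffle. Repartition performs a full shuffle to redistribute data across nodes. (This is distinct from the SQL COALESCE function, which returns the first non-null value from a set of columns.)
Coalesce reduces the number of partitions to the requested count by merging existing partitions.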
Repartition increases or decreases the number of partitions.
Coalesce is a narrow transformation while repartition is a wide transformation.
Coalesce is used to optimize data for queries while repartition is used to balance data acros...
Optimizing joins involves selecting appropriate join types, indexing tables, and minimizing data movement.
Choose the appropriate join type based on the size and structure of the tables being joined
Index the tables on the join columns to speed up the join process
Minimize data movement by selecting only the necessary columns and filtering rows before joining
Consider using denormalization or materialized views to precompu
RDD is a low-level distributed data structure while DataFrame is a high-level structured data abstraction.
Both are immutable; an RDD carries no schema, while a DataFrame organizes data into named columns with a schema
DataFrames are optimized for SQL-style queries by Spark's Catalyst optimizer; both RDDs and DataFrames can be cached in memory
RDDs are more flexible and can be used for complex data processing tasks
DataFrames are easier to use and provide a more concise syntax for data manipulation
RDDs are the...
I applied via LinkedIn and was interviewed before Oct 2022. There were 4 interview rounds.
Basic DSA questions in Python, plus data engineering and SQL questions
I applied via LinkedIn and was interviewed before Apr 2022. There were 4 interview rounds.
Role | Salaries reported | Salary range
Senior Executive | 48 | ₹4.8 L/yr - ₹11 L/yr
Assistant Manager | 35 | ₹8.4 L/yr - ₹14 L/yr
Senior Analyst | 28 | ₹6.7 L/yr - ₹14.9 L/yr
Senior Associate | 22 | ₹5.7 L/yr - ₹11 L/yr
Data Analyst | 16 | ₹4.1 L/yr - ₹8.5 L/yr