
HSBC Group

4.0 based on 4.5k reviews
Proud winner of ABECA 2024 - AmbitionBox Employee Choice Awards

10+ MCI-IIT-JEE Interview Questions and Answers

Updated 3 May 2024

Q1. 1. What is a UDF in Spark? 2. Write PySpark code to check the validity of a mobile_number column

Ans.

UDF stands for User-Defined Function in Spark. It allows users to define their own functions to process data.

  • UDFs can be written in different programming languages like Python, Scala, and Java.

  • UDFs can be used to perform complex operations on data that are not available in built-in functions.

  • PySpark code to check the validity of mobile_number column can be written using regular expressions and the `regexp_extract` function.

  • Example: `df.select('mobile_number', regexp_extract('…`
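
The validity check can be sketched outside Spark first; the same pattern then drops into `regexp_extract`, `rlike`, or a UDF. The rule used here — exactly 10 digits starting with 6–9, a common convention for Indian mobile numbers — is an assumption; adjust it to your data.

```python
import re

# Assumed rule: a valid mobile number is exactly 10 digits starting
# with 6-9 (an assumption; adapt the pattern to your dataset).
MOBILE_RE = re.compile(r"^[6-9]\d{9}$")

def is_valid_mobile(number):
    """True if the string matches the assumed mobile-number pattern."""
    return bool(MOBILE_RE.match(number or ""))

# In PySpark the same pattern can be applied as a column expression
# (sketch, not executed here):
#   from pyspark.sql import functions as F
#   df.withColumn("is_valid", F.col("mobile_number").rlike(r"^[6-9]\d{9}$"))
```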


Q2. Merge two unsorted lists such that the output list is sorted. You are free to use inbuilt sorting functions to sort the input lists

Ans.

Merge two unsorted lists into a sorted list using inbuilt sorting functions.

  • Use inbuilt sorting functions to sort the input lists

  • Merge the sorted lists using a merge algorithm

  • Return the merged and sorted list
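
The steps above can be written as a short Python sketch: each input is sorted with the built-in `sorted()`, and `heapq.merge` then combines the two sorted runs in a single linear pass.

```python
from heapq import merge

def merge_unsorted(a, b):
    """Sort each input with the built-in sorted(), then combine the
    two sorted runs in one linear pass with heapq.merge."""
    return list(merge(sorted(a), sorted(b)))
```

Simply sorting `a + b` would also work; the explicit merge mirrors the two-phase approach the answer describes.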


Q3. Write a SQL query to get the 2nd highest salary from each department

Ans.

SQL query to retrieve the second highest salary from each department

  • Use the RANK() function to assign a rank to each salary within each department

  • Filter the results to only include rows with a rank of 2

  • Group the results by department to get the second highest salary for each department


Q4. How do you delete duplicate rows from a table?

Ans.

To delete duplicate rows from a table, either remove the extras in place with a keyed DELETE, or rebuild the table from a de-duplicated SELECT (using DISTINCT or GROUP BY).

  • Use the DISTINCT keyword to select unique rows from the table.

  • Use the GROUP BY clause to group the rows by a specific column and select the unique rows.

  • Use the DELETE statement with a subquery to delete the duplicate rows.

  • Create a new table with the unique rows and drop the old table.
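
The in-place DELETE variant can be sketched with SQLite, whose implicit `rowid` serves as the surrogate key (other engines would use a primary key or a ROW_NUMBER() subquery instead; the table here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (a TEXT, b INTEGER);
INSERT INTO t VALUES ('x', 1), ('x', 1), ('y', 2), ('x', 1), ('y', 3);
""")

# Keep the first copy of each (a, b) pair; delete every later duplicate.
conn.execute("""
DELETE FROM t
WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY a, b)
""")
remaining = conn.execute("SELECT a, b FROM t ORDER BY a, b").fetchall()
```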


Q5. What are the window functions you have used?

Ans.

Window functions are used to perform calculations across a set of rows that are related to the current row.

  • Commonly used window functions include ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTILE.

  • Window functions are used in conjunction with the OVER clause to define the window or set of rows to perform the calculation on.

  • Window functions can be used to calculate running totals, moving averages, and other aggregate calculations.

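
A small SQLite demo of three of the functions listed above (the table is made up): ROW_NUMBER for ordering, LAG for the previous row, and an ordered SUM for a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (day INTEGER, amount INTEGER);
INSERT INTO sales VALUES (1, 10), (2, 30), (3, 20);
""")

# Each OVER (ORDER BY day) clause defines the window for its function.
rows = conn.execute("""
SELECT day,
       ROW_NUMBER() OVER (ORDER BY day) AS rn,
       LAG(amount)  OVER (ORDER BY day) AS prev_amount,
       SUM(amount)  OVER (ORDER BY day) AS running_total
FROM sales
ORDER BY day
""").fetchall()
```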


Q6. How do you handle null values in PySpark?

Ans.

Null values in PySpark are handled using functions such as dropna(), fillna(), and replace().

  • dropna() function is used to drop rows or columns with null values

  • fillna() function is used to fill null values with a specified value or method

  • replace() function is used to replace null values with a specified value

  • coalesce() function is used to replace null values with the first non-null value in a list of columns
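
The semantics of those calls can be illustrated without Spark on a plain list of dicts (the column names are made up; this is an analogue of the behaviour, not the PySpark code path):

```python
rows = [
    {"id": 1, "phone": "987", "alt_phone": None},
    {"id": 2, "phone": None,  "alt_phone": "123"},
    {"id": 3, "phone": None,  "alt_phone": None},
]

# dropna(subset=["phone"]): discard rows whose phone is null
dropped = [r for r in rows if r["phone"] is not None]

# fillna({"phone": "unknown"}): substitute a default for nulls
filled = [dict(r, phone=r["phone"] if r["phone"] is not None else "unknown")
          for r in rows]

# coalesce("phone", "alt_phone"): first non-null candidate per row
coalesced = [r["phone"] if r["phone"] is not None else r["alt_phone"]
             for r in rows]
```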


Q7. What is the Imputer function in PySpark?

Ans.

Imputer function in PySpark is used to replace missing values in a DataFrame.

  • Imputer is a transformer in PySpark ML library.

  • It replaces missing values in a DataFrame with either mean, median, or mode of the column.

  • It works only on numeric columns; categorical columns must be handled separately.

  • Example: imputer = Imputer(inputCols=['col1', 'col2'], outputCols=['col1_imputed', 'col2_imputed'], strategy='mean')

  • Example: imputed_df = imputer.fit(df).transform(df)
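
What `strategy='mean'` does can be shown in a few lines of plain Python (a toy re-implementation of the idea, not the Spark code path):

```python
from statistics import mean

def impute_mean(values):
    """Replace None with the mean of the observed values, mimicking
    pyspark.ml.feature.Imputer with strategy='mean'."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```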


Q8. What is lazy evaluation in Spark?

Ans.

Lazy evaluation in Spark delays the execution of transformations until an action is called.

  • Lazy evaluation allows Spark to optimize the execution plan by combining multiple transformations into a single stage.

  • Transformations are not executed immediately, but are stored as a directed acyclic graph (DAG) of operations.

  • Actions trigger the execution of the DAG and produce results.

  • Example: map() and filter() are transformations that are lazily evaluated until an action like collect() is called.
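
Python generators give a Spark-free feel for the same idea: building the pipeline records nothing, and only consuming it (the "action") runs the work.

```python
def traced_map(fn, items, log):
    """A map transformation that logs each element as it is processed."""
    for x in items:
        log.append(f"map({x})")
        yield fn(x)

log = []
pipeline = traced_map(lambda x: x * 2, [1, 2, 3], log)
assert log == []          # nothing has run yet: the pipeline is lazy

result = list(pipeline)   # list() plays the role of an action here
```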


Q9. What are skewness and skewed tables?

Ans.

Skewness is a measure of asymmetry in a distribution. Skewed tables are tables with imbalanced data distribution.

  • Skewness is a statistical measure that describes the asymmetry of the data distribution around the mean.

  • Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side.

  • Skewed tables in data engineering refer to tables with imbalanced data distribution, which can impact query performance.
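
The statistical notion can be computed directly; this is the population (Fisher-Pearson) skewness, the mean of ((x − μ)/σ)³ over the sample:

```python
from statistics import mean, pstdev

def skewness(xs):
    """Population (Fisher-Pearson) skewness of a sample."""
    mu, sigma = mean(xs), pstdev(xs)
    return sum(((x - mu) / sigma) ** 3 for x in xs) / len(xs)
```

A symmetric sample gives (approximately) zero; a long right tail gives a positive value.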


Q10. What is Spark, and how does it work?

Ans.

Spark is a distributed computing framework designed for big data processing.

  • Spark is built around the concept of Resilient Distributed Datasets (RDDs) which allow for fault-tolerant parallel processing of data.

  • It provides high-level APIs in Java, Scala, Python, and R for ease of use.

  • Spark can run on top of Hadoop, Mesos, Kubernetes, or in standalone mode.

  • It includes modules for SQL, streaming, machine learning, and graph processing.

  • Spark uses in-memory processing to speed up computation.
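
Spark's execution model — split the data into partitions and apply the same task to each in parallel, then combine the partial results — can be mimicked in miniature with a thread pool (a toy analogue for intuition, not Spark itself):

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(part):
    """The task Spark would ship to each partition; here, sum of squares."""
    return sum(x * x for x in part)

data = list(range(10))
partitions = [data[i::4] for i in range(4)]   # 4 partitions, round-robin

with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(process_partition, partitions))

total = sum(partial)   # combine the per-partition results
```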


Q11. What is MapReduce?

Ans.

MapReduce is a programming model and processing technique for parallel and distributed computing.

  • MapReduce is used to process large datasets in parallel across a distributed cluster of computers.

  • It consists of two main functions - Map function for processing key/value pairs and Reduce function for aggregating the results.

  • Popularly used in big data processing frameworks like Hadoop for tasks like data sorting, searching, and counting.

  • Example: counting the frequency of words in a large collection of documents.
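
The word-count example can be written out as the model's three classic phases in plain Python (a single-machine sketch of the programming model, not Hadoop):

```python
from collections import defaultdict

def map_phase(doc):
    """Map: emit a (word, 1) pair for each word in the document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a final count."""
    return {key: sum(vals) for key, vals in groups.items()}

docs = ["to be or not to be", "be quick"]
pairs = [p for doc in docs for p in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```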


Interview Process at MCI-IIT-JEE

Based on 5 interviews in the last year; overall interview experience rated 4.0 (Good).