Mphasis Senior Data Engineer Interview Questions and Answers

Updated 31 Aug 2024

Mphasis Senior Data Engineer Interview Experiences

1 interview found

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(3 Questions)

  • Q1. ReduceByKey vs groupByKey
  • Ans. 

    reduceByKey is more efficient than groupByKey for aggregating data in Spark due to reduced shuffling.

    • reduceByKey combines values for each key in each partition before shuffling data

    • groupByKey shuffles every key-value pair across the network with no map-side combine, then groups the values for each key on the receiving partition

    • reduceByKey is preferred for large datasets to minimize data movement and improve performance

  • Answered by AI
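The difference in shuffle volume can be illustrated without a cluster. The sketch below is a plain-Python simulation, not Spark code: the partition contents and the helper names `shuffle_without_combine`, `shuffle_with_combine`, and `reduce_side` are made up for illustration. It counts how many records would cross the network under each strategy.

```python
from collections import defaultdict

# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]

def shuffle_without_combine(parts):
    """groupByKey-style: every record is sent across the network as-is."""
    return [record for part in parts for record in part]

def shuffle_with_combine(parts):
    """reduceByKey-style: sum per key inside each partition before sending."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    return shuffled

def reduce_side(records):
    """Final merge on the receiving side; the same for both strategies."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)

grouped = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions)
print(len(grouped), len(combined))                   # records shuffled by each strategy
print(reduce_side(grouped) == reduce_side(combined)) # both give the same totals
```

Both strategies produce the same per-key totals; only the number of records moved differs, and that gap widens as the number of repeated keys per partition grows.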
  • Q2. Word count in scala
  • Ans. 

    Scala's built-in string and collection methods make it straightforward to count the words in a string.

    • Use split("\\s+") to break the string into an array of words

    • Use the array's length to get the total number of words; for per-word frequencies, group identical words and count each group

  • Answered by AI
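The question asks for Scala, but the split-then-count logic is the same in any language. The sketch below uses Python, to stay consistent with the other examples on this page; the input string is made up for illustration.

```python
from collections import Counter

text = "to be or not to be"   # hypothetical input

words = text.split()          # split on runs of whitespace
total_words = len(words)      # total number of words
frequencies = Counter(words)  # occurrences of each distinct word

print(total_words)
print(frequencies)
```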
  • Q3. Second highest salary SQL
  • Ans. 

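    Use ORDER BY with LIMIT and an offset to find the second highest salary.

    • Select DISTINCT salaries and sort them in descending order with ORDER BY, so that a tie for the top salary counts as one value

    • Use LIMIT 1,1 (MySQL syntax, equivalent to LIMIT 1 OFFSET 1) to skip the highest salary and return the second highest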

  • Answered by AI
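A minimal runnable sketch of this query, using Python's built-in `sqlite3` module so it needs no database server. The `employees` table and its rows are hypothetical; the data includes a tie for the top salary to show why DISTINCT matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 90), ("Ravi", 120), ("Meena", 120), ("John", 75)],
)

# DISTINCT collapses the tie at 120, so OFFSET 1 skips the highest value
# rather than just the first row.
query = """
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
"""
second_highest = conn.execute(query).fetchone()[0]
print(second_highest)
```

Without DISTINCT, the same query would return 120 for this data, because the second row is the other employee tied for the top salary.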


Interview questions from similar companies

Interview experience
3
Average
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Aptitude Test 

The aptitude test lasts 30 minutes and focuses on topics relevant to data engineering, including Spark, SQL, Azure, and PySpark.

Round 2 - Coding Test 

The coding test is a one-hour examination on PySpark.

Round 3 - Technical 

(3 Questions)

  • Q1. What is the difference between Cache() and Persist()?
  • Q2. What is the purpose of the spark-submit command in Apache Spark?
  • Q3. What are window functions in SQL?
Round 4 - HR 

(2 Questions)

  • Q1. Could you provide more details about the daily responsibilities associated with this role?
  • Q2. How would you describe your work culture?
Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
No response

I applied via Naukri.com and was interviewed in Aug 2024. There were 2 interview rounds.

Round 1 - Technical 

(12 Questions)

  • Q1. Tell me about yourself and your project
  • Ans. 

    I am a Senior Data Engineer with experience in developing data pipelines and optimizing data storage for various projects.

    • Developed data pipelines using Apache Spark for real-time data processing

    • Optimized data storage using technologies like Hadoop and AWS S3

    • Worked on a project to analyze customer behavior and improve marketing strategies

  • Answered by AI
  • Q2. What was your day-to-day job in your project?
  • Ans. 

    My day-to-day job in the project involved designing and implementing data pipelines, optimizing data workflows, and collaborating with cross-functional teams.

    • Designing and implementing data pipelines to extract, transform, and load data from various sources

    • Optimizing data workflows to improve efficiency and performance

    • Collaborating with cross-functional teams including data scientists, analysts, and business stakeholde...

  • Answered by AI
  • Q3. Spark Architecture
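  • Q4. How does the DAG handle fault tolerance?
  • Ans. 

    In Spark, the DAG records the lineage of each dataset, so lost work can be recomputed from its dependencies instead of restarting the whole job.

    • Failed tasks are rerun automatically, and lost partitions are recomputed from their parent data using the recorded lineage.

    • The DAG tracks dependencies between stages, so only the affected stages are resubmitted, in the correct order.

    • A task is retried a configurable number of times (spark.task.maxFailures, 4 by default) before the job is marked as failed.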

  • Answered by AI
  • Q5. What is shuffling? How to Handle Shuffling?
  • Ans. 

    Shuffling is the process of redistributing data across partitions in a distributed computing environment.

    • Shuffling is necessary when data needs to be grouped or aggregated across different partitions.

    • It can be handled efficiently by minimizing the amount of data being shuffled and optimizing the partitioning strategy.

    • Techniques like partitioning, combiners, and reducers can help reduce the amount of shuffling in MapRed

  • Answered by AI
  • Q6. What is the difference between repartition and coalesce?
  • Ans. 

    Repartition increases or decreases the number of partitions in a DataFrame, while Coalesce only decreases the number of partitions.

    • Repartition can increase or decrease the number of partitions in a DataFrame, leading to a shuffle of data across the cluster.

    • Coalesce only decreases the number of partitions in a DataFrame without performing a full shuffle, making it more efficient than repartition.

    • Repartition is typically...

  • Answered by AI
  • Q7. How do you handle Incremental data?
  • Ans. 

    Incremental data is handled by identifying new data since the last update and merging it with existing data.

    • Identify new data since last update

    • Merge new data with existing data

    • Update data warehouse or database with incremental changes

  • Answered by AI
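The three steps above can be sketched as a watermark-based upsert. This is plain Python over in-memory rows, not a specific warehouse API; the `updated_at` column, the `id` key, and the function name `incremental_load` are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical target table, keyed by id.
target = {
    1: {"id": 1, "city": "Pune", "updated_at": datetime(2024, 8, 1)},
    2: {"id": 2, "city": "Delhi", "updated_at": datetime(2024, 8, 1)},
}
# Hypothetical source extract containing both old and new rows.
source = [
    {"id": 1, "city": "Pune", "updated_at": datetime(2024, 8, 1)},
    {"id": 2, "city": "Mumbai", "updated_at": datetime(2024, 8, 5)},
    {"id": 3, "city": "Chennai", "updated_at": datetime(2024, 8, 6)},
]

def incremental_load(target, source, last_watermark):
    """Upsert rows changed after last_watermark; return the new watermark."""
    changed = [row for row in source if row["updated_at"] > last_watermark]
    for row in changed:
        target[row["id"]] = row  # insert a new key or overwrite an existing one
    return max((row["updated_at"] for row in changed), default=last_watermark)

watermark = incremental_load(target, source, datetime(2024, 8, 1))
print(watermark, sorted(target))
```

Storing the returned watermark and passing it to the next run means each load only touches rows that changed since the previous one.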
  • Q8. What is SCD?
  • Ans. 

    SCD stands for Slowly Changing Dimension, a concept in data warehousing to track changes in data over time.

    • SCD is used to maintain historical data in a data warehouse.

    • The three most commonly used types of SCD are Type 1, Type 2, and Type 3.

    • Type 1 SCD overwrites old data with new data.

    • Type 2 SCD creates a new record for each change, preserving history.

    • Type 3 SCD maintains both old and new values in the same record.

    • SCD is important for...

  • Answered by AI
  • Q9. Scenario-based questions related to Spark
  • Q10. Two SQL problems and two Python problems, such as reversing a string
  • Ans. 

    Reverse a string using SQL and Python codes.

    • In SQL, use the REVERSE function (available in MySQL, SQL Server, and PostgreSQL) to reverse a string.

    • In Python, use slicing with a step of -1 to reverse a string.

  • Answered by AI
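A minimal sketch of the Python half of the answer; the function name `reverse_string` is made up for illustration.

```python
def reverse_string(text: str) -> str:
    """Return text reversed; a slice with step -1 walks it backwards."""
    return text[::-1]

print(reverse_string("spark"))
```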
  • Q11. Find top 5 countries with highest population in Spark and SQL
  • Ans. 

    Use Spark and SQL to find the top 5 countries with the highest population.

    • Use Spark to load the data and perform data processing.

    • Use SQL queries to group by country and sum the population.

    • Order the results in descending order and limit to top 5.

    • Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5

  • Answered by AI
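The SQL half of the answer can be tried locally with Python's built-in `sqlite3` module. The `cities` table and its rows are hypothetical; the query itself is the one from the example above with the table name filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (country TEXT, city TEXT, population INTEGER)")
rows = [
    ("A", "a1", 50), ("A", "a2", 30),
    ("B", "b1", 70),
    ("C", "c1", 10), ("C", "c2", 5),
    ("D", "d1", 40),
    ("E", "e1", 20), ("E", "e2", 25),
    ("F", "f1", 60),
]
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", rows)

# Sum per country, sort by the total, keep the five largest.
top5 = conn.execute("""
    SELECT country, SUM(population) AS total_population
    FROM cities
    GROUP BY country
    ORDER BY total_population DESC
    LIMIT 5
""").fetchall()
print(top5)
```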
  • Q12. Using two tables find the different records for different joins
  • Ans. 

    Run each join type on the two tables and compare which rows each one returns.

    • Identify the key columns in both tables to join on

    • Run INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN on those keys and compare the result sets

    • To isolate rows that exist in only one table, use an outer join and filter with WHERE on the other table's key being NULL

  • Answered by AI
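A small sketch of how join results differ on two tables, run through Python's built-in `sqlite3` module. The tables `a` and `b` and their ids are hypothetical. Only INNER JOIN and LEFT JOIN are shown, because older SQLite versions do not support RIGHT and FULL joins; the same pattern applies to them in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# INNER JOIN: only keys present in both tables.
inner = conn.execute(
    "SELECT a.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# LEFT JOIN: every row of a; b.id is NULL where there is no match.
left = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# Rows of a with no partner in b: keep only the NULL side of the LEFT JOIN.
only_in_a = conn.execute(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL"
).fetchall()

print(inner, left, only_in_a)
```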
Round 2 - One-on-one 

(7 Questions)

  • Q1. What is the Catalyst optimizer? How does it work?
  • Ans. 

    Catalyst is the query optimizer in Spark SQL; it improves performance by turning a query into an efficient execution plan.

    • Catalyst is an extensible query optimization framework in Apache Spark that combines rule-based and cost-based optimization.

    • It applies rules to transform the logical query plan into an optimized logical plan and then into a physical plan.

    • The optimizer applies various optimization techniques like predicate pushdown, constant folding, and join reordering.

    • By o...

  • Answered by AI
  • Q2. Tell me about the optimization you used in your project.
  • Ans. 

    Used query optimization techniques to improve performance in database queries.

    • Utilized indexing to speed up search queries.

    • Implemented query caching to reduce redundant database calls.

    • Optimized SQL queries by restructuring joins and subqueries.

    • Utilized database partitioning to improve query performance.

    • Used query profiling tools to identify and optimize slow queries.

  • Answered by AI
  • Q3. PySpark question related to merging two schemas
  • Q4. What is the best approach to finding whether the data frame is empty or not?
  • Ans. 

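    The check depends on the library: len() works for pandas DataFrames, but a PySpark DataFrame does not support it.

    • In pandas, use df.empty, or len(df) == 0, to test whether the data frame has no rows.

    • In PySpark, use df.isEmpty() (Spark 3.3 and later) or len(df.head(1)) == 0; both avoid the full scan that df.count() == 0 would trigger.

    • Example (pandas): if len(df) == 0: print('Data frame is empty')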

  • Answered by AI
  • Q5. Spark Architecture
  • Q6. How do you decide on cores and worker nodes?
  • Ans. 

    Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.

    • Consider the size and complexity of the data being processed

    • Evaluate the processing speed and memory requirements of the tasks

    • Take into account the parallelism and concurrency needed for efficient data processing

    • Monitor the system performance and adjust cores and worker nodes as needed

  • Answered by AI
  • Q7. What happens when we enforce a schema?
  • Ans. 

    Enforcing schema ensures that data conforms to a predefined structure and rules.

    • Ensures data integrity by validating incoming data against predefined schema

    • Helps in maintaining consistency and accuracy of data

    • Prevents data corruption and errors in data processing

    • Can lead to rejection of data that does not adhere to the schema

  • Answered by AI

Interview Preparation Tips

Topics to prepare for Persistent Systems Senior Data Engineer interview:
  • SQL
  • PySpark
  • Python
  • Spark
  • Database
Interview preparation tips for other job seekers - Be prepared with Spark core concepts and SQL Coding

Skills evaluated in this interview

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-

I was interviewed in Dec 2024.

Round 1 - Technical 

(2 Questions)

  • Q1. Window function-related questions.
  • Q2. Join-related questions.
Round 2 - Technical 

(2 Questions)

  • Q1. Join-related questions.
  • Q2. Subquery-related questions.
Interview experience
3
Average
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. SQL query to find the max salary
  • Ans. 

    Use SQL query with MAX function to find the highest salary in a table.

    • Use SELECT MAX(salary) FROM table_name;

    • Make sure to replace 'salary' with the actual column name in the table.

    • Ensure proper permissions to access the table.

  • Answered by AI
  • Q2. What is DENSE_RANK in SQL?
  • Ans. 

    DENSE_RANK in SQL assigns a rank to each row in an ordered result set; rows with equal values share the same rank, and there are no gaps between ranks.

    • DENSE_RANK is a window function that ranks rows by the ORDER BY of its OVER clause without leaving gaps.

    • It differs from regular rank in that it does not skip ranks if there are ties.

    • For example, if two rows have the same value and are ranked 1st, the next row will be ranked 2nd, not 3rd.

  • Answered by AI
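To make the difference between RANK and DENSE_RANK concrete, the sketch below reimplements both in plain Python for a descending sort. The function name `rank_and_dense_rank` and the sample salaries are made up for illustration; in SQL the database does this for you.

```python
def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples for values sorted descending."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = dense = 0
    previous = None
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position   # RANK jumps to the row position after a tie
            dense += 1        # DENSE_RANK advances by exactly one
            previous = value
        result.append((value, rank, dense))
    return result

for row in rank_and_dense_rank([300, 500, 500, 200]):
    print(row)
```

With two rows tied at 500, the value 300 gets rank 3 under RANK but rank 2 under DENSE_RANK.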
Round 2 - Technical 

(2 Questions)

  • Q1. What is a Spark cluster?
  • Ans. 

    Spark cluster is a group of interconnected computers that work together to process large datasets using Apache Spark.

    • Consists of a master node and multiple worker nodes

    • Master node manages the distribution of tasks and resources

    • Worker nodes execute the tasks in parallel

    • Used for processing big data and running distributed computing jobs

  • Answered by AI
  • Q2. How does Hive work with HDFS?
  • Ans. 

    Hive is a data warehouse system built on top of Hadoop for querying and analyzing large datasets stored in HDFS.

    • Hive translates SQL-like queries into MapReduce jobs (or Tez or Spark jobs, depending on the execution engine) to process data stored in HDFS

    • It uses a metastore to store metadata about tables and partitions

    • HiveQL is the query language used in Hive, similar to SQL

    • Hive supports partitioning, bucketing, and indexing for optimizing queries

  • Answered by AI

Skills evaluated in this interview

Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(1 Question)

  • Q1. ADF, ETL, Python, ADB
Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Selected

I applied via Naukri.com and was interviewed in Jun 2024. There were 3 interview rounds.

Round 1 - Technical 

(2 Questions)

  • Q1. More questions on Spark and Java
  • Q2. More questions on SQL
Round 2 - Technical 

(2 Questions)

  • Q1. Overall project architecture
  • Q2. Sample data and its transformations
  • Ans. 

    Sample data and its transformations

    • Sample data can be in the form of CSV, JSON, or database tables

    • Transformations include cleaning, filtering, aggregating, and joining data

    • Examples: converting date formats, removing duplicates, calculating averages

  • Answered by AI
Round 3 - HR 

(1 Question)

  • Q1. Why are you leaving your current job?
  • Ans. 

    Seeking new challenges and opportunities for growth in a more dynamic environment.

    • Looking for new challenges and opportunities for growth

    • Seeking a more dynamic work environment

    • Interested in expanding skill set and knowledge

    • Want to work on more innovative projects

  • Answered by AI
Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Write Python code to remove duplicates from a list of strings
  • Ans. 

    Python code to remove duplicates from a list of strings

    • set() removes duplicates, but it does not preserve the original order of the strings

    • To keep the order, use list(dict.fromkeys(input_list)), which keeps the first occurrence of each string

    • Example: input_list = ['apple', 'banana', 'apple', 'orange']

    • Output: ['apple', 'banana', 'orange']

  • Answered by AI
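A minimal sketch of an order-preserving variant, reusing the `input_list` from the example above. It relies on `dict` keeping insertion order, which Python guarantees from version 3.7.

```python
input_list = ["apple", "banana", "apple", "orange"]

# dict keys are unique and keep insertion order (Python 3.7+), so the
# first occurrence of each string survives in its original position.
deduplicated = list(dict.fromkeys(input_list))
print(deduplicated)
```

Converting through `set()` alone also removes the duplicates, but the resulting order is not guaranteed to match the input.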
  • Q2. Write SQL to get the 2nd highest salary
  • Ans. 

    Use ORDER BY with LIMIT and an offset to get the 2nd highest salary.

    • Use SELECT DISTINCT to retrieve the salary column, so that tied salaries count as one value

    • Use ORDER BY clause to sort salaries in descending order

    • Use LIMIT 1,1 (MySQL syntax, equivalent to LIMIT 1 OFFSET 1) to get the second highest salary

  • Answered by AI
Round 2 - Technical 

(2 Questions)

  • Q1. PySpark SCD Type 2 implementation
  • Ans. 

    Implementing Slowly Changing Dimension Type 2 in PySpark

    • Use PySpark DataFrame operations to handle SCD Type 2 implementation

    • Maintain historical records by adding new rows with updated information and end dates for previous records

    • Utilize window functions and joins to identify changes and update records accordingly

  • Answered by AI
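The core SCD Type 2 logic (expire the old row, append a new current row) can be sketched in a few lines. This is plain Python over lists of dicts, not PySpark, so that it runs anywhere; the column names `start_date`, `end_date`, and `is_current`, the tracked attribute `city`, and the function name `apply_scd2` are all assumptions made for illustration.

```python
from datetime import date

def apply_scd2(dimension, incoming, load_date):
    """Close changed current rows and append new versions (SCD Type 2)."""
    current = {row["id"]: row for row in dimension if row["is_current"]}
    for record in incoming:
        existing = current.get(record["id"])
        if existing is not None and existing["city"] == record["city"]:
            continue  # no change, keep the current row as it is
        if existing is not None:
            existing["end_date"] = load_date  # expire the old version
            existing["is_current"] = False
        dimension.append({
            "id": record["id"], "city": record["city"],
            "start_date": load_date, "end_date": None, "is_current": True,
        })
    return dimension

dim = [{"id": 1, "city": "Pune", "start_date": date(2024, 1, 1),
        "end_date": None, "is_current": True}]
updates = [{"id": 1, "city": "Mumbai"}, {"id": 2, "city": "Delhi"}]
apply_scd2(dim, updates, date(2024, 6, 1))
for row in dim:
    print(row)
```

In PySpark the same steps are usually expressed as a join between the current dimension rows and the incoming data, followed by a union of the expired rows, the unchanged rows, and the new versions.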
  • Q2. Remove duplicates in a dataframe
  • Ans. 

    Use the drop_duplicates() method to remove duplicate rows from a dataframe.

    • Call drop_duplicates() on the dataframe to remove duplicates based on all columns (in PySpark it is an alias for dropDuplicates())

    • Specify the subset parameter to remove duplicates based on specific columns

    • In pandas, use the keep parameter to control which duplicate to keep ('first', 'last', or False to drop all of them); PySpark has no keep parameter

    • Example: df.drop_duplicates()

    • Example: df.drop_duplicates(subset=['column1', 'column2'])

  • Answered by AI

Interview Preparation Tips

Interview preparation tips for other job seekers - Interview was cleared

Skills evaluated in this interview

I applied via Recruitment Consultant and was interviewed in Nov 2021. There were 4 interview rounds.

Round 1 - Aptitude Test 

Normal aptitude test

Round 2 - Technical 

(1 Question)

  • Q1. Technical discussion on big data and a programming problem
Round 3 - Technical 

(1 Question)

  • Q1. Second technical discussion based on project, big data, and Python
Round 4 - HR 

(1 Question)

  • Q1. Normal HR discussion and salary negotiation

Interview Preparation Tips

Interview preparation tips for other job seekers - Brush up on your core skills and basic programming knowledge
Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via Company Website and was interviewed in Sep 2024. There were 2 interview rounds.

Round 1 - Coding Test 

Platform - HackerRank
Duration - 2 Hours
Topics - Spark and SQL

Round 2 - Technical 

(3 Questions)

  • Q1. What are the common file formats used in data storage? Which one is best for compression?
  • Ans. 

    Common file formats used in data storage include CSV, JSON, Parquet, Avro, and ORC. Parquet is generally a strong choice for compression.

    • CSV (Comma-Separated Values) - simple and widely used, but not efficient for large datasets

    • JSON (JavaScript Object Notation) - human-readable and easy to parse, but can be inefficient for storage

    • Parquet - columnar storage format that is highly efficient for compression and query performance

    • Avro - efficie...

  • Answered by AI
  • Q2. SQL problem: given the employee attendance table, write a query to print the employees who were absent for more than 10 consecutive days during their tenure.
  • Q3. Given the list of words, write the Python program to print the most repeating substring out of all words.
  • Ans. 

    Python program to find the most repeating substring in a list of words.

    • Iterate through each word in the list

    • Generate all possible substrings for each word

    • Count the occurrences of each substring using a dictionary

    • Find the substring with the highest count

  • Answered by AI
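The steps above can be sketched as follows. The question does not say how short a substring may be or how to break ties, so the sketch makes assumptions: substrings shorter than `min_length` (a hypothetical parameter, default 2) are ignored, every occurrence is counted (including repeats inside one word), and ties go to the longer substring, then to alphabetical order.

```python
from collections import Counter

def most_repeated_substring(words, min_length=2):
    """Return the substring (>= min_length chars) seen most often across words."""
    counts = Counter()
    for word in words:
        for start in range(len(word)):
            for end in range(start + min_length, len(word) + 1):
                counts[word[start:end]] += 1
    if not counts:
        return None
    # Ties: prefer the longer substring, then alphabetical order.
    return min(counts, key=lambda s: (-counts[s], -len(s), s))

print(most_repeated_substring(["banana", "bandana", "cabana"]))
```

Enumerating every substring is quadratic in the length of each word, which is fine for short words; very long inputs would call for a suffix-based structure instead.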

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare well in SQL, Spark and Python coding problems.

Skills evaluated in this interview

Mphasis Interview FAQs

How many rounds are there in Mphasis Senior Data Engineer interview?
The Mphasis interview process usually has 1 round. The most common round in the Mphasis interview process is Technical.
How to prepare for Mphasis Senior Data Engineer interview?
Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at Mphasis. The most common topics and skills that interviewers at Mphasis expect are Data Modeling, AWS, Amazon Redshift, Analysis Services and Analytical skills.
What are the top questions asked in Mphasis Senior Data Engineer interview?

Some of the top questions asked at the Mphasis Senior Data Engineer interview -

  1. reduceByKey vs groupBy...read more
  2. Word count in sc...read more


Mphasis Senior Data Engineer Interview Process

based on 1 interview

Interview experience

5
  
Excellent
Mphasis Senior Data Engineer Salary
based on 20 salaries
₹8.7 L/yr - ₹27 L/yr
At par with the average Senior Data Engineer Salary in India