Egen Senior Data Engineer Interview Questions and Answers

Updated 17 Dec 2024

Egen Senior Data Engineer Interview Experiences

2 interviews found

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(1 Question)

  • Q1. Reverse a string
Interview experience
3
Average
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. In depth spark fundamentals
  • Q2. Data Modelling case studies

Senior Data Engineer Interview Questions Asked at Other Companies

asked in 7 Eleven
Q1. Write a query to get the customer with the highest total order va ... read more
asked in 7 Eleven
Q2. There are 10 million records in the table and the schema does not ... read more
asked in 7 Eleven
Q3. How do you handle data pipeline when the schema information keeps ... read more
asked in 7 Eleven
Q4. Difference between Parquet and ORC file. Why industry uses parque ... read more
asked in 7 Eleven
Q5. What is Normalisation and Denormalisation? When do we use them? G ... read more

Interview questions from similar companies

Interview experience
4
Good
Difficulty level
Easy
Process Duration
Less than 2 weeks
Result
Selected

I applied via Campus Placement and was interviewed before Dec 2021. There were 4 interview rounds.

Round 1 - Resume Shortlist 
Pro Tip by AmbitionBox:
Keep your resume crisp and to the point. A recruiter looks at your resume for an average of 6 seconds, make sure to leave the best impression.
Round 2 - Aptitude Test 

Simple questions on aptitude, verbal, and behavioral questions.

Round 3 - Coding Test 

Simple coding questions which can be solved in 2 mins.

Round 4 - Technical 

(2 Questions)

  • Q1. Asked about my resume and my projects.
  • Q2. Questions on Python.

Interview Preparation Tips

Interview preparation tips for other job seekers - Easy to crack the interview.
Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Selected

I applied via Naukri.com and was interviewed before Jun 2023. There were 3 interview rounds.

Round 1 - One-on-one 

(2 Questions)

  • Q1. General questions
  • Q2. Questions about my experience
Round 2 - Group Discussion 

It consisted only of reasoning-type questions.

Round 3 - Technical 

(2 Questions)

  • Q1. What is SSIS? How is it used?
  • Ans. 

    SSIS stands for SQL Server Integration Services, a tool provided by Microsoft for data integration and workflow applications.

    • SSIS is a platform for building high-performance data integration and workflow solutions.

    • It allows you to create packages that move data from various sources to destinations.

    • SSIS includes a visual design interface for creating, monitoring, and managing data integration processes.

    • You can use SSIS ...

  • Answered by AI
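  • Q2. When do we use SSIS packages? What is the difference between Union and Merge?
  • Ans. 

    SSIS packages are used for ETL processes in SQL Server. Union All and Merge both combine datasets vertically; Merge requires sorted inputs, while Merge Join combines datasets horizontally.

    • SSIS packages are used for Extract, Transform, Load (ETL) processes in SQL Server.

    • Union in SSIS (the Union All transformation) stacks rows from any number of inputs on top of each other and does not require sorted input.

    • Merge in SSIS also stacks rows, but it accepts exactly two inputs that are sorted on the same key and produces sorted output.

    • Merge Join in SSIS combines datasets horizontally, matching rows based on specified columns.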

    • Union All in SSIS combines datasets vertica...

  • Answered by AI
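
The difference between appending inputs and merging sorted inputs can be sketched outside SSIS. The following is a Python illustration only, not SSIS code; `union_all`, `merge_sorted`, and the sample rows are hypothetical names chosen for this example.

```python
import heapq
from itertools import chain

def union_all(*inputs):
    """Append each input after the previous one; no ordering is required."""
    return list(chain.from_iterable(inputs))

def merge_sorted(left, right, key):
    """Interleave two inputs already sorted on `key` into one sorted output."""
    return list(heapq.merge(left, right, key=key))

# Two hypothetical inputs, each already sorted by id.
orders_2023 = [{"id": 1}, {"id": 4}]
orders_2024 = [{"id": 2}, {"id": 3}]

appended = union_all(orders_2023, orders_2024)
merged = merge_sorted(orders_2023, orders_2024, key=lambda row: row["id"])
```

Both results contain the same four rows; only the merged output keeps a global sort order, which is why a merge needs sorted inputs.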


Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(3 Questions)

  • Q1. ReduceByKey vs groupByKey
  • Ans. 

    reduceByKey is more efficient than groupByKey for aggregating data in Spark due to reduced shuffling.

    • reduceByKey combines values for each key in each partition before shuffling data

    • groupByKey shuffles every key-value pair across the network without combining first, then groups the values for each key on the receiving side

    • reduceByKey is preferred for large datasets to minimize data movement and improve performance

  • Answered by AI
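
The effect of combining before the shuffle can be illustrated without a cluster. This is a plain-Python simulation under stated assumptions, not Spark code: each inner list stands in for a partition, and the two function names are hypothetical.

```python
from collections import defaultdict

def shuffle_group_by_key(partitions):
    """groupByKey-style: every (key, value) pair crosses the shuffle unchanged."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)

def shuffle_reduce_by_key(partitions):
    """reduceByKey-style: sum within each partition, then shuffle partial sums."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]
group_totals, group_shuffled = shuffle_group_by_key(partitions)
reduce_totals, reduce_shuffled = shuffle_reduce_by_key(partitions)
```

Both approaches return the same totals, but the second one moves fewer records through the simulated shuffle because each partition sends at most one record per key.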
  • Q2. Word count in scala
  • Ans. 

    Scala's built-in collection functions make it simple to count words in a string.

    • Use the split function to split the string into an array of words

    • Use the length function for the total number of words, or groupBy(identity) and the size of each group for per-word frequencies

  • Answered by AI
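
For comparison, here is the same split-then-count idea as a minimal sketch in Python rather than Scala. The function name `word_count` and the sample sentence are illustrative assumptions.

```python
from collections import Counter

def word_count(text):
    """Count how often each whitespace-separated word occurs, ignoring case."""
    return Counter(text.lower().split())

counts = word_count("to be or not to be")
total_words = sum(counts.values())
```

`counts` holds the per-word frequencies, while `total_words` is the plain number of words that a split followed by a length check would give.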
  • Q3. Second highest salary SQL
  • Ans. 

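    Use a SQL query with ORDER BY and LIMIT to find the second-highest salary.

    • Use an ORDER BY clause to sort the distinct salaries in descending order

    • Use LIMIT 1 OFFSET 1 (written LIMIT 1,1 in MySQL) to skip the highest salary and return the second highest; without DISTINCT, a tie for the top salary returns the top value again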

  • Answered by AI
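
A minimal runnable sketch of this query, using Python's bundled SQLite so that it needs no server. The `employees` table and its rows are made-up sample data, and LIMIT/OFFSET syntax varies between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 900), ("Ben", 1200), ("Chen", 1200), ("Dev", 700)],
)

# DISTINCT collapses the tie at 1200 so that OFFSET 1 lands on the next salary.
row = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()
second_highest = row[0] if row else None
```

The `row else None` guard covers a table with fewer than two distinct salaries, where the query returns no row at all.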


Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
No response

I applied via Naukri.com and was interviewed in Aug 2024. There were 2 interview rounds.

Round 1 - Technical 

(12 Questions)

  • Q1. Tell me about yourself and Project
  • Ans. 

    I am a Senior Data Engineer with experience in developing data pipelines and optimizing data storage for various projects.

    • Developed data pipelines using Apache Spark for real-time data processing

    • Optimized data storage using technologies like Hadoop and AWS S3

    • Worked on a project to analyze customer behavior and improve marketing strategies

  • Answered by AI
  • Q2. What was your day-to-day job in your project?
  • Ans. 

    My day-to-day job in the project involved designing and implementing data pipelines, optimizing data workflows, and collaborating with cross-functional teams.

    • Designing and implementing data pipelines to extract, transform, and load data from various sources

    • Optimizing data workflows to improve efficiency and performance

    • Collaborating with cross-functional teams including data scientists, analysts, and business stakeholde...

  • Answered by AI
  • Q3. Spark Architecture
  • Q4. How does the DAG handle fault tolerance?
  • Ans. 

    In Spark, the DAG handles fault tolerance by recording how each partition was derived, so failed tasks can be rerun and lost data recomputed.

    • Failed tasks are rerun automatically, and lost partitions are recomputed from their lineage.

    • The DAG maintains the dependencies between stages to ensure proper sequencing.

    • Tasks are retried a configurable number of times (spark.task.maxFailures) before the job is marked as failed.

  • Answered by AI
  • Q5. What is shuffling? How to Handle Shuffling?
  • Ans. 

    Shuffling is the process of redistributing data across partitions in a distributed computing environment.

    • Shuffling is necessary when data needs to be grouped or aggregated across different partitions.

    • It can be handled efficiently by minimizing the amount of data being shuffled and optimizing the partitioning strategy.

    • Techniques like partitioning, combiners, and reducers can help reduce the amount of shuffling in MapRed

  • Answered by AI
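
Redistribution by key can be sketched in a few lines. This is a simplified hash-partitioning illustration in plain Python, not how any particular engine implements it; `partition_for`, `shuffle`, and the sample records are hypothetical.

```python
import zlib

def partition_for(key, num_partitions):
    """Pick a target partition from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def shuffle(records, num_partitions):
    """Route each (key, value) record to the partition chosen for its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
parts = shuffle(records, 3)
```

After the shuffle, all records for a given key sit in the same partition, which is what makes a per-key aggregation possible. It is also why fewer or smaller records going in means less data moved.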
  • Q6. What is the difference between repartition and coalesce?
  • Ans. 

    Repartition increases or decreases the number of partitions in a DataFrame, while Coalesce only decreases the number of partitions.

    • Repartition can increase or decrease the number of partitions in a DataFrame, leading to a shuffle of data across the cluster.

    • Coalesce only decreases the number of partitions in a DataFrame without performing a full shuffle, making it more efficient than repartition.

    • Repartition is typically...

  • Answered by AI
  • Q7. How do you handle Incremental data?
  • Ans. 

    Incremental data is handled by identifying new data since the last update and merging it with existing data.

    • Identify new data since last update

    • Merge new data with existing data

    • Update data warehouse or database with incremental changes

  • Answered by AI
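
The identify-then-merge steps can be sketched with a watermark. This is a minimal in-memory illustration; the `updated_at` column, the dictionary used as a target table, and the function name `incremental_load` are all assumptions made for the example.

```python
def incremental_load(target, source_rows, last_watermark):
    """Upsert rows changed after `last_watermark`; return the new watermark."""
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] > last_watermark:
            # Insert a new key or overwrite the existing version of it.
            target[row["id"]] = row
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

target = {1: {"id": 1, "city": "Pune", "updated_at": "2024-01-01"}}
source = [
    {"id": 1, "city": "Mumbai", "updated_at": "2024-02-01"},
    {"id": 2, "city": "Delhi", "updated_at": "2024-02-03"},
    {"id": 3, "city": "Goa", "updated_at": "2023-12-15"},
]
watermark = incremental_load(target, source, "2024-01-01")
```

ISO-formatted date strings compare correctly as text, so the sketch avoids date parsing. The returned watermark would be stored and passed to the next run.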
  • Q8. What is SCD?
  • Ans. 

    SCD stands for Slowly Changing Dimension, a concept in data warehousing to track changes in data over time.

    • SCD is used to maintain historical data in a data warehouse.

    • The three most commonly used types of SCD are Type 1, Type 2, and Type 3.

    • Type 1 SCD overwrites old data with new data.

    • Type 2 SCD creates a new record for each change, preserving history.

    • Type 3 SCD maintains both old and new values in the same record.

    • SCD is important for...

  • Answered by AI
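
Type 2 is the variant most often asked about in detail, so here is a minimal in-memory sketch of it. The row layout (`start_date`, `end_date`, with `None` marking the current row) and the function name `apply_scd2` are assumptions for illustration; real warehouses often add a current-row flag or surrogate keys.

```python
def apply_scd2(history, key, new_attrs, effective_date):
    """Expire the current row for `key` if it changed, then append a new one."""
    current = next(
        (r for r in history if r["key"] == key and r["end_date"] is None), None
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, so history stays as it is
        current["end_date"] = effective_date  # close the old version
    history.append(
        {"key": key, "attrs": new_attrs,
         "start_date": effective_date, "end_date": None}
    )
    return history

history = []
apply_scd2(history, "C1", {"city": "Pune"}, "2024-01-01")
apply_scd2(history, "C1", {"city": "Mumbai"}, "2024-06-01")
apply_scd2(history, "C1", {"city": "Mumbai"}, "2024-07-01")  # no change
```

The third call leaves the history untouched because the attributes did not change, so the customer ends up with one closed row and one current row.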
  • Q9. Scenario-based questions related to Spark
  • Q10. Two SQL and two Python coding questions, such as reversing a string
  • Ans. 

    Reverse a string using SQL and Python codes.

    • In SQL dialects that provide it (for example SQL Server, MySQL, and PostgreSQL), use the REVERSE function to reverse a string.

    • In Python, use slicing with a step of -1 to reverse a string.

  • Answered by AI
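
The Python slicing approach looks like this in a minimal sketch; the function name `reverse_string` is only illustrative.

```python
def reverse_string(text):
    """Return `text` reversed, using a slice with a step of -1."""
    return text[::-1]

reversed_word = reverse_string("spark")
```

Interviewers sometimes follow up by asking for a version without slicing, for which `"".join(reversed(text))` is a common alternative.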
  • Q11. Find top 5 countries with highest population in Spark and SQL
  • Ans. 

    Use Spark and SQL to find the top 5 countries with the highest population.

    • Use Spark to load the data and perform data processing.

    • Use SQL queries to group by country and sum the population.

    • Order the results in descending order and limit to top 5.

    • Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5

  • Answered by AI
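
The example query can be tried end to end with Python's bundled SQLite. The `city_population` table and its values are made-up sample data; in Spark the same SQL would typically run against a registered temporary view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_population (country TEXT, city TEXT, population INTEGER)"
)
rows = [
    ("A", "a1", 50), ("A", "a2", 30),
    ("B", "b1", 70),
    ("C", "c1", 10), ("C", "c2", 15),
    ("D", "d1", 60),
    ("E", "e1", 5),
    ("F", "f1", 40),
]
conn.executemany("INSERT INTO city_population VALUES (?, ?, ?)", rows)

# Sum per country, sort from largest to smallest, and keep the first five.
top5 = conn.execute(
    """
    SELECT country, SUM(population) AS total_population
    FROM city_population
    GROUP BY country
    ORDER BY total_population DESC
    LIMIT 5
    """
).fetchall()
```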
  • Q12. Using two tables find the different records for different joins
  • Ans. 

    Compare the output of each join type on the two tables to find the records that differ.

    • Use SQL queries to perform the different joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN

    • Identify the key columns in both tables to join on

    • Select the columns from both tables and use a WHERE ... IS NULL filter on the outer-joined side to isolate the records that have no match

  • Answered by AI
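
A small runnable sketch of this comparison, using Python's bundled SQLite. The tables `left_t` and `right_t` are hypothetical, and the example sticks to INNER and LEFT joins because older SQLite versions do not support RIGHT or FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE left_t (id INTEGER);
    CREATE TABLE right_t (id INTEGER);
    INSERT INTO left_t VALUES (1), (2), (3);
    INSERT INTO right_t VALUES (2), (3), (4);
    """
)

# Rows present in both tables.
inner = conn.execute(
    "SELECT l.id FROM left_t l JOIN right_t r ON l.id = r.id ORDER BY l.id"
).fetchall()

# Anti-join: LEFT JOIN, then keep rows where the other side found no match.
only_left = conn.execute(
    "SELECT l.id FROM left_t l LEFT JOIN right_t r ON l.id = r.id "
    "WHERE r.id IS NULL"
).fetchall()
only_right = conn.execute(
    "SELECT r.id FROM right_t r LEFT JOIN left_t l ON l.id = r.id "
    "WHERE l.id IS NULL"
).fetchall()
```

Swapping the table order in the second anti-join gives the effect of a RIGHT JOIN filter without needing that keyword.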
Round 2 - One-on-one 

(7 Questions)

  • Q1. What is the Catalyst optimizer? How does it work?
  • Ans. 

    A catalyst optimizer is a query optimization tool used in Apache Spark to improve performance by generating an optimal query plan.

    • Catalyst optimizer is a rule-based query optimization framework in Apache Spark.

    • It leverages rules to transform the logical query plan into a more optimized physical plan.

    • The optimizer applies various optimization techniques like predicate pushdown, constant folding, and join reordering.

    • By o...

  • Answered by AI
  • Q2. Tell me about the optimization you used in your project.
  • Ans. 

    Used query optimization techniques to improve performance in database queries.

    • Utilized indexing to speed up search queries.

    • Implemented query caching to reduce redundant database calls.

    • Optimized SQL queries by restructuring joins and subqueries.

    • Utilized database partitioning to improve query performance.

    • Used query profiling tools to identify and optimize slow queries.

  • Answered by AI
  • Q3. Pyspark question related to merging two schemas?
  • Q4. What is the best approach to finding whether the data frame is empty or not?
  • Ans. 

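    The right check depends on the library: len() works for a pandas data frame, while a Spark DataFrame needs an action that fetches at most one row.

    • In pandas, use len(df) == 0 or df.empty to test for an empty data frame.

    • A PySpark DataFrame does not support len(); use df.isEmpty() (available from Spark 3.3) or len(df.head(1)) == 0 instead.

    • Avoid df.count() == 0 on large data, since it scans every partition just to test for emptiness.

    • Example (pandas): if len(df) == 0: print('Data frame is empty')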

  • Answered by AI
  • Q5. Spark Architecture
  • Q6. How do you decide on cores and worker nodes?
  • Ans. 

    Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.

    • Consider the size and complexity of the data being processed

    • Evaluate the processing speed and memory requirements of the tasks

    • Take into account the parallelism and concurrency needed for efficient data processing

    • Monitor the system performance and adjust cores and worker nodes as needed

  • Answered by AI
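
One widely quoted starting point for this sizing is a back-of-the-envelope calculation. The sketch below assumes a rule of thumb of about 5 cores per executor, with one core and 1 GB per node reserved for the operating system and daemons. These defaults and the function name `size_executors` are assumptions for illustration, not fixed Spark requirements, and the result should be tuned against real workload metrics.

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1, reserved_mem_gb=1):
    """Estimate executor count and per-executor memory for a cluster."""
    executors_per_node = (cores_per_node - reserved_cores) // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # Keep one executor slot free for the driver or application master.
    total_executors = nodes * executors_per_node - 1
    # Memory per executor before subtracting any off-heap overhead.
    mem_per_executor_gb = (mem_gb_per_node - reserved_mem_gb) // executors_per_node
    return {
        "executors": total_executors,
        "cores_per_executor": cores_per_executor,
        "memory_per_executor_gb": mem_per_executor_gb,
    }

plan = size_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64)
```

For 10 nodes with 16 cores and 64 GB each, this yields 29 executors with 5 cores and roughly 21 GB apiece, before memory overhead is deducted.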
  • Q7. What happens when we enforce a schema?
  • Ans. 

    Enforcing schema ensures that data conforms to a predefined structure and rules.

    • Ensures data integrity by validating incoming data against predefined schema

    • Helps in maintaining consistency and accuracy of data

    • Prevents data corruption and errors in data processing

    • Can lead to rejection of data that does not adhere to the schema

  • Answered by AI
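
The validate-and-reject behaviour can be sketched in plain Python. This is a generic illustration rather than any engine's actual enforcement logic; the `SCHEMA` dictionary, the function name `enforce_schema`, and the sample records are hypothetical.

```python
SCHEMA = {"id": int, "name": str, "amount": float}  # hypothetical schema

def enforce_schema(records, schema):
    """Split records into those matching `schema` exactly and those rejected."""
    accepted, rejected = [], []
    for record in records:
        # Field names must match exactly, and every value must have its type.
        matches = record.keys() == schema.keys() and all(
            isinstance(record[field], expected)
            for field, expected in schema.items()
        )
        (accepted if matches else rejected).append(record)
    return accepted, rejected

records = [
    {"id": 1, "name": "a", "amount": 9.5},
    {"id": "2", "name": "b", "amount": 1.0},  # wrong type for id
    {"id": 3, "name": "c"},                   # missing amount
]
accepted, rejected = enforce_schema(records, SCHEMA)
```

Real systems differ in what they do with rejected rows: some fail the whole load, some null out bad fields, and some route the rows to a quarantine location.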

Interview Preparation Tips

Topics to prepare for Persistent Systems Senior Data Engineer interview:
  • SQL
  • Pyspark
  • Python
  • Spark
  • Database
Interview preparation tips for other job seekers - Be prepared with Spark core concepts and SQL Coding


Interview experience
2
Poor
Difficulty level
Moderate
Process Duration
2-4 weeks
Result
Selected

I applied via Recruitment Consultant and was interviewed before Nov 2022. There were 3 interview rounds.

Round 1 - Resume Shortlist 
Round 2 - Technical 

(1 Question)

  • Q1. After the First Round of MCQ they will schedule a Technical Video Call. Questions were more from SQL - Windowing Functions, Unix - Shell Scripting, Pattern Matching, Apache Spark - Dataframe, Streaming, Op...
Round 3 - HR 

(1 Question)

  • Q1. It was an audio call to discuss the salary structure, its components, and the rest.
Interview experience
3
Average
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Introduce yourself
  • Ans. 

    I am a Senior Data Engineer with expertise in data processing and analysis.

    • Experienced in designing and implementing data pipelines

    • Proficient in programming languages like Python and SQL

    • Skilled in working with big data technologies like Hadoop and Spark

    • Familiar with data warehousing and ETL processes

    • Strong problem-solving and analytical skills

  • Answered by AI
  • Q2. Basic structured query language
Round 2 - HR 

(1 Question)

  • Q1. Introduce yourself
  • Ans. 

    I am a Senior Data Engineer with expertise in data processing and analysis.

    • Experienced in designing and implementing data pipelines

    • Proficient in programming languages like Python and SQL

    • Skilled in working with big data technologies such as Hadoop and Spark

    • Familiar with data warehousing concepts and ETL processes

    • Strong problem-solving and troubleshooting skills

    • Effective communication and collaboration with cross-functio

  • Answered by AI
Interview experience
3
Average
Difficulty level
Hard
Process Duration
Less than 2 weeks
Result
No response

I was interviewed in Jul 2023.

Round 1 - Resume Shortlist 
Round 2 - One-on-one 

(7 Questions)

  • Q1. 1) Snowflake architecture in your current project.
  • Ans. 

    Snowflake architecture is used in our project for cloud-based data warehousing.

    • Snowflake follows a multi-cluster shared data architecture.

    • It separates storage and compute resources, allowing for independent scaling.

    • Data is stored centrally in cloud storage; virtual warehouses are the compute clusters that query it and can be scaled up or down based on workload.

    • Snowflake uses a unique architecture called a multi-cluster, shared data architecture, which s...

  • Answered by AI
  • Q2. 2) Database roles in Snowflake.
  • Ans. 

    Database roles in Snowflake define permissions and access control for users and objects.

    • A database role is created inside a database and can hold privileges only on objects in that database.

    • Database roles cannot be activated directly in a session; they are granted to account roles or to other database roles in the same database.

    • Account-level system roles in Snowflake, by contrast, include ACCOUNTADMIN, SYSADMIN, SECURITYADMIN, and PUBLIC.

  • Answered by AI
  • Q3. 3) Session Policy in Snowflake.
  • Ans. 

    Session Policy in Snowflake defines how long a session may stay idle before the user has to authenticate again.

    • Session Policy can be set on the account or on individual users in Snowflake; a user-level policy takes precedence over the account-level one.

    • Session Policy settings include separate idle timeouts for client sessions and for the web interface (SESSION_IDLE_TIMEOUT_MINS and SESSION_UI_IDLE_TIMEOUT_MINS).

    • Example: Setting an idle timeout of 30 minutes will automatically end the session if there is no activity for 30 minutes.

  • Answered by AI
  • Q4. 4) Describe the SSO process between Snowflake and Azure Active Directory.
  • Ans. 

    SSO process between Snowflake and Azure Active Directory involves configuring SAML-based authentication.

    • Configure Snowflake to use SAML authentication with Azure AD as the identity provider

    • Set up a trust relationship between Snowflake and Azure AD

    • Users authenticate through Azure AD and are granted access to Snowflake resources

    • SSO eliminates the need for separate logins and passwords for Snowflake and Azure AD

  • Answered by AI
  • Q5. 5) Network Policy in Snowflake.
  • Ans. 

    Network Policy in Snowflake controls access to Snowflake resources based on IP addresses or ranges.

    • Network Policies are used to restrict access to Snowflake resources based on IP addresses or ranges.

    • They can be applied at the account level or to individual users.

    • Network Policies can define an allowed list of specific IP addresses or ranges that are permitted to access Snowflake resources.

    • They can also be used to blacklist IP addresse...

  • Answered by AI
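
The allow-and-block evaluation can be sketched with Python's standard `ipaddress` module. This is an illustration of the idea, not Snowflake's implementation. It assumes that a blocked entry wins over an allowed one, and the function name `is_allowed` and the address ranges are hypothetical.

```python
import ipaddress

def is_allowed(client_ip, allowed, blocked):
    """Return True when the IP is in an allowed range and in no blocked range."""
    ip = ipaddress.ip_address(client_ip)
    in_blocked = any(ip in ipaddress.ip_network(cidr) for cidr in blocked)
    in_allowed = any(ip in ipaddress.ip_network(cidr) for cidr in allowed)
    return in_allowed and not in_blocked

ALLOWED = ["192.168.1.0/24"]    # hypothetical office range
BLOCKED = ["192.168.1.99/32"]   # one address carved out of that range

office_ok = is_allowed("192.168.1.10", ALLOWED, BLOCKED)
carved_out = is_allowed("192.168.1.99", ALLOWED, BLOCKED)
outside = is_allowed("10.0.0.1", ALLOWED, BLOCKED)
```

Blocking a single address inside an allowed range is the typical reason to use both lists together.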
  • Q6. 6) Automatic data loading from pipes in to Snowflake.
  • Ans. 

    Automate data loading from pipes into Snowflake for efficient data processing.

    • Use Snowpipe, a continuous data ingestion service provided by Snowflake, to automatically load data from pipes into Snowflake tables.

    • Snowpipe monitors a stage for new data files and loads them into the specified table in near real time.

    • Configure Snowpipe to trigger a data load whenever new data files are added to the stage, eliminating the need fo...

  • Answered by AI
  • Q7. 7) How does query acceleration speed up query processing?
  • Ans. 

    In Snowflake, the query acceleration service speeds up query processing by offloading parts of an eligible query to additional shared compute resources.

    • It helps most with queries that scan large amounts of data with selective filters, such as ad hoc analytics.

    • The offloaded work runs in parallel, which reduces the impact of outlier queries on the rest of the warehouse workload.

    • It is enabled per warehouse, with a scale factor that caps how much extra compute a query can use.

  • Answered by AI


I applied via Naukri.com and was interviewed in Jul 2021. There were 4 interview rounds.

Interview Questionnaire 

1 Question

  • Q1. Because it was a developer role, they asked about performance tuning practices in Hadoop tools such as Hive, Sqoop, and Spark.

Interview Preparation Tips

Interview preparation tips for other job seekers - There were 2 technical rounds: the first was 45 minutes and the second was 30 minutes. As I am a Hadoop data engineer, they asked questions on different Hadoop tools such as Spark, Hive, and Scala. My suggestion is to prepare every tool or technology you have mentioned in your resume.

Egen Interview FAQs

How many rounds are there in Egen Senior Data Engineer interview?
The Egen interview process usually has 1 round. The most common round in the Egen interview process is Technical.
What are the top questions asked in Egen Senior Data Engineer interview?

Some of the top questions asked at the Egen Senior Data Engineer interview -

  1. In depth spark fundament...read more
  2. Data Modelling case stud...read more
  3. Reverse a str...read more


Egen Senior Data Engineer Interview Process

based on 2 interviews

Interview experience

4
  
Good

Egen Senior Data Engineer Reviews and Ratings

based on 1 review

5.0/5

Rating in categories

5.0

Skill development

5.0

Work-life balance

4.0

Salary

4.0

Job security

4.0

Company culture

4.0

Promotions

4.0

Work satisfaction
