
EPAM Systems (rated 3.7, based on 1.4k reviews)

10+ EPAM Systems Data Engineer Interview Questions and Answers

Updated 22 Nov 2024

Q1. Write code for printing duplicate numbers in a list.

Ans.

Code to print duplicate numbers in a list.

  • Iterate through the list and keep track of the count of each number using a dictionary.

  • Print the numbers that have a count greater than 1.
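A minimal Scala sketch of the counting approach above. The object and method names are illustrative, and a LinkedHashMap stands in for the "dictionary" so that duplicates are reported in the order they first appear.

```scala
import scala.collection.mutable

object DuplicateNumbers {
  // Returns each value that occurs more than once, in order of first appearance.
  def duplicates(nums: Seq[Int]): Seq[Int] = {
    val counts = mutable.LinkedHashMap.empty[Int, Int] // keeps insertion order
    for (n <- nums) counts(n) = counts.getOrElse(n, 0) + 1
    counts.collect { case (n, c) if c > 1 => n }.toSeq
  }

  def main(args: Array[String]): Unit = {
    val nums = List(4, 1, 4, 2, 3, 2, 4)
    duplicates(nums).foreach(println) // prints 4, then 2
  }
}
```

The single pass over the list plus one pass over the map keeps this O(n), unlike counting each element with a nested loop.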


Q2. Write code to print reverse of a sentence word by word.

Ans.

Code to print reverse of a sentence word by word.

  • Split the sentence into words using whitespace as the delimiter

  • Store the words in an array

  • Print the words in reverse order
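The three steps can be sketched in Scala as follows (names are illustrative). Splitting on the regex \s+ rather than a single space is an assumption added here so that repeated spaces do not produce empty "words".

```scala
object ReverseWords {
  // Split on runs of whitespace, reverse the word order, and join with single spaces.
  def reverseWords(sentence: String): String =
    sentence.trim.split("\\s+").reverse.mkString(" ")

  def main(args: Array[String]): Unit =
    println(reverseWords("data engineers love Spark")) // Spark love engineers data
}
```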


Q3. Difference between cache and persist, and between repartition and coalesce.

Ans.

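Cache and persist keep a computed dataset around so that later actions can reuse it instead of recomputing it. Repartition and coalesce change the number of partitions.

  • cache() always uses the default storage level (MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames and Datasets), while persist() lets the user choose the storage level, such as MEMORY_AND_DISK or DISK_ONLY.

  • Repartition can increase or decrease the number of partitions and always performs a full shuffle, while coalesce can only decrease the number of partitions and avoids a full shuffle by merging existing partitions.

  • All four are lazy: cache and persist store nothing until an action runs, and repartition and coalesce are transformations, not actions.

  • Cache and persist are useful for iterative algorithms, while repartition and …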
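A minimal Spark sketch of both pairs, assuming Spark 3.x and a DataFrame with a numeric "id" column; the object and method names are illustrative. It reads back the storage levels Spark records and the partition counts after repartition and coalesce.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object CacheAndPartitions {
  // cache() uses the default level (MEMORY_AND_DISK for DataFrames);
  // persist(level) lets the caller choose, here DISK_ONLY.
  def cacheExamples(df: DataFrame): (StorageLevel, StorageLevel) = {
    val cached = df.cache()
    val persisted = df.filter("id % 3 = 0").persist(StorageLevel.DISK_ONLY)
    cached.count()    // both are lazy: data is stored only when an action runs
    persisted.count()
    (cached.storageLevel, persisted.storageLevel)
  }

  // repartition(n) always shuffles and can raise or lower the count;
  // coalesce(n) merges existing partitions without a full shuffle and only lowers it.
  def partitionCounts(df: DataFrame): (Int, Int) = {
    val more = df.repartition(16)
    val fewer = more.coalesce(4)
    (more.rdd.getNumPartitions, fewer.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-and-partitions")
      .master("local[4]") // local mode, for trying the example out
      .getOrCreate()
    val df = spark.range(0, 100000).toDF("id")
    println(cacheExamples(df))
    println(partitionCounts(df))
    spark.stop()
  }
}
```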


Q4. Explain Spark optimization techniques, the types of transformations, and shuffling.

Ans.

Spark optimization techniques include partitioning, caching, and using appropriate transformations.

  • Partitioning data can improve performance by reducing shuffling.

  • Caching frequently used data can reduce the need for recomputation.

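  • Transformations are either narrow (map, filter), where each output partition depends on a single input partition and no shuffle is needed, or wide (groupByKey, reduceByKey, join), which redistribute data across partitions through a shuffle.

  • Shuffling can be minimized by using reduceByKey instead of groupByKey, because reduceByKey combines values within each partition before the shuffle.

  • Broadcasting small data can improve performance by reducing network traffic …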
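A minimal sketch of the shuffle-related points, assuming Spark 3.x; the object name, method names and sample data are illustrative. Both word counts give the same result, but reduceByKey combines values map-side before shuffling, and the broadcast hint ships the small table to every executor so the large side is not shuffled.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ShuffleExamples {
  // map is narrow (no shuffle); reduceByKey is wide but pre-aggregates per partition.
  def countsWithReduceByKey(spark: SparkSession, words: Seq[String]): Map[String, Int] =
    spark.sparkContext.parallelize(words)
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .collect().toMap

  // groupByKey is wide and shuffles every (word, 1) pair before summing.
  def countsWithGroupByKey(spark: SparkSession, words: Seq[String]): Map[String, Int] =
    spark.sparkContext.parallelize(words)
      .map(w => (w, 1))
      .groupByKey()
      .mapValues(_.sum)
      .collect().toMap

  // Broadcast join: the small "dims" table is copied to each executor.
  def broadcastJoinCount(spark: SparkSession): Long = {
    import spark.implicits._
    val facts = spark.range(0, 100000).withColumnRenamed("id", "key")
    val dims = Seq((0L, "zero"), (1L, "one")).toDF("key", "label")
    facts.join(broadcast(dims), "key").count()
  }
}
```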


Q5. Types of Hive tables and the differences between them

Ans.

Hive has two types of tables: managed (internal) and external. For a managed table, Hive owns both the metadata and the data; for an external table, Hive owns only the metadata, and the data lives at a location managed outside Hive.

  • Managed tables are created with the CREATE TABLE command, and by default their data is stored in Hive's warehouse directory

  • External tables are created with the CREATE EXTERNAL TABLE command, usually with a LOCATION clause, and their data is stored outside Hive's warehouse directory

  • Dropping a managed table deletes both its metadata and its data, while dropping an external table deletes only the metadata and leaves the data files in place

  • Managed tables have full control …
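The managed/external distinction can be observed from Spark, whose catalog applies the same rules. This is a sketch under assumptions: Spark 3.x with its default in-memory catalog (no Hive metastore), Spark's CREATE TABLE ... USING syntax rather than HiveQL's CREATE EXTERNAL TABLE, and hypothetical table names. Adding a LOCATION clause is what makes the second table external.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ManagedVsExternal {
  // Creates one managed and one external table and returns their catalog types.
  def tableTypes(spark: SparkSession): (String, String) = {
    val externalDir = Files.createTempDirectory("ext_tbl").toUri.toString

    spark.sql("DROP TABLE IF EXISTS sales_managed")
    spark.sql("DROP TABLE IF EXISTS sales_external")

    // No LOCATION: managed; data goes under spark.sql.warehouse.dir and is deleted on DROP.
    spark.sql("CREATE TABLE sales_managed (id INT, amount DOUBLE) USING parquet")
    // With LOCATION: external; DROP removes only the catalog entry, not the files.
    spark.sql(s"CREATE TABLE sales_external (id INT, amount DOUBLE) USING parquet LOCATION '$externalDir'")

    (spark.catalog.getTable("sales_managed").tableType,
     spark.catalog.getTable("sales_external").tableType)
  }
}
```

On a cluster with a Hive metastore, the same check works after enabling Hive support on the SparkSession.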


Q6. Difference between RDD, DataFrame, and Dataset.

Ans.

RDD, DataFrame, and Dataset are Spark's three data abstractions, each with different characteristics and functionality.

  • RDD (Resilient Distributed Dataset) is Spark's fundamental data structure: an immutable, distributed collection of objects. It provides low-level APIs for distributed data processing and fault tolerance.

  • DataFrame is a distributed collection of data organized into named columns. It is similar to a table in a relational database and provides a high-level …
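A minimal sketch running the same filter through all three APIs, assuming Spark 3.x; the Employee case class and its data are hypothetical. The comments note where each API checks types and what the optimizer can see.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical element type for the Dataset example.
case class Employee(name: String, dept: String, salary: Double)

object ThreeApis {
  def highEarners(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._
    val rows = Seq(Employee("Asha", "data", 120.0), Employee("Ben", "web", 80.0),
                   Employee("Chen", "data", 95.0))

    // RDD: distributed JVM objects, no schema, not optimized by Catalyst.
    val rddCount = spark.sparkContext.parallelize(rows).filter(_.salary > 90).count()

    // DataFrame (Dataset[Row]): named columns; the column expression is optimized
    // by Catalyst, but a misspelled column name only fails at runtime.
    val dfCount = rows.toDF().filter($"salary" > 90).count()

    // Dataset[Employee]: typed, so field access is checked at compile time
    // (a Scala lambda like this one is opaque to Catalyst, though).
    val dsCount = rows.toDS().filter(_.salary > 90).count()

    (rddCount, dfCount, dsCount)
  }
}
```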


Q7. Connecting Spark to Azure SQL Database.

Ans.

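Spark can connect to Azure SQL Database through its JDBC data source, using Microsoft's JDBC driver for SQL Server.

  • Add the Microsoft JDBC driver for SQL Server (mssql-jdbc) to the Spark cluster's classpath.

  • Build the JDBC URL for the server and database, and supply the credentials as connection properties.

  • Use Spark's JDBC API to read from or write to Azure SQL Database.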

  • Example: val df = spark.read.jdbc(jdbcUrl, tableName, connectionProperties)

  • Ensure that the firewall rules for the Azure SQL Database allow access from the Spark cluster.
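A sketch of the steps above, assuming Spark 3.x with the mssql-jdbc driver on the classpath. The server name, database, table and environment-variable names are placeholders, and the URL follows the general shape Azure SQL Database uses (port 1433, encryption on).

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object AzureSqlRead {
  // Builds the JDBC URL; "server" is the short server name before .database.windows.net.
  def jdbcUrl(server: String, database: String): String =
    s"jdbc:sqlserver://$server.database.windows.net:1433;database=$database;" +
      "encrypt=true;trustServerCertificate=false;loginTimeout=30;"

  def connectionProperties(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    // Driver class shipped in Microsoft's mssql-jdbc artifact.
    props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    props
  }

  // The spark.read.jdbc(url, table, properties) call from the answer.
  def readTable(spark: SparkSession, url: String, table: String, props: Properties): DataFrame =
    spark.read.jdbc(url, table, props)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("azure-sql-read").getOrCreate()
    // Hypothetical environment variables, to avoid hard-coding credentials.
    val props = connectionProperties(sys.env("AZURE_SQL_USER"), sys.env("AZURE_SQL_PASSWORD"))
    readTable(spark, jdbcUrl("myserver", "mydb"), "dbo.Customers", props).show(5)
    spark.stop()
  }
}
```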


Q8. Discuss your project and its architecture.

Ans.

Developed a data pipeline to process and analyze customer behavior data.

  • Used Apache Kafka for real-time data streaming

  • Implemented data processing using Apache Spark

  • Stored data in Hadoop Distributed File System (HDFS)

  • Used Tableau for data visualization


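Q9. How would you handle data skew in Spark?

Ans.

Data skew can be handled in Spark with techniques such as salting, repartitioning, bucketing, and broadcasting.

  • Repartitioning on a column with well-distributed values spreads the data more evenly across the cluster; if the key itself is skewed, adding a random salt to it splits a hot key across several partitions.

  • Bucketing pre-divides the data into a fixed number of buckets using a hash of the bucketing column, which can avoid repeated shuffles in later joins and aggregations on that column.

  • Broadcasting the smaller table in a join avoids shuffling the large, skewed table across the network.

  • Dynamic allocation can add executors while a job runs, but it cannot split a single oversized task; in Spark 3.x, Adaptive Query Execution (spark.sql.adaptive.skewJoin.enabled) can split skewed join partitions automatically.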
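Salting for a skewed aggregation can be sketched as a two-stage sum. This assumes Spark 3.x; the object name, column names and the choice of 8 salt buckets are illustrative. Stage 1 spreads each hot key over up to 8 groups so no single task receives all of its rows; stage 2 merges the partial sums per key.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, rand, sum}

object SaltedAggregation {
  def saltedSum(df: DataFrame, keyCol: String, valueCol: String, saltBuckets: Int = 8): DataFrame =
    df.withColumn("salt", (rand() * saltBuckets).cast("int")) // random bucket 0..saltBuckets-1
      .groupBy(col(keyCol), col("salt"))                      // stage 1: partial sums per (key, salt)
      .agg(sum(col(valueCol)).as("partial"))
      .groupBy(col(keyCol))                                   // stage 2: combine partials per key
      .agg(sum(col("partial")).as("total"))
}
```

For skewed joins on Spark 3.x, enabling spark.sql.adaptive.enabled and spark.sql.adaptive.skewJoin.enabled is often tried first, since it needs no code changes; manual salting remains useful for aggregations like this one.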


Q10. What Azure solutions have you worked with?

Ans.

I have worked with Azure Data Factory, Azure Databricks, and Azure SQL Database.

  • Azure Data Factory for data integration and orchestration

  • Azure Databricks for big data processing and analytics

  • Azure SQL Database for relational database management


Q11. What tech stack is used?

Ans.

The tech stack used includes Python, SQL, Apache Spark, Hadoop, AWS, and Docker.

  • Python for data processing and analysis

  • SQL for database querying

  • Apache Spark for big data processing

  • Hadoop for distributed storage and processing

  • AWS for cloud services

  • Docker for containerization


Q12. Types of variables in Scala

Ans.

Scala has two kinds of variables: mutable (var) and immutable (val).

  • Mutable variables are declared with the var keyword and can be reassigned.

  • Immutable variables are declared with the val keyword and cannot be reassigned once they are initialized.

  • Example: var mutableVariable = 10; val immutableVariable = 20;
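A small Scala sketch extending the example (names are illustrative). It also shows a point worth making in an interview: val fixes the reference, not the object, so a val that holds a mutable collection can still have its contents changed.

```scala
import scala.collection.mutable.ListBuffer

object VarVsVal {
  def demo(): (Int, Int, List[Int]) = {
    var counter = 10 // var: the name can be reassigned
    counter += 5

    val limit = 20   // val: reassignment does not compile
    // limit = 30    // error: reassignment to val

    val buffer = ListBuffer(1, 2) // the reference is fixed...
    buffer += 3                   // ...but the mutable object it points to can change

    (counter, limit, buffer.toList)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // (15,20,List(1, 2, 3))
}
```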


Interview Process at EPAM Systems Data Engineer

Based on 6 interviews: 1 interview round (HR Round).
Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.
