I have worked with Azure Data Factory, Azure Databricks, and Azure SQL Database.
Azure Data Factory for data integration and orchestration
Azure Databricks for big data processing and analytics
Azure SQL Database for relational database management
I was approached by the company and interviewed in Jan 2024. There were 3 interview rounds.
The tech stack used includes Python, SQL, Apache Spark, Hadoop, AWS, and Docker.
Python for data processing and analysis
SQL for database querying
Apache Spark for big data processing
Hadoop for distributed storage and processing
AWS for cloud services
Docker for containerization
I applied via LinkedIn and was interviewed in Sep 2023. There were 2 interview rounds.
Scala has two types of variables - mutable and immutable.
Variables declared with the var keyword are mutable and can be reassigned.
Variables declared with the val keyword are immutable and cannot be reassigned after initialization.
Example: var mutableVariable = 10; val immutableVariable = 20;
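A minimal sketch of the difference, reusing the variable names from the example above:

  var mutableVariable = 10
  mutableVariable = 11        // fine: a var can be reassigned
  val immutableVariable = 20
  // immutableVariable = 21   // does not compile: reassignment to val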
HackerRank assessment - take-home
I applied via LinkedIn and was interviewed in Jun 2022. There were 3 interview rounds.
Two coding questions on Codility: one easy and one medium. Ten MCQs on Big Data technologies.
Code to print duplicate numbers in a list.
Iterate through the list and keep track of the count of each number using a dictionary.
Print the numbers that have a count greater than 1.
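A minimal Scala sketch of this approach; the sample list is illustrative:

  // Count each number in a map, then print those seen more than once.
  val numbers = List(1, 2, 3, 2, 4, 1, 5)
  val counts = scala.collection.mutable.Map[Int, Int]().withDefaultValue(0)
  numbers.foreach(n => counts(n) += 1)
  counts.collect { case (n, c) if c > 1 => n }.foreach(println)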
Spark can connect to Azure SQL Database using the JDBC driver.
Download and install the JDBC driver for Azure SQL Database.
Set up the connection string with the appropriate credentials.
Use the JDBC API to connect Spark to Azure SQL Database.
Example: val df = spark.read.jdbc(jdbcUrl, tableName, connectionProperties)
Ensure that the firewall rules for the Azure SQL Database allow access from the Spark cluster.
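A fuller sketch of the connection, assuming an existing SparkSession named spark; the server, database, table, and credentials are placeholders:

  import java.util.Properties

  val jdbcUrl = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"
  val connectionProperties = new Properties()
  connectionProperties.put("user", "myuser")        // placeholder credentials
  connectionProperties.put("password", "secret")
  connectionProperties.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

  // Read the table into a DataFrame over JDBC.
  val df = spark.read.jdbc(jdbcUrl, "dbo.customers", connectionProperties)
  df.show()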
Spark optimization techniques include partitioning, caching, and using appropriate transformations.
Partitioning data can improve performance by reducing shuffling.
Caching frequently used data can reduce the need for recomputation.
Choosing transformations carefully also helps: filtering and projecting early reduces the data volume downstream.
Shuffling can be minimized by using reduceByKey instead of groupByKey, since reduceByKey aggregates on each partition before the shuffle.
Broadcasting small tables in joins avoids shuffling the large table across the network.
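A short sketch of two of these techniques, assuming an existing SparkSession named spark; the DataFrames and data are illustrative:

  import org.apache.spark.sql.functions.broadcast
  import spark.implicits._

  val large = Seq((1, "a"), (2, "b"), (1, "c")).toDF("id", "v")
  val small = Seq((1, "x"), (2, "y")).toDF("id", "w")

  // Broadcast the small side so the large side is not shuffled for the join.
  val joined = large.join(broadcast(small), "id")

  // reduceByKey pre-aggregates on each partition before shuffling, unlike groupByKey.
  val counts = spark.sparkContext
    .parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))
    .reduceByKey(_ + _)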
Cache and persist are used to store data in memory. Repartition and coalesce are used to change the number of partitions.
Cache uses the default storage level, while persist lets the user choose one (e.g. MEMORY_ONLY or MEMORY_AND_DISK).
Repartition performs a full shuffle and can increase or decrease the number of partitions; coalesce avoids a full shuffle and can only decrease them.
All four are lazy: cache and persist take effect only when an action materializes the data, and repartition and coalesce are transformations, not actions.
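A sketch of the four calls, assuming an existing DataFrame named df:

  import org.apache.spark.storage.StorageLevel

  val cached    = df.cache()                            // default storage level
  val persisted = df.persist(StorageLevel.MEMORY_ONLY)  // caller-chosen storage level

  val wider    = df.repartition(200)  // full shuffle; partition count can go up or down
  val narrower = df.coalesce(10)      // no full shuffle; partition count can only go down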
Hive has two types of tables - Managed and External. Managed tables are managed by Hive, while External tables are managed outside of Hive.
Managed tables are created using 'CREATE TABLE' command and data is stored in Hive's warehouse directory
External tables are created using 'CREATE EXTERNAL TABLE' command and data is stored outside of Hive's warehouse directory
Managed tables are deleted together with their data when the table is dropped, while dropping an external table removes only the metadata and leaves the underlying files intact.
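A sketch of both DDL statements, run here through Spark's Hive support (assuming a SparkSession built with enableHiveSupport()); the table names and location are illustrative:

  // Managed: Hive owns the data; DROP TABLE deletes the files in the warehouse directory.
  spark.sql("CREATE TABLE managed_sales (id INT, amount DOUBLE)")

  // External: Hive owns only the metadata; DROP TABLE leaves the files at LOCATION intact.
  spark.sql("""
    CREATE EXTERNAL TABLE external_sales (id INT, amount DOUBLE)
    LOCATION '/data/external/sales'
  """)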
Developed a data pipeline to process and analyze customer behavior data.
Used Apache Kafka for real-time data streaming
Implemented data processing using Apache Spark
Stored data in Hadoop Distributed File System (HDFS)
Used Tableau for data visualization
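A minimal Structured Streaming sketch of that flow; the broker address, topic name, and HDFS paths are assumptions, and the Kafka source requires the spark-sql-kafka connector on the classpath:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("behavior-pipeline").getOrCreate()

  // Read the Kafka topic as a stream of raw string events.
  val events = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "customer-behavior")
    .load()
    .selectExpr("CAST(value AS STRING) AS json")

  // Land the events in HDFS as Parquet for downstream analysis.
  events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/behavior")
    .option("checkpointLocation", "hdfs:///checkpoints/behavior")
    .start()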
Code to print the reverse of a sentence word by word.
Split the sentence into words using space as delimiter
Store the words in an array
Print the words in reverse order
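A minimal Scala sketch; the sentence is illustrative:

  val sentence = "the quick brown fox"
  // Split on spaces, reverse the word order, and rejoin.
  val reversed = sentence.split(" ").reverse.mkString(" ")
  println(reversed)  // fox brown quick the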
RDD, Dataframe, and Dataset are data structures in Apache Spark with different characteristics and functionalities.
RDD (Resilient Distributed Datasets) is a fundamental data structure in Spark that represents an immutable distributed collection of objects. It provides low-level APIs for distributed data processing and fault tolerance.
Dataframe is a distributed collection of data organized into named columns. It is similar to a table in a relational database and benefits from Catalyst query optimization. Dataset adds compile-time type safety on top of the DataFrame API (in Scala and Java).
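A sketch showing the three abstractions side by side, assuming an existing SparkSession named spark:

  import spark.implicits._

  case class Person(name: String, age: Int)

  val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))  // RDD[Person]
  val df  = rdd.toDF()      // DataFrame: rows with named columns, Catalyst-optimized
  val ds  = df.as[Person]   // Dataset[Person]: typed API plus the same optimizations
  ds.filter(_.age > 26).show()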
I applied via Job Fair and was interviewed before Feb 2023. There was 1 interview round.
Data skewness can be handled in Spark by using techniques like partitioning, bucketing, and broadcasting.
Partitioning the data based on a key column can distribute the data evenly across the cluster.
Bucketing can further divide the data into smaller buckets based on a hash function.
Broadcasting small tables can reduce the amount of data shuffled across the network.
Using dynamic allocation can also help in handling data skew by scaling executor resources with the workload.
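A sketch of the broadcasting idea, plus key salting (a common additional mitigation not mentioned above); facts and dims are hypothetical DataFrames joined on key:

  import org.apache.spark.sql.functions._

  // Broadcast join: the small side ships to every executor, so the skewed side is never shuffled.
  val joined = facts.join(broadcast(dims), "key")

  // Salting: spread a hot key over 10 sub-keys (the salt range is illustrative);
  // the small side must be expanded with the same 0-9 salts before joining on salted_key.
  val salted = facts.withColumn("salted_key",
    concat(col("key"), lit("_"), (rand() * 10).cast("int").cast("string")))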
EPAM Systems salaries by designation:

Designation | Salaries reported | Salary range
Senior Software Engineer | 2.6k | ₹15 L/yr - ₹42.7 L/yr
Software Engineer | 1.7k | ₹6.9 L/yr - ₹24 L/yr
Lead Software Engineer | 831 | ₹18 L/yr - ₹52 L/yr
Senior Systems Engineer | 304 | ₹12 L/yr - ₹36.3 L/yr
Software Test Automation Engineer | 267 | ₹7 L/yr - ₹20 L/yr