I have worked with Azure Data Factory, Azure Databricks, and Azure SQL Database.
Azure Data Factory for data integration and orchestration
Azure Databricks for big data processing and analytics
Azure SQL Database for relational database management
I was approached by the company and interviewed in Jan 2024. There were 3 interview rounds.
I appeared for an interview in Oct 2024, where I was asked the following questions.
I appeared for an interview in Sep 2024, where I was asked the following questions.
The tech stack used includes Python, SQL, Apache Spark, Hadoop, AWS, and Docker.
Python for data processing and analysis
SQL for database querying
Apache Spark for big data processing
Hadoop for distributed storage and processing
AWS for cloud services
Docker for containerization
I applied via LinkedIn and was interviewed in Sep 2023. There were 3 interview rounds.
Scala has two types of variables - mutable and immutable.
Scala has mutable variables that can be reassigned using the var keyword.
Scala also has immutable variables that cannot be reassigned once they are initialized using the val keyword.
Example: var mutableVariable = 10; val immutableVariable = 20;
HackerRank assessment (take-home)
I applied via LinkedIn and was interviewed in Mar 2023. There were no separate interview rounds.
This code identifies and prints duplicate numbers from a given list using a dictionary to track occurrences.
Use a dictionary to count occurrences of each number.
Iterate through the list and update the count in the dictionary.
Print numbers that have a count greater than 1.
Example: For the list [1, 2, 3, 2, 4, 3], the output should be 2 and 3.
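A minimal runnable Python sketch of this approach (the function name and sample list are illustrative):

```python
def print_duplicates(numbers):
    """Print each number that appears more than once in the list."""
    counts = {}
    for n in numbers:
        # Track how many times each number has been seen.
        counts[n] = counts.get(n, 0) + 1
    for n, count in counts.items():
        if count > 1:
            print(n)

print_duplicates([1, 2, 3, 2, 4, 3])  # prints 2 and 3
```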
Connecting Spark to Azure SQL Database involves configuring JDBC and using Spark's DataFrame API for data operations.
Use the JDBC driver for Azure SQL Database to establish a connection.
Example connection string: 'jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>;user=<user>@<server>;password=<password>'
Utilize Spark's DataFrame API to read and write data, e.g., spark.read.jdbc(...) and df.write.jdbc(...), as sketched below.
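A hedged PySpark sketch of the pattern (server, database, credentials, and table names are placeholders; it assumes Microsoft's SQL Server JDBC driver is available on the cluster):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("azure-sql-demo").getOrCreate()

# Placeholder connection details -- substitute real values.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"
props = {
    "user": "<user>@<server>",
    "password": "<password>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Read a table into a DataFrame over JDBC.
df = spark.read.jdbc(url=jdbc_url, table="dbo.orders", properties=props)

# Write results back, appending to an existing table.
df.write.jdbc(url=jdbc_url, table="dbo.orders_copy", mode="append", properties=props)
```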
Spark optimization techniques enhance performance through efficient data processing and resource management.
Use DataFrames and Datasets for optimized execution plans.
Leverage lazy evaluation to minimize unnecessary computations.
Apply partitioning to distribute data evenly across nodes, e.g., using 'repartition' or 'coalesce'.
Minimize shuffling by preferring narrow transformations like 'map' and 'filter' over wide transformations like 'join' and 'groupBy' where possible (see the sketch below).
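A short PySpark sketch combining several of these techniques (file paths and column names are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.appName("spark-optimization-demo").getOrCreate()

events = spark.read.parquet("/data/events")        # large fact table (illustrative path)
countries = spark.read.parquet("/data/countries")  # small dimension table

# Narrow transformations (filter, select) run per-partition with no shuffle.
recent = events.filter(col("year") >= 2023).select("user_id", "country_code")

# Broadcasting the small table avoids shuffling the large one for the join.
joined = recent.join(broadcast(countries), "country_code")

# Lazy evaluation: nothing executes until an action like count() is called.
print(joined.count())
```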
Cache and persist both keep a DataFrame around for reuse; cache uses the default storage level, while persist lets you choose one. Repartition changes data distribution; coalesce reduces partitions.
Cache: Shorthand for persist with the default storage level; keeps the DataFrame (primarily in memory) for faster subsequent operations.
Persist: Accepts an explicit StorageLevel (memory only, memory and disk, or disk only), trading access speed for capacity and fault tolerance.
Repartition: Increases or decreases the number of partitions, potentially shuffling data across nodes.
Coalesce: Reduces the number of partitions by merging them without a full shuffle, making it cheaper than repartition for scaling down (see the sketch below).
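A brief PySpark illustration of all four operations (the toy DataFrame and partition counts are arbitrary):

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-persist-demo").getOrCreate()

df = spark.range(1_000_000)  # toy DataFrame

cached = df.cache()                      # default storage level
spilled = (df.selectExpr("id * 2 AS doubled")
             .persist(StorageLevel.DISK_ONLY))  # explicit storage level
cached.count()   # an action materializes the cache
spilled.count()

wide = cached.repartition(200)   # full shuffle up to 200 partitions
narrow = wide.coalesce(10)       # merge down to 10 partitions, no full shuffle
print(wide.rdd.getNumPartitions(), narrow.rdd.getNumPartitions())  # 200 10
```

Note that for DataFrames the default cache level is MEMORY_AND_DISK (RDDs default to MEMORY_ONLY), which is why "cache is memory, persist is disk" is an oversimplification.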
Hive supports two types of tables: Managed and External, each with distinct data management and storage characteristics.
Managed Tables: Hive manages both the schema and the data. Dropping the table deletes the data.
External Tables: Hive manages only the schema. Dropping the table does not delete the data, which remains in the external storage.
Use Managed Tables for temporary data that can be recreated easily.
Use External Tables for data that is shared with other tools or must outlive the table definition (see the DDL sketch below).
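A sketch of the two DDL forms, issued here through Spark SQL with Hive support (table, column, and path names are invented):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-tables-demo")
         .enableHiveSupport()
         .getOrCreate())

# Managed table: Hive owns both schema and files; DROP TABLE deletes the data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS staging_events (id INT, payload STRING)
""")

# External table: Hive owns only the schema; DROP TABLE leaves the files
# at LOCATION untouched.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (id INT, payload STRING)
    LOCATION '/data/raw_events'
""")
```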
RDD, DataFrame, and Dataset are core abstractions in Apache Spark for handling distributed data processing.
RDD (Resilient Distributed Dataset) is the fundamental data structure in Spark, representing an immutable distributed collection of objects.
DataFrames are similar to RDDs but are optimized for performance and allow for schema-based operations, making them easier to use.
Datasets combine the benefits of RDDs and DataFrames, adding compile-time type safety on top of the optimized execution engine (available in Scala and Java; see the sketch below).
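A small PySpark comparison (note that the typed Dataset API exists only in Scala and Java; in Python, DataFrames fill that role):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("abstractions-demo").getOrCreate()

# RDD: low-level, schema-free, functional transformations.
rdd = spark.sparkContext.parallelize([("alice", 34), ("bob", 29)])
adults_rdd = rdd.filter(lambda pair: pair[1] >= 30)

# DataFrame: schema-aware and optimized by the Catalyst planner.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.filter(df.age >= 30).show()

# Dataset (Scala/Java only) adds compile-time types over the same engine, e.g.:
#   case class Person(name: String, age: Int)
#   val ds = df.as[Person]
```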
I applied via Job Fair and was interviewed before Feb 2023. There was 1 interview round.
Data skewness can be handled in Spark by using techniques like partitioning, bucketing, and broadcasting.
Partitioning the data based on a key column can distribute the data evenly across the cluster.
Bucketing can further divide the data into smaller buckets based on a hash function.
Broadcasting small tables can reduce the amount of data shuffled across the network.
Using dynamic allocation can also help by scaling executor count to match skewed workloads (two of these techniques are sketched below).
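A PySpark sketch of broadcasting and salting (table paths, column names, and the salt factor of 10 are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col, concat_ws, floor, rand

spark = SparkSession.builder.appName("skew-demo").getOrCreate()

facts = spark.read.parquet("/data/facts")  # large table, skewed on customer_id
dims = spark.read.parquet("/data/dims")    # small lookup table

# Broadcasting the small side keeps the skewed key out of the shuffle.
joined = facts.join(broadcast(dims), "customer_id")

# Salting: split each hot key into N sub-keys so one task doesn't get it all.
n = 10
salted = facts.withColumn(
    "salted_key",
    concat_ws("_", col("customer_id").cast("string"),
              floor(rand() * n).cast("string")))
# Aggregate on the salted key first, then roll up the partial counts per key.
partial = salted.groupBy("salted_key", "customer_id").count()
final = partial.groupBy("customer_id").sum("count")
```

Salting trades one heavy shuffle for two lighter aggregations; in Spark 3.x, adaptive query execution can also split skewed partitions automatically.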
EPAM Systems salaries by designation:

| Designation | Salaries reported | Salary range |
|---|---|---|
| Senior Software Engineer | 3.1k | ₹15 L/yr - ₹42 L/yr |
| Software Engineer | 1.9k | ₹5.1 L/yr - ₹24 L/yr |
| Lead Software Engineer | 954 | ₹16.5 L/yr - ₹53 L/yr |
| Senior Systems Engineer | 320 | ₹12 L/yr - ₹36 L/yr |
| Software Test Automation Engineer | 266 | ₹6 L/yr - ₹21.2 L/yr |