ADF questions are about Azure Data Factory, a cloud-based data integration service used for data integration and data transformation processes.
These questions may involve data pipelines, data flows, activities, triggers, and data movement.
Candidates may be asked about their experience with designing, monitoring, and managing data pipelines in ADF.
Exam...
I applied via Instahyre and was interviewed in Dec 2024. There was 1 interview round.
Basic aptitude questions
Online test on SQL joins and subqueries
I applied via Naukri.com and was interviewed in Dec 2024. There were 2 interview rounds.
Python coding and SQL questions.
I applied via Company Website and was interviewed in Jul 2024. There were 2 interview rounds.
AWS, Scala and cloud concepts
SparkContext is the original entry point to Spark functionality, while SparkSession is the unified entry point introduced in Spark 2.0.
SparkContext is used to create RDDs, accumulators and broadcast variables.
SparkSession is used to create DataFrames, execute SQL queries and read data from external sources. It wraps a SparkContext, which is still available as spark.sparkContext.
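repartition() can increase or decrease the number of partitions, while coalesce() can only reduce it.
repartition() performs a full shuffle across the cluster, producing roughly evenly sized partitions, and can be used to increase parallelism.
coalesce() merges existing partitions without a full shuffle, so it is cheaper when reducing the partition count (for example, before writing output).
Because of the full shuffle, repartition() is expensive and should be used sparingly.
coalesce() is faster, but it cannot increase the partition count and the merged partitions may be unevenly sized.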
Both can be used to optimize da
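To make the difference concrete, here is a single-machine toy model in plain Python (not Spark's API). Partitions are modeled as lists; the hypothetical `repartition` deals every record round-robin into new partitions (a full shuffle), while the hypothetical `coalesce` only glues whole adjacent partitions together, which is a simplification of how Spark groups partitions.

```python
from itertools import chain


def repartition(partitions, n):
    """Toy full shuffle: every record may move; records are dealt round-robin into n partitions."""
    out = [[] for _ in range(n)]
    for i, record in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(record)
    return out


def coalesce(partitions, n):
    """Toy coalesce: whole existing partitions are merged; the count can only go down."""
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow, so nothing changes.
        return [list(p) for p in partitions]
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Adjacent partitions are assigned to the same output group.
        out[i * n // len(partitions)].extend(part)
    return out


parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]
print(coalesce(parts, 2))     # skewed input stays skewed
print(repartition(parts, 2))  # records are redistributed evenly
```

The example shows why coalesce() is cheap but can leave skew behind, while repartition() rebalances at the cost of moving every record.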
SQL query to find the second highest salary
Use ORDER BY salary DESC with LIMIT 1 OFFSET 1 over the distinct salaries
Use a subquery to find the maximum salary, then select the highest salary below it
Handle ties for the highest salary (for example, with DISTINCT) so a duplicated top salary is not returned as the second highest
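Both approaches can be tried with Python's built-in sqlite3 module. This is a minimal sketch: the `employees` table, its columns and the sample rows are hypothetical, and the top salary (300) is deliberately duplicated to show tie handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("a", 100), ("b", 300), ("c", 300), ("d", 200), ("e", 150)],
)

# Approach 1: DISTINCT collapses ties, then skip the top value with OFFSET 1.
ORDER_BY_SQL = """
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1
"""

# Approach 2: exclude the maximum via a subquery; the MAX of what remains is second highest.
SUBQUERY_SQL = """
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
"""


def second_highest(sql):
    # Approach 1 returns no row, approach 2 returns NULL, when there is no second salary.
    row = conn.execute(sql).fetchone()
    return row[0] if row else None


print(second_highest(ORDER_BY_SQL), second_highest(SUBQUERY_SQL))
```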
Spark is a distributed computing engine that processes large datasets in parallel across a cluster of computers.
Spark uses a master/worker architecture: a driver program splits the job into tasks and coordinates them across executors running on worker nodes.
Data is represented as Resilient Distributed Datasets (RDDs), which can be cached in memory for faster processing.
Spark supports multiple programming languages including Java, Scala, and Python.
Spark can be used f...
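The driver/worker pattern can be illustrated with a single-machine toy in plain Python. This is not Spark: threads stand in for executors, and the names `driver` and `count_words` are hypothetical. The driver splits the input into partitions, each worker runs the same task on its partition, and the driver combines the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(lines, num_partitions):
    # Spread the input across partitions, as a cluster would split a dataset.
    return [lines[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    # The "task" each worker runs independently on its own partition.
    return Counter(word for line in partition for word in line.split())


def driver(lines, num_workers=3):
    partitions = split_into_partitions(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partial_counts = list(workers.map(count_words, partitions))
    # The driver merges the partial results into the final answer.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


print(driver(["a b a", "b c", "a"]))
```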
Broadcast Join is a technique used in distributed computing to optimize join operations.
Broadcast Join is used when one table is small enough to fit in the memory of every executor in the cluster.
A copy of the smaller table is broadcast to all nodes, so the large table does not need to be shuffled across the network.
Broadcast Join is usually faster than a shuffle-based join (such as a sort-merge join) when the broadcast table is small.
Example: Joining a small reference table with a large fact table in a data
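The idea can be sketched in plain Python as a broadcast hash join. This is a toy model, not Spark's implementation: the function name, the sample `sales` and `countries` data, and the `cc` key are hypothetical. The small table is turned into a lookup dictionary once (the "broadcast"), and each partition of the large table is joined locally without moving its rows. In Spark itself, `pyspark.sql.functions.broadcast(df)` hints that a DataFrame should be broadcast, and `spark.sql.autoBroadcastJoinThreshold` controls automatic broadcasting.

```python
from concurrent.futures import ThreadPoolExecutor


def broadcast_hash_join(fact_partitions, dim_rows, fact_key, dim_key):
    # Build the hash table once from the small table; this copy is what gets "broadcast".
    lookup = {row[dim_key]: row for row in dim_rows}

    def join_partition(partition):
        # Inner join: each large-table partition is joined locally against the lookup.
        return [{**fact, **lookup[fact[fact_key]]}
                for fact in partition if fact[fact_key] in lookup]

    with ThreadPoolExecutor() as pool:
        joined = pool.map(join_partition, fact_partitions)  # preserves partition order
    return [row for part in joined for row in part]


sales = [
    [{"order_id": 1, "cc": "IN", "amount": 10}, {"order_id": 2, "cc": "US", "amount": 5}],
    [{"order_id": 3, "cc": "IN", "amount": 7}, {"order_id": 4, "cc": "XX", "amount": 1}],
]
countries = [{"cc": "IN", "country": "India"}, {"cc": "US", "country": "United States"}]
print(broadcast_hash_join(sales, countries, "cc", "cc"))
```

Order 4 has no matching country and is dropped, as in an inner join.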
Senior Software Engineer: 319 salaries, ₹9.7 L/yr - ₹29.7 L/yr
Software Engineer: 164 salaries, ₹3.8 L/yr - ₹14.9 L/yr
Software Engineer2: 164 salaries, ₹5.2 L/yr - ₹18 L/yr
Senior Developer: 160 salaries, ₹9.9 L/yr - ₹29.4 L/yr
Lead Software Engineer: 130 salaries, ₹15.6 L/yr - ₹42 L/yr
Mu Sigma
Fractal Analytics
TCS
Wipro