I was interviewed in May 2017.
US companies perform detailed analysis of their data using various methods and tools.
US companies use data analytics platforms and software to analyze their data.
They employ data scientists and analysts who are skilled in data analysis techniques.
Companies collect and store large amounts of data from various sources.
They use statistical analysis and machine learning algorithms to uncover patterns and insights.
Data vi...
Online test on SQL joins and subqueries.
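The sketch below shows the kind of question such a test typically combines: a JOIN plus a correlated subquery. It uses Python's built-in sqlite3 module with an in-memory database. The departments/employees schema and all data are hypothetical and only illustrate the pattern. This is not a question from the actual test.

```python
import sqlite3

# Hypothetical schema for illustration only: employees belong to departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,
                        dept_id INTEGER, salary INTEGER);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO employees VALUES
  (1, 'Asha', 1, 50000),
  (2, 'Ravi', 1, 65000),
  (3, 'Meena', 2, 90000),
  (4, 'John', 2, 70000);
""")

# JOIN brings in the department name; the correlated subquery computes
# the average salary of the current row's department.
query = """
SELECT e.name, d.name AS department, e.salary
FROM employees AS e
JOIN departments AS d ON d.dept_id = e.dept_id
WHERE e.salary > (
    SELECT AVG(e2.salary) FROM employees AS e2 WHERE e2.dept_id = e.dept_id
)
ORDER BY e.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # employees earning more than their department's average
```

With this data, Sales averages 57,500 and Engineering averages 80,000, so only Meena and Ravi are returned.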
I applied via Naukri.com and was interviewed before Aug 2023. There were 2 interview rounds.
I applied via Naukri.com and was interviewed before May 2023. There was 1 interview round.
posted on 18 Mar 2024
I applied via Walk-in and was interviewed before Mar 2023. There were 4 interview rounds.
An aptitude test will be given.
The assignment will be shared by email.
I applied via Recruitment Consultant and was interviewed in Aug 2021. There were 3 interview rounds.
I applied via Company Website and was interviewed in Mar 2024. There was 1 interview round.
Spark Context is the entry point to any Spark functionality while Spark Session is a unified entry point for Spark 2.0+.
Spark Context is the old entry point to Spark functionality.
Spark Session is a unified entry point for Spark 2.0+.
Spark Context is used to create RDDs, accumulators and broadcast variables.
Spark Session is used to create DataFrames, execute SQL queries and read data from external sources.
Repartition can increase or decrease the number of partitions, while coalesce can only reduce it.
Repartitioning shuffles data across the cluster and can be used to increase parallelism.
Coalesce merges existing partitions without a full shuffle and can be used to reduce overhead.
Repartitioning is expensive and should be used sparingly.
Coalesce is cheaper, but because it only merges existing partitions it cannot increase parallelism and may leave partitions unevenly sized.
Both can be used to optimize da
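The difference can be sketched with a small pure-Python simulation. A list of lists stands in for a dataset's partitions. This is not Spark's API or its exact placement logic. `repartition` here deals every record round-robin into n new partitions, as a full shuffle would. `coalesce` only glues whole neighbouring partitions together and never creates new ones. Both function names are illustrative stand-ins.

```python
def repartition(parts, n):
    """Simulated full shuffle: every record may move; records are dealt round-robin."""
    out = [[] for _ in range(n)]
    i = 0
    for part in parts:
        for record in part:
            out[i % n].append(record)
            i += 1
    return out


def coalesce(parts, n):
    """Simulated coalesce: whole partitions are merged, so no individual record is reshuffled."""
    if n >= len(parts):
        # Coalescing cannot add partitions; the layout stays as it is.
        return [list(p) for p in parts]
    out = [[] for _ in range(n)]
    for idx, part in enumerate(parts):
        # Map each existing partition to one of n contiguous groups.
        out[idx * n // len(parts)].extend(part)
    return out


# A skewed layout: one large partition and three small ones.
parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]
print(repartition(parts, 2))    # balanced: 5 and 5 records
print(coalesce(parts, 2))       # cheaper, but still skewed: 7 and 3 records
print(len(coalesce(parts, 8)))  # still 4: coalesce cannot increase parallelism
```

The output mirrors the trade-off described above. The shuffle evens out the skew, while coalesce preserves it but moves far less data. In Spark itself the corresponding DataFrame methods are `repartition(n)` and `coalesce(n)`.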
SQL query to find the second highest salary
Use ORDER BY on DISTINCT salaries with LIMIT 1 OFFSET 1 to select the second highest salary
Use a subquery to find the maximum salary, then take the maximum of the salaries below it
Handle cases where there are ties for the highest salary
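Both approaches can be tried in Python's built-in sqlite3 module. The `employee` table and its rows are hypothetical. Two employees deliberately tie for the top salary to show why DISTINCT matters. LIMIT/OFFSET syntax varies slightly between databases; SQLite accepts the form used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, salary INTEGER)")
# Hypothetical data: a tie for the highest salary (90000).
conn.executemany("INSERT INTO employee (salary) VALUES (?)",
                 [(90000,), (90000,), (75000,), (60000,)])

# Approach 1: DISTINCT + ORDER BY + LIMIT/OFFSET.
# Without DISTINCT, the second row would be the duplicate 90000.
q_limit = """
SELECT DISTINCT salary FROM employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1
"""

# Approach 2: the subquery finds the maximum; the outer query takes the
# maximum of everything strictly below it, so ties are handled naturally.
q_subquery = """
SELECT MAX(salary) FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee)
"""

second_by_limit = conn.execute(q_limit).fetchone()
second_by_subquery = conn.execute(q_subquery).fetchone()
print(second_by_limit, second_by_subquery)
```

The two queries behave differently when there is no second salary. The LIMIT version returns no row (`fetchone()` gives `None`). The subquery version returns one row containing NULL (`(None,)`), which is often the shape interviewers expect.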
Spark is a distributed computing engine that processes large datasets in parallel across a cluster of computers.
Spark uses a master-worker architecture in which a driver program coordinates tasks that run on executors across the worker nodes.
Data is stored in Resilient Distributed Datasets (RDDs) that can be cached in memory for faster processing.
Spark supports multiple programming languages including Java, Scala, and Python.
Spark can be used f...
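The driver/worker split can be illustrated with a toy analogy in plain Python. A thread pool stands in for the executors, and a list of list slices stands in for an RDD's partitions. This is only a conceptual sketch. Real Spark executors are separate JVM processes on other machines, and none of the function names below are Spark APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: cut the dataset into roughly equal partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def task(partition):
    """Worker side: the same function runs independently on each partition."""
    return sum(x * x for x in partition)  # map (square) + local reduce (sum)


def run_job(data, num_partitions=4, num_workers=2):
    partitions = split_into_partitions(data, num_partitions)
    # The "driver" schedules one task per partition on a pool of "workers".
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(task, partitions))
    # The driver combines the partial results, like an action such as reduce().
    return sum(partial_results)


print(run_job(list(range(1, 11))))  # sum of squares of 1..10
```

The point of the sketch is the shape of the computation. The driver plans and splits the work, workers process partitions in parallel without talking to each other, and only small partial results travel back to the driver.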
Broadcast Join is a technique used in distributed computing to optimize join operations.
Broadcast Join is used when one table is small enough to fit in the memory of every node in the cluster.
The smaller table is broadcast to all nodes, so the large table does not have to be shuffled across the network.
Broadcast Join is faster than other join techniques when used appropriately.
Example: Joining a small reference table with a large fact table in a data
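The mechanism can be simulated as a broadcast hash join in plain Python. Dictionaries stand in for rows and lists of rows for partitions of the large table. The table contents and the function name are hypothetical. The sketch builds a hash table from the small table once and hands the same read-only copy to every partition. Each partition is then joined locally, with no rows of the large table moving between partitions.

```python
def broadcast_hash_join(small_table, large_partitions, key_small, key_large):
    """Inner join of a small table against a partitioned large table."""
    # Build the lookup table once, as the driver would before broadcasting it.
    lookup = {}
    for row in small_table:
        lookup.setdefault(row[key_small], []).append(row)

    def join_partition(partition):
        # Runs independently per partition using the shared lookup table.
        out = []
        for row in partition:
            for match in lookup.get(row[key_large], []):
                out.append({**row, **match})
        return out

    # Each partition keeps its own rows; nothing is reshuffled.
    return [join_partition(p) for p in large_partitions]


# Hypothetical small dimension table and a large fact table in two partitions.
products = [{"product_id": 1, "category": "Books"},
            {"product_id": 2, "category": "Toys"}]
orders = [
    [{"order_id": 10, "product_id": 1, "qty": 2},
     {"order_id": 11, "product_id": 3, "qty": 1}],  # product 3 has no match
    [{"order_id": 12, "product_id": 2, "qty": 5}],
]
joined = broadcast_hash_join(products, orders, "product_id", "product_id")
print(joined)
```

In Spark itself, the same behaviour can be requested with the `broadcast()` hint from `pyspark.sql.functions`. Spark also chooses it automatically when a table's estimated size is below the `spark.sql.autoBroadcastJoinThreshold` setting.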
R.R. Donnelley
Denave
Smollan Group
Echobooom Management & Entrepreneurial Solutions