Spark Context is the entry point to Spark functionality in Spark 1.x, while Spark Session is the unified entry point introduced in Spark 2.0.
Spark Context is used to create RDDs, accumulators, and broadcast variables.
Spark Session wraps Spark Context and is used to create DataFrames, execute SQL queries, and read data from external sources.
Repartitioning increases partitions while Coalesce reduces partitions.
Repartitioning shuffles data across the cluster and can be used to increase parallelism.
Coalesce merges partitions without shuffling data and can be used to reduce overhead.
Repartitioning is expensive and should be used sparingly.
Coalesce is faster but, because it avoids a shuffle, it cannot increase the number of partitions or improve parallelism.
Both can be used to optimize data processing.
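The difference above can be sketched in plain Python (a toy illustration, not Spark itself; the function names and behavior are simplified assumptions): coalesce combines whole existing partitions, while repartition hashes every record into fresh partitions, simulating a full shuffle.

```python
# Toy model: a dataset is a list of partitions, each a list of records.

def coalesce(partitions, n):
    """Merge existing partitions into n groups without moving
    individual records between unrelated partitions (no shuffle)."""
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # whole partitions are combined
    return merged

def repartition(partitions, n):
    """Redistribute every record by hash into n new partitions,
    simulating a full shuffle across the cluster."""
    shuffled = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            shuffled[hash(record) % n].append(record)
    return shuffled

parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(parts, 2))     # records keep their original grouping
print(repartition(parts, 8))  # every record individually rehashed
```

This mirrors why coalesce is cheap (partitions are only concatenated) and why repartition is expensive (every record moves).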
SQL query to find the second highest salary
Use ORDER BY and LIMIT to select the second highest salary
Use subquery to select the maximum salary and exclude it from the result set
Handle cases where there are ties for the highest salary
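The subquery approach above can be sketched against an in-memory SQLite table (the table name and sample rows are invented for illustration); excluding the maximum first means a tie for the top salary does not affect the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("a", 100), ("b", 300), ("c", 300), ("d", 200)],  # tie at the top
)

# Exclude the maximum salary, then take the maximum of what remains.
second = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]
print(second)  # 200, not 300, despite the tie at 300
```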
Spark is a distributed computing engine that processes large datasets in parallel across a cluster of computers.
Spark uses a master-slave architecture with a driver program that coordinates tasks across worker nodes.
Data is stored in Resilient Distributed Datasets (RDDs) that can be cached in memory for faster processing.
Spark supports multiple programming languages including Java, Scala, and Python.
Spark can be used for batch processing, stream processing, machine learning, and interactive SQL workloads.
Broadcast Join is a technique used in distributed computing to optimize join operations.
Broadcast Join is used when one table is small enough to fit in memory of all nodes in a cluster.
The smaller table is broadcasted to all nodes in the cluster, reducing network traffic.
Broadcast Join is faster than other join techniques when used appropriately.
Example: joining a small reference table with a large fact table in a data warehouse.
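The broadcast-join idea can be sketched in plain Python (a minimal model, not Spark; the table contents are invented): the small table is turned into an in-memory dict, playing the role of the copy broadcast to every worker, so the large table is joined without being shuffled.

```python
dim_products = [(1, "widget"), (2, "gadget")]   # small reference table
fact_sales = [(1, 10), (2, 5), (1, 7), (3, 2)]  # large fact table: (product_id, qty)

broadcast = dict(dim_products)  # shipped whole to each worker

joined = [
    (pid, broadcast[pid], qty)
    for pid, qty in fact_sales
    if pid in broadcast  # inner join: unmatched rows are dropped
]
print(joined)  # [(1, 'widget', 10), (2, 'gadget', 5), (1, 'widget', 7)]
```

Because only the small table is copied, no shuffle of the fact table is needed, which is exactly the saving a broadcast join provides.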
I applied via Campus Placement and was interviewed before Jul 2021. There were 3 interview rounds.
In this round we had aptitude questions plus coding MCQs.
Here we had to write full-fledged code; there were 2 questions and they were easy.
I applied via Campus Placement and was interviewed before Jan 2021. There were 4 interview rounds.
I have worked on various technologies including Hadoop, Spark, SQL, Python, and AWS.
Experience with Hadoop and Spark for big data processing
Proficient in SQL for data querying and manipulation
Skilled in Python for data analysis and scripting
Familiarity with AWS services such as S3, EC2, and EMR
Knowledge of data warehousing and ETL processes
I applied via Referral and was interviewed before Jun 2021. There were 2 interview rounds.
Remove duplicate characters from a string
Iterate through the string and keep track of characters seen
Use a set to store unique characters and remove duplicates
Reconstruct the string without duplicates
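The steps above can be sketched as a short Python function: a set records characters already seen, and the string is rebuilt keeping only the first occurrence of each character.

```python
def remove_duplicates(s: str) -> str:
    """Return s with every repeated character removed,
    preserving the order of first occurrences."""
    seen = set()
    out = []
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)

print(remove_duplicates("programming"))  # "progamin"
```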
SQL query to retrieve second highest salary from a table
Use the ORDER BY clause to sort salaries in descending order
Use the LIMIT clause with OFFSET 1 to retrieve the second row
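This ORDER BY / LIMIT approach can be sketched against an in-memory SQLite table (table name and sample data are invented); DISTINCT guards against a tie at the top counting as two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?)", [(100,), (300,), (200,)])

# Sort descending, skip the highest row, take the next one.
row = conn.execute(
    "SELECT DISTINCT salary FROM emp ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()
print(row[0])  # 200
```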
I applied via Naukri.com and was interviewed in Feb 2024. There was 1 interview round.
SELECT COUNT(0) returns the total number of rows in a table: the constant 0 is never NULL, so every row is counted.
It is equivalent to SELECT COUNT(*) or SELECT COUNT(1).
Unlike COUNT(column), it does not skip rows where a column is NULL.
Example: SELECT COUNT(0) FROM table_name;
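A quick demonstration with an in-memory SQLite table (the table and data are invented) shows COUNT(0), COUNT(1), and COUNT(*) all counting every row, while COUNT(column) skips NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

c0, c1, cstar, cx = conn.execute(
    "SELECT COUNT(0), COUNT(1), COUNT(*), COUNT(x) FROM t"
).fetchone()
print(c0, c1, cstar, cx)  # 3 3 3 2 -- COUNT(x) skips the NULL row
```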
Sharding is a database partitioning technique where large databases are divided into smaller, more manageable parts called shards.
Sharding helps distribute data across multiple servers to improve performance and scalability.
Each shard contains a subset of the data, allowing for parallel processing and faster query execution.
Common sharding strategies include range-based sharding, hash-based sharding, and list-based sharding.
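Hash-based sharding can be sketched in a few lines of Python (shard count, keys, and hash choice are illustrative assumptions): a stable hash of the key picks which shard stores the row, so every process routes the same key to the same shard.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Stable hash (md5, unlike Python's randomized hash()) so every
    process maps a given key to the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route some invented user IDs to their shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["u1", "u2", "u3", "u4", "u5"]:
    shards[shard_for(user_id)].append(user_id)

print(shards)  # each user lands deterministically on one shard
```

Range-based and list-based sharding differ only in the routing function: a comparison against range boundaries, or a lookup in an explicit key-to-shard map.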
Slots in BigQuery are units of virtual compute capacity (CPU and memory) used to execute queries.
Slots help in managing query resources and controlling costs
Users can purchase additional slots to increase query capacity
Slots are used to allocate processing power for queries based on the amount purchased
Serverless computing in Databricks allows users to run code without managing servers, scaling automatically based on workload.
Serverless computing in Databricks enables users to focus on writing code without worrying about server management.
It automatically scales resources based on workload, reducing costs and improving efficiency.
Users can run code in Databricks without provisioning or managing servers.
Azure tech stack used in the current project includes Azure Data Factory, Azure Databricks, and Azure SQL Database.
Azure Data Factory for data integration and orchestration
Azure Databricks for big data processing and analytics
Azure SQL Database for storing and querying structured data