Use a SQL query to find the product with the second highest total price.
Use the ORDER BY clause to sort the products by total price in descending order
Use the LIMIT clause with an OFFSET (for example, LIMIT 1 OFFSET 1) to pick the second row after sorting
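As a minimal, runnable sketch of this approach, the snippet below uses Python's built-in sqlite3 module; the order_items table, its columns, and the sample rows are invented purely for illustration.

```python
# A minimal sketch of the ORDER BY + LIMIT/OFFSET approach; the table name,
# columns, and sample data below are assumptions made for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (product TEXT, price REAL);
    INSERT INTO order_items VALUES
        ('A', 100), ('A', 50), ('B', 300), ('C', 120), ('C', 90);
""")

# Sort products by total price in descending order, then skip the first row
# so that only the second-highest total remains.
row = conn.execute("""
    SELECT product, SUM(price) AS total_price
    FROM order_items
    GROUP BY product
    ORDER BY total_price DESC
    LIMIT 1 OFFSET 1
""").fetchone()

print(row)  # ('C', 210.0), the second-highest total after ('B', 300.0)
```

On databases without an OFFSET clause, a window function such as DENSE_RANK over the summed totals achieves the same result.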
I applied via Naukri.com and was interviewed in Mar 2022. There were 3 interview rounds.
posted on 11 Sep 2024
I applied via Approached by Company and was interviewed in Apr 2024. There was 1 interview round.
I have handled terabytes of data in my POCs, including data from various sources and formats.
Handled terabytes of data in POCs
Worked with data from various sources and formats
Used tools like Hadoop, Spark, and SQL for data processing
Repartition performs a full shuffle and is used to increase (or rebalance) the number of partitions for parallelism, while coalesce merges existing partitions to reduce their number without a full shuffle.
Repartition is used when more partitions are needed to increase parallelism.
Coalesce is used when there are too many partitions and they need to be reduced without triggering a full shuffle.
Example: repartition can be used before a join operation to evenly distribute data across partitions for better performance, as sketched below.
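A brief PySpark sketch of the difference, assuming a local SparkSession; the partition counts are arbitrary examples.

```python
# repartition vs. coalesce on a synthetic DataFrame; partition counts are arbitrary.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-vs-coalesce").getOrCreate()

df = spark.range(0, 1_000_000)            # stand-in for a real input DataFrame

# repartition: full shuffle, can raise the partition count (e.g., before a wide join).
wide = df.repartition(200, "id")
print(wide.rdd.getNumPartitions())        # 200

# coalesce: merges existing partitions without a full shuffle
# (e.g., before writing, to avoid producing many tiny output files).
narrow = wide.coalesce(10)
print(narrow.rdd.getNumPartitions())      # 10
```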
Designing/configuring a cluster for 10 petabytes of data involves considerations for storage capacity, processing power, network bandwidth, and fault tolerance.
Consider using a distributed file system like HDFS or object storage like Amazon S3 to store and manage the large volume of data.
Implement a scalable processing framework like Apache Spark or Hadoop to efficiently process and analyze the data in parallel.
Utilize...
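To make the storage-capacity point concrete, here is a rough back-of-the-envelope calculation; every figure (replication factor, overhead, disk per node) is an assumption for illustration, not a recommendation.

```python
# Rough capacity sizing for ~10 PB of raw data; all figures below are assumptions.
raw_data_tb = 10_000             # 10 PB expressed in TB
replication_factor = 3           # e.g., HDFS default replication
overhead = 1.25                  # headroom for temp, shuffle, and compaction space
disk_per_node_tb = 48            # e.g., 12 x 4 TB drives per data node

required_tb = raw_data_tb * replication_factor * overhead
nodes = -(-required_tb // disk_per_node_tb)   # ceiling division

print(f"Raw storage needed: {required_tb:,.0f} TB")   # 37,500 TB
print(f"Approx. data nodes: {nodes:,.0f}")            # 782
```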
I applied via Naukri.com
I have worked on various AWS services including S3, EC2, Lambda, Glue, and Redshift.
S3 - Used for storing and retrieving data
EC2 - Used for running virtual servers
Lambda - Used for serverless computing
Glue - Used for ETL (Extract, Transform, Load) processes
Redshift - Used for data warehousing and analytics
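As an illustration of how a few of these services fit together in a pipeline, here is a hedged PySpark/boto3 sketch; the bucket names, prefixes, and column names are placeholders, and reading s3a:// paths assumes the Hadoop AWS connector is available.

```python
# Illustrative only: bucket names, prefixes, and columns are placeholders.
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

# S3 as the landing zone: read raw JSON events (requires the hadoop-aws connector for s3a://).
raw = spark.read.json("s3a://example-raw-bucket/events/2024/04/")

# A Glue-style transform expressed in plain Spark: deduplicate and drop incomplete records.
curated = raw.dropDuplicates(["event_id"]).filter("event_type IS NOT NULL")

# Write curated, partitioned Parquet back to S3 for downstream querying or warehouse loads.
curated.write.mode("overwrite").partitionBy("event_type").parquet(
    "s3a://example-curated-bucket/events/"
)

# boto3 for lightweight object-level operations, such as dropping a completion marker.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-curated-bucket", Key="events/_COMPLETE", Body=b"done")
```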
I applied via Naukri.com and was interviewed in Jan 2024. There was 1 interview round.
ADF triggers are used in Azure Data Factory to schedule and orchestrate data pipelines.
ADF triggers enable the automation of data movement and data transformation activities.
Triggers can be scheduled to run at specific times or based on event-based triggers.
They can be used to start or stop pipelines, and can be configured with parameters and dependencies.
Examples of triggers include time-based schedule triggers, tumbling window triggers, and event-based triggers.
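As a rough illustration of what a schedule trigger definition looks like, here is its JSON shape expressed as a Python dict; the trigger name, pipeline name, and times are invented, and the exact schema should be checked against the ADF documentation.

```python
# Approximate shape of an ADF schedule-trigger definition; names and values are illustrative.
daily_trigger = {
    "name": "DailyLoadTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",                  # run once per day
                "interval": 1,
                "startTime": "2024-04-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        # The pipeline(s) this trigger starts, with any run-time parameters.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopySalesPipeline",
                    "type": "PipelineReference",
                },
                "parameters": {"runDate": "@trigger().scheduledTime"},
            }
        ],
    },
}
```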
IR stands for Integration Runtime. A dataset is a representation of data, while a linked service is a connection to the data source.
IR is a compute infrastructure used to provide data integration capabilities
Dataset is a structured representation of data used in data engineering tasks
Linked service is a connection to a data source, providing access to the data
IR enables data movement and transformation between different data stores.
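To show how a linked service and a dataset relate, here is a hedged sketch of their JSON shapes as Python dicts; the names, storage type, and file location are placeholders, and the exact schema should be verified against the ADF documentation.

```python
# Approximate shapes only; names and connection details are placeholders.
blob_linked_service = {
    "name": "AzureBlobLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<stored-in-key-vault>"},
    },
}

# The dataset points at the linked service and describes the data's structure and location.
sales_dataset = {
    "name": "SalesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobLS",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "fileName": "sales.csv",
            },
            "firstRowAsHeader": True,
        },
    },
}
```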
Optimization techniques in Spark
Partitioning data to optimize data locality
Caching frequently accessed data
Using broadcast variables for small data sets
Using appropriate data structures and algorithms
Avoiding unnecessary shuffling of data
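Two of the techniques above, caching and broadcast joins, are easy to show in a few lines of PySpark; the data here is synthetic and the sizes are arbitrary.

```python
# Caching and broadcast joins on synthetic data; sizes are arbitrary.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("spark-optimizations").getOrCreate()

facts = spark.range(0, 10_000_000).withColumnRenamed("id", "user_id")
dims = spark.createDataFrame(
    [(i, f"segment_{i % 5}") for i in range(100)], ["user_id", "segment"]
)

# Cache a DataFrame that several downstream actions will reuse.
facts.cache()

# Broadcast the small dimension table so the large fact table is not shuffled for the join.
joined = facts.join(broadcast(dims), "user_id")
joined.groupBy("segment").count().show()
```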
I applied via Naukri.com and was interviewed in Mar 2024. There was 1 interview round.
I applied via LinkedIn and was interviewed in Feb 2024. There was 1 interview round.
Apache Spark is a distributed computing framework that provides high-level APIs in several languages.
Its architecture consists of a cluster manager, worker nodes, and a driver program.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports multiple programming languages such as Scala, Java, Python, and R.
It includes components like Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
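A minimal driver-program sketch ties these pieces together; the master URL below is an assumption (local mode) and would be replaced by the cluster manager's URL on a real cluster.

```python
# The SparkSession created here is the driver-side entry point; the master URL
# (an assumption, local mode) decides which cluster manager the driver talks to.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("architecture-demo")
    .master("local[4]")   # e.g., "yarn", "spark://host:7077", or "k8s://..." on a real cluster
    .getOrCreate()
)

# The driver builds the plan; the cluster manager allocates executors on worker nodes,
# and those executors run the tasks of this distributed count in parallel.
n_even = spark.range(0, 1_000_000).filter("id % 2 = 0").count()
print(n_even)   # 500000

spark.stop()
```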
PySpark and SQL coding questions
I applied via Approached by Company and was interviewed in Nov 2023. There were 2 interview rounds.
Worked on ETL projects involving data extraction, transformation, and loading, using SQL extensively.
Developed ETL pipelines to extract data from various sources such as databases, APIs, and flat files
Transformed the extracted data using SQL queries to meet business requirements
Loaded the transformed data into a data warehouse or other target systems
Optimized SQL queries for performance and efficiency
Collaborated with c...
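A condensed sketch of such a pipeline is shown below; the JDBC URL, credentials, table names, columns, and output paths are all placeholders, and the JDBC driver is assumed to be available.

```python
# Condensed ETL sketch; connection details, tables, columns, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: a relational source over JDBC plus a flat-file source.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://source-db:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)
customers = spark.read.option("header", True).csv("/landing/customers.csv")

# Transform: express the business rules in SQL.
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")
mart = spark.sql("""
    SELECT c.region,
           CAST(o.order_ts AS DATE) AS order_date,
           SUM(o.amount)            AS revenue
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region, CAST(o.order_ts AS DATE)
""")

# Load: write the result into the warehouse zone as partitioned Parquet.
mart.write.mode("overwrite").partitionBy("order_date").parquet("/warehouse/daily_revenue/")
```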
Salaries by designation:
Senior Engineer: 5.3k salaries, ₹5 L/yr - ₹17 L/yr
Engineer: 4.4k salaries, ₹1 L/yr - ₹8.9 L/yr
Technical Lead: 2k salaries, ₹8.2 L/yr - ₹26.9 L/yr
Project Lead: 1.6k salaries, ₹6 L/yr - ₹22.1 L/yr
Senior Software Engineer: 1.4k salaries, ₹4.8 L/yr - ₹18.2 L/yr