I applied via Approached by Company and was interviewed in Mar 2024. There were 2 interview rounds.
You will have to take the test on their portal.
I am a detail-oriented individual with a background in data annotation and a passion for accuracy and efficiency.
Background in data annotation
Detail-oriented and accuracy-driven
Efficient and organized in work approach
Yes, the job role involves labeling and categorizing data for machine learning algorithms.
Labeling data accurately according to guidelines provided
Categorizing data into different classes or categories
Ensuring data quality and consistency
Communicating any issues or discrepancies in the data
Working closely with data scientists and engineers to improve models
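The duties listed above can be sketched in a few lines of Python. This is a hypothetical illustration: the allowed category set, the record layout, and the function name are all invented for the example, not taken from any specific annotation tool.

```python
# Hypothetical sketch of guideline-based label validation: check each
# annotated record against an allowed category set and report
# discrepancies so they can be raised with the team.

ALLOWED_LABELS = {"cat", "dog", "other"}

def validate_annotations(records):
    """Return (index, label) pairs whose label violates the guidelines."""
    issues = []
    for i, rec in enumerate(records):
        if rec.get("label") not in ALLOWED_LABELS:
            issues.append((i, rec.get("label")))
    return issues

records = [
    {"text": "a tabby", "label": "cat"},
    {"text": "a beagle", "label": "dgo"},  # typo: should be flagged
]
print(validate_annotations(records))  # [(1, 'dgo')]
```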
posted on 31 Dec 2024
Apache Spark architecture includes a cluster manager, worker nodes, and driver program.
Apache Spark architecture consists of a cluster manager, which allocates resources and schedules tasks.
Worker nodes execute tasks and store data in memory or disk.
Driver program coordinates tasks and communicates with the cluster manager.
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext in the driver program.
reduceByKey aggregates the values of each key, while groupByKey only groups the values belonging to each key.
reduceByKey is a transformation that combines the values of each key using an associative, commutative function (the related foldByKey also takes a neutral 'zero value').
groupByKey is a transformation that groups the data by key and returns a grouped dataset.
reduceByKey is more efficient for aggregation because it combines values within each partition before shuffling, while groupByKey shuffles every (key, value) pair across the network first.
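The shuffle-volume difference above can be demonstrated without Spark at all. The following pure-Python sketch counts how many (key, value) pairs would cross the network under each strategy; the partitions and the word-count-style workload are illustrative.

```python
# Pure-Python sketch of why reduceByKey (map-side combine) moves less
# data than groupByKey. Each inner list stands for one partition.
from collections import defaultdict

partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def shuffle_size_groupByKey(parts):
    # Every (key, value) pair crosses the network unchanged.
    return sum(len(p) for p in parts)

def shuffle_size_reduceByKey(parts):
    # Values are combined per key within each partition first, so at
    # most one pair per key per partition is shuffled.
    total = 0
    for p in parts:
        combined = defaultdict(int)
        for k, v in p:
            combined[k] += v
        total += len(combined)
    return total

print(shuffle_size_groupByKey(partitions))   # 6 pairs shuffled
print(shuffle_size_reduceByKey(partitions))  # 4 pairs shuffled
```

With skewed keys and large partitions the gap grows dramatically, which is why map-side combining matters at scale.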
RDD is a low-level abstraction representing a distributed collection of objects, while DataFrame is a higher-level abstraction representing a distributed collection of data organized into named columns.
RDD is more suitable for unstructured data and low-level transformations, while DataFrame is more suitable for structured data and high-level abstractions.
DataFrames provide optimizations such as Catalyst query optimization and Tungsten code generation.
The different modes of execution in Apache Spark include local mode, standalone mode, YARN mode, and Mesos mode.
Local mode: Spark runs on a single machine with one executor.
Standalone mode: Spark runs on a cluster managed by a standalone cluster manager.
YARN mode: Spark runs on a Hadoop cluster using YARN as the resource manager.
Mesos mode: Spark runs on a Mesos cluster with Mesos as the resource manager.
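The mode is selected by the --master URL passed to spark-submit. A small sketch of the standard master URL conventions follows; the host names are placeholders, and the helper function is invented for illustration.

```python
# Standard Spark --master URL conventions for each execution mode.
# Host names are placeholders; adjust to your cluster.
MASTER_URLS = {
    "local": "local[*]",                  # all cores on one machine
    "standalone": "spark://master:7077",  # standalone cluster manager
    "yarn": "yarn",                       # Hadoop YARN
    "mesos": "mesos://master:5050",       # Apache Mesos
}

def spark_submit_cmd(mode, app="app.py"):
    """Build an illustrative spark-submit command for the given mode."""
    return ["spark-submit", "--master", MASTER_URLS[mode], app]

print(" ".join(spark_submit_cmd("yarn")))
```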
I applied via Job Fair and was interviewed in Nov 2024. There were 2 interview rounds.
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
Enhanced optimization in AWS Glue improves job performance by automatically adjusting resources based on workload
Enhanced optimization in AWS Glue automatically adjusts resources like DPUs based on workload
It helps improve job performance by optimizing resource allocation
Users can enable enhanced optimization in AWS Glue job settings
Optimizing querying in Amazon Redshift involves proper table design, distribution keys, sort keys, and query optimization techniques.
Use appropriate distribution keys to evenly distribute data across nodes for parallel processing.
Utilize sort keys to physically order data on disk, reducing the need for sorting during queries.
Avoid using SELECT * and instead specify only the columns needed to reduce data transfer.
Use ANALYZE to keep table statistics current so the query planner can choose efficient plans.
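These recommendations can be sketched as Redshift DDL. The table and column names below are invented for illustration, and the SQL is only held in Python strings, so nothing here connects to a cluster.

```python
# Illustrative Redshift table design: a distribution key for parallel,
# co-located joins and a sort key for range-restricted scans.
DDL = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sold_at     TIMESTAMP,
    amount      DECIMAL(10, 2)
)
DISTKEY (customer_id)  -- co-locate each customer's rows on one node
SORTKEY (sold_at);     -- date filters can skip whole sorted blocks
"""

# A narrow projection instead of SELECT * reduces data transfer:
QUERY = "SELECT sale_id, amount FROM sales WHERE sold_at >= '2024-01-01';"
```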
posted on 28 Sep 2024
I applied via Campus Placement and was interviewed in Aug 2024. There were 8 interview rounds.
Database Management Systems, SQL and PL/SQL
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SCD type 2 is a method used in data warehousing to track historical changes by creating a new record for each change.
SCD type 2 stands for Slowly Changing Dimension type 2
It involves creating a new record in the dimension table whenever there is a change in the data
The old record is marked as inactive and the new record is marked as current
It allows for historical tracking of changes in data over time
Example: if a customer changes their address, a new record with the new address is inserted and marked current, while the old record is retained and marked inactive.
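The mechanics above can be shown with a minimal pure-Python sketch of an SCD type 2 update on a list of dicts; the column names (is_current, valid_from) and the customer example are illustrative.

```python
# Minimal SCD type 2 sketch: expire the current row for a key and
# append a new current row, preserving full history.
from datetime import date

dim = [
    {"customer_id": 1, "city": "Pune",
     "valid_from": date(2023, 1, 1), "is_current": True},
]

def scd2_update(dim, customer_id, new_city, change_date):
    """Close the current row for the key and append a new current row."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # no change, nothing to do
            row["is_current"] = False  # expire the old version

    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": change_date, "is_current": True})

scd2_update(dim, 1, "Mumbai", date(2024, 6, 1))
print(len(dim))                        # 2 rows: history retained
print([r["is_current"] for r in dim])  # [False, True]
```

A production version would also carry a valid_to date on the expired row, but the insert-and-expire pattern is the core of type 2.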
I applied via Naukri.com and was interviewed in Mar 2024. There was 1 interview round.
One PySpark question based on time series data.
I applied via Recruitment Consultant and was interviewed in Nov 2023. There were 4 interview rounds.
Python and Spark code
| Role | Salaries reported | Salary range |
| Project Associate | 103 | ₹1.8 L/yr - ₹3.8 L/yr |
| Senior Software Engineer | 32 | ₹12 L/yr - ₹25 L/yr |
| Associate Software Engineer | 27 | ₹3 L/yr - ₹4 L/yr |
| Software Engineer | 22 | ₹4 L/yr - ₹16.1 L/yr |
| Data Engineer | 16 | ₹2.2 L/yr - ₹5 L/yr |