I applied via Referral and was interviewed before Feb 2023. There was 1 interview round.
Dataflow is used to efficiently process and analyze large volumes of data in real time.
Dataflow allows for parallel processing of data, enabling faster analysis and insights.
It provides a scalable and reliable way to handle streaming and batch data processing.
Dataflow can be used for tasks such as ETL (Extract, Transform, Load), real-time analytics, and machine learning.
It helps in managing and optimizing data pipelines; a minimal pipeline sketch follows.
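A hedged sketch of the kind of pipeline Dataflow executes, written with Apache Beam (the SDK Dataflow runs). The bucket paths and the three-column CSV layout are assumptions; on GCP you would pass PipelineOptions with --runner=DataflowRunner plus project and region.

```python
import apache_beam as beam

# Sketch only: paths are hypothetical placeholders.
with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.csv")   # hypothetical input
        | "Parse" >> beam.Map(lambda line: line.split(","))
        | "KeepValid" >> beam.Filter(lambda cols: len(cols) == 3)      # drop malformed rows
        | "KeyByUser" >> beam.Map(lambda cols: (cols[0], float(cols[2])))
        | "SumPerUser" >> beam.CombinePerKey(sum)                      # aggregate per key
        | "Format" >> beam.Map(lambda kv: f"{kv[0]},{kv[1]}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output")      # hypothetical output
    )
```

The same code runs unchanged in batch or streaming mode, which is the point of the Beam/Dataflow model.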
Built a batch data pipeline to process and analyze customer transaction data (sketched below).
Used Apache Spark for distributed data processing
Ingested data from various sources like databases and files
Performed data cleaning, transformation, and aggregation
Utilized SQL for querying and analyzing data
Generated reports and visualizations for stakeholders
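A minimal PySpark sketch of a pipeline like the one described; the file path, column names, and aggregation are invented for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-batch").getOrCreate()

# Ingest: read raw transaction data (hypothetical CSV source)
df = spark.read.option("header", True).csv("/data/transactions.csv")

# Clean and transform: drop incomplete rows, cast the amount column
clean = (
    df.dropna(subset=["customer_id", "amount"])
      .withColumn("amount", F.col("amount").cast("double"))
)

# Aggregate: total spend per customer
totals = clean.groupBy("customer_id").agg(F.sum("amount").alias("total_spend"))

# Query with SQL for reporting
clean.createOrReplaceTempView("transactions")
spark.sql(
    "SELECT customer_id, SUM(amount) AS total FROM transactions "
    "GROUP BY customer_id ORDER BY total DESC LIMIT 10"
).show()

spark.stop()
```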
I applied via Naukri.com and was interviewed in Dec 2024. There were 4 interview rounds.
I applied via LinkedIn and was interviewed in Jul 2024. There were 2 interview rounds.
It was a pair programming round where we had to work through a couple of Spark scenarios together with the interviewer. You are given boilerplate code with some functionality to be filled in, and you are assessed on writing clean, extensible code and test cases; a sketch of that style of exercise follows.
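A rough idea of what such an exercise can look like: a small, pure transformation plus an assertion-style test. The function and column names are hypothetical, not the actual interview task.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql import DataFrame

def with_total(df: DataFrame) -> DataFrame:
    """Add a total column = quantity * unit_price (pure, so it is easy to test)."""
    return df.withColumn("total", F.col("quantity") * F.col("unit_price"))

def test_with_total():
    # Local SparkSession keeps the test self-contained
    spark = SparkSession.builder.master("local[1]").appName("test").getOrCreate()
    df = spark.createDataFrame([(2, 5.0)], ["quantity", "unit_price"])
    assert with_total(df).collect()[0]["total"] == 10.0
    spark.stop()

if __name__ == "__main__":
    test_with_total()
    print("ok")
```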
I was approached by the company and interviewed in Oct 2023. There were 2 interview rounds.
I applied via Naukri.com and was interviewed in Dec 2023. There were 2 interview rounds.
SQL is a programming language used for managing and manipulating relational databases.
SQL stands for Structured Query Language
It is used to create, modify, and query databases
Common SQL commands include SELECT, INSERT, UPDATE, and DELETE
SQL can be used with various database management systems such as MySQL, Oracle, and SQL Server (a runnable illustration follows).
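A minimal, runnable illustration of the four core commands using Python's built-in sqlite3 module; the table and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
cur.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Asha", 50000))
cur.execute("UPDATE employees SET salary = salary * 1.10 WHERE name = ?", ("Asha",))
print(cur.execute("SELECT name, salary FROM employees").fetchall())
cur.execute("DELETE FROM employees WHERE name = ?", ("Asha",))

conn.commit()
conn.close()
```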
I applied via Recruitment Consultant and was interviewed in Sep 2024. There were 2 interview rounds.
Accumulators are shared variables that are updated by worker nodes and can be used for aggregating information across tasks.
Accumulators are used for implementing counters and sums in Spark.
Worker tasks can only add to an accumulator; only the driver program can read its value.
Accumulators are useful for debugging and monitoring purposes.
Example: counting the number of errors encountered during processing (sketched below).
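A small PySpark sketch of that error-counting example; the record format is an assumption.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("acc-demo").getOrCreate()
sc = spark.sparkContext

errors = sc.accumulator(0)  # defined on the driver; tasks can only add to it

def parse(line):
    try:
        return int(line)
    except ValueError:
        errors.add(1)   # worker side: add-only
        return None

rdd = sc.parallelize(["1", "2", "oops", "4"])
valid = rdd.map(parse).filter(lambda x: x is not None)
valid.count()  # an action must run before the value is meaningful

# .value is readable only on the driver. Caveat: updates made inside
# transformations can be re-applied if a task is retried.
print("bad records:", errors.value)
spark.stop()
```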
Spark architecture is a distributed computing framework that consists of a driver program, cluster manager, and worker nodes.
Spark architecture includes a driver program that manages the execution of the Spark application.
It also includes a cluster manager that allocates resources and schedules tasks on worker nodes.
Worker nodes are responsible for executing the tasks and storing data in memory or disk.
Spark architectu...
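A minimal sketch of where those pieces appear in code: the driver program creates the SparkSession, the master URL selects the cluster manager, and actions trigger tasks on the executors. The local master is an assumption so the snippet runs standalone; on a real cluster it would be e.g. "yarn".

```python
from pyspark.sql import SparkSession

# Driver program: builds the SparkSession and the computation DAG
spark = (
    SparkSession.builder
    .appName("architecture-demo")
    .master("local[4]")  # cluster manager choice (assumption: local threads)
    .getOrCreate()
)

rdd = spark.sparkContext.parallelize(range(1000), numSlices=8)
# The action below makes the driver schedule tasks on worker executors
print(rdd.map(lambda x: x * x).sum())

spark.stop()
```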
Query to find duplicate data using SQL
Use GROUP BY and HAVING clause to identify duplicate records
Select columns to check for duplicates
Use the COUNT() function to count occurrences of each record (a runnable example follows)
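A runnable version of that approach using sqlite3; the table and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, email TEXT)")
cur.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("a", "a@x.com"), ("b", "b@x.com"), ("a", "a@x.com")],
)

# GROUP BY the columns that define a duplicate; HAVING keeps only
# groups whose COUNT exceeds 1.
dupes = cur.execute(
    """
    SELECT name, email, COUNT(*) AS occurrences
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [('a', 'a@x.com', 2)]
conn.close()
```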
Pub/sub is a messaging pattern where senders (publishers) of messages do not program the messages to be sent directly to specific receivers (subscribers).
Pub/sub stands for publish/subscribe.
Publishers send messages to a topic, and subscribers receive messages from that topic.
It allows for decoupling of components in a system, enabling scalability and flexibility.
Examples include Apache Kafka, Google Cloud Pub/Sub, and others; a short client sketch follows.
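A minimal publisher/subscriber sketch with the google-cloud-pubsub client; the project, topic, and subscription names are placeholders, and the resources are assumed to already exist.

```python
from google.cloud import pubsub_v1

project_id = "my-project"        # hypothetical
topic_id = "orders"              # hypothetical
subscription_id = "orders-sub"   # hypothetical

# Publisher side: send a message to the topic
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
publisher.publish(topic_path, data=b"order created").result()

# Subscriber side: receive messages via a callback
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

def callback(message):
    print("received:", message.data)
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
# future.result(timeout=30)  # block to keep receiving in a real run
```

Note how neither side references the other directly: both only know the topic, which is the decoupling the answer describes.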
I have used services like BigQuery, Dataflow, Pub/Sub, and Cloud Storage in GCP.
BigQuery for data warehousing and analytics
Dataflow for real-time data processing
Pub/Sub for messaging and event ingestion
Cloud Storage for storing data and files
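As one concrete example of those services, a short sketch of querying BigQuery from Python with the official client; the table name is a placeholder and credentials are assumed to be configured in the environment.

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up project and credentials from the environment
query = """
    SELECT customer_id, SUM(amount) AS total
    FROM `my-project.sales.transactions`   -- hypothetical table
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row["customer_id"], row["total"])
```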
I applied via Recruitment Consultant and was interviewed in Feb 2024. There was 1 interview round.
1 hour, on your platform of choice. You must know OOP concepts, TDD, debugging, and SOLID principles.
The interview questions cover topics related to Azure Data Factory, Spark, and Python programming.
Integration runtimes in ADF include the Azure, Self-hosted, and Azure-SSIS IRs.
To copy 100 files in ADF, use a Copy Data activity with a wildcard path in source and sink datasets.
A DAG in Spark represents the directed acyclic graph of computation stages, while lineage records how each dataset was derived so it can be recomputed after a failure.
Narrow transformations in Spark operate on a single partition, so they need no shuffle of data across the cluster; a short sketch follows.
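A short PySpark sketch contrasting narrow transformations (no shuffle) with a wide one (shuffle); the data is invented.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("narrow-wide").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(10), numSlices=2)

# Narrow: map and filter run within each partition independently
narrow = rdd.map(lambda x: x * 2).filter(lambda x: x > 5)

# Wide: reduceByKey must regroup data by key, which causes a shuffle
wide = rdd.map(lambda x: (x % 3, x)).reduceByKey(lambda a, b: a + b)

print(narrow.collect())
print(wide.collect())
spark.stop()
```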
Salaries by role (per year):
Software Engineer (146 salaries): ₹4.5 L/yr - ₹15.2 L/yr
Cloud Engineer (112 salaries): ₹4 L/yr - ₹15 L/yr
Cloud Consultant (85 salaries): ₹7 L/yr - ₹21 L/yr
Analyst (65 salaries): ₹2 L/yr - ₹5.3 L/yr
Data Engineer (53 salaries): ₹4.2 L/yr - ₹14.4 L/yr