Infovision
Built a data engineering pipeline to ingest, process, and analyze large volumes of data for real-time insights.
Designed and implemented a data ingestion process using tools like Apache Kafka or AWS Kinesis.
Developed data processing workflows using technologies like Apache Spark or Apache Flink.
Built data storage solutions using databases like Apache HBase or Amazon Redshift.
Implemented data quality checks and monitoring mechanisms.
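A data quality check like the one mentioned above can be as simple as validating each record before it reaches downstream storage. The sketch below is illustrative only (the record fields `id` and `amount` are hypothetical, not from the actual pipeline):

```python
# Illustrative sketch of a record-level data quality check; the schema
# (id, amount) is assumed for the example, not taken from the real pipeline.
def check_record(record):
    """Return a list of quality violations for one record."""
    problems = []
    if record.get("id") is None:
        problems.append("missing id")
    if not isinstance(record.get("amount"), (int, float)):
        problems.append("non-numeric amount")
    return problems

batch = [{"id": 1, "amount": 9.5}, {"amount": "n/a"}]
# Keep only records that have at least one violation, keyed by position.
report = {i: check_record(r) for i, r in enumerate(batch) if check_record(r)}
print(report)  # {1: ['missing id', 'non-numeric amount']}
```

In a real pipeline the same idea would run inside the processing layer (e.g. a Spark job), with violations routed to a monitoring sink instead of printed.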
Extract values from a data frame in a continuous time interval
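One common way to answer this question, assuming a pandas DataFrame with a timestamp column (the column names and timestamps here are illustrative), is a boolean mask over the interval bounds:

```python
import pandas as pd

# Hypothetical sample data: readings with a timestamp column "ts".
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:30",
        "2024-01-01 10:15", "2024-01-01 11:00",
    ]),
    "value": [10, 20, 30, 40],
})

# Select rows whose timestamp falls in the continuous interval [start, end).
start = pd.Timestamp("2024-01-01 09:15")
end = pd.Timestamp("2024-01-01 10:30")
window = df[(df["ts"] >= start) & (df["ts"] < end)]
print(window["value"].tolist())  # [20, 30]
```

If the timestamp is the index, `df.loc[start:end]` or `df.between_time(...)` are tidier alternatives.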
I applied via Naukri.com and was interviewed in Dec 2024. There were 4 interview rounds.
I applied via LinkedIn and was interviewed in Jul 2024. There were 2 interview rounds.
It was a pair programming round where we needed to work through a couple of Spark scenarios together with the interviewer. You are given boilerplate code with some functionality to fill in, and you are assessed on writing clean, extensible code and test cases.
Python and SQL questions
I was approached by the company and was interviewed in Oct 2023. There were 2 interview rounds.
posted on 22 Jan 2024
I applied via Naukri.com and was interviewed in Dec 2023. There were 2 interview rounds.
SQL is a programming language used for managing and manipulating relational databases.
SQL stands for Structured Query Language
It is used to create, modify, and query databases
Common SQL commands include SELECT, INSERT, UPDATE, and DELETE
SQL can be used with various database management systems such as MySQL, Oracle, and SQL Server
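The commands listed above can be sketched end to end with SQLite's in-memory engine (the `users` table is a made-up example, not from the interview):

```python
import sqlite3

# Minimal demonstration of CREATE, INSERT, UPDATE, SELECT, and DELETE
# against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (name) VALUES (?)", ("Asha",))
cur.execute("UPDATE users SET name = ? WHERE id = ?", ("Asha K", 1))
rows = cur.execute("SELECT id, name FROM users").fetchall()
cur.execute("DELETE FROM users WHERE id = ?", (1,))
print(rows)  # [(1, 'Asha K')]
```

The same statements run largely unchanged on MySQL, Oracle, or SQL Server, which is the portability point made above.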
I applied via Company Website and was interviewed before Mar 2022. There were 6 interview rounds.
Coding test with SQL queries
PySpark batch processing
Task with all good queries
I applied via Recruitment Consultant and was interviewed in Sep 2024. There were 2 interview rounds.
Accumulators are shared variables that are updated by worker nodes and can be used for aggregating information across tasks.
Accumulators are used for implementing counters and sums in Spark.
They are only updated by worker nodes and are read-only by the driver program.
Accumulators are useful for debugging and monitoring purposes.
Example: counting the number of errors encountered during processing.
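The error-counting example above can be sketched without a Spark cluster. This is an illustrative stand-in only: the `Accumulator` class below mimics the semantics (tasks call `add`, only the driver reads `value`); in real PySpark you would use `sc.accumulator(0)` instead:

```python
# Toy stand-in for a Spark accumulator: worker-side code only adds,
# driver-side code only reads. Not actual Spark API.
class Accumulator:
    def __init__(self, initial=0):
        self._value = initial

    def add(self, amount):
        # Called from "task" code running on workers.
        self._value += amount

    @property
    def value(self):
        # Read back on the "driver" after the job finishes.
        return self._value

error_count = Accumulator()
records = ["ok", "error", "ok", "error", "error"]
for rec in records:  # stands in for tasks processing partitions
    if rec == "error":
        error_count.add(1)
print(error_count.value)  # 3
```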
Spark architecture is a distributed computing framework that consists of a driver program, cluster manager, and worker nodes.
Spark architecture includes a driver program that manages the execution of the Spark application.
It also includes a cluster manager that allocates resources and schedules tasks on worker nodes.
Worker nodes are responsible for executing the tasks and storing data in memory or disk.
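As a rough analogy for the driver/worker split described above, the sketch below has a "driver" partition the data and a pool of "workers" execute tasks in parallel. This is purely illustrative; it uses a thread pool, not Spark:

```python
from concurrent.futures import ThreadPoolExecutor

# Analogy only: the "driver" splits the dataset into partitions and
# schedules a task per partition onto a pool of "workers".
def task(partition):
    return sum(partition)  # the work each worker performs

data = [[1, 2], [3, 4], [5, 6]]  # partitions of a dataset
with ThreadPoolExecutor(max_workers=3) as workers:
    partials = list(workers.map(task, data))  # one task per partition
print(sum(partials))  # driver combines partial results: 21
```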
Query to find duplicate data using SQL
Use GROUP BY and HAVING clause to identify duplicate records
Select columns to check for duplicates
Use COUNT() function to count occurrences of each record
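Putting the three points above together, a runnable version of the duplicate-finding query (the `orders` table is an invented example) looks like this:

```python
import sqlite3

# Build a small table with deliberate duplicates, then find them
# with GROUP BY + HAVING COUNT(*) > 1.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (customer TEXT, product TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x")])
dupes = cur.execute("""
    SELECT customer, product, COUNT(*) AS cnt
    FROM orders
    GROUP BY customer, product
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('a', 'x', 3)]
```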
Pub/sub is a messaging pattern where senders (publishers) of messages do not program the messages to be sent directly to specific receivers (subscribers).
Pub/sub stands for publish/subscribe.
Publishers send messages to a topic, and subscribers receive messages from that topic.
It allows for decoupling of components in a system, enabling scalability and flexibility.
Examples include Apache Kafka, Google Cloud Pub/Sub, and
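The decoupling described above can be shown with a toy in-process broker. This is a sketch of the pattern only; real systems like Kafka or Cloud Pub/Sub add persistence, partitioning, and delivery guarantees:

```python
from collections import defaultdict

# Minimal in-process pub/sub broker: publishers and subscribers only
# know the topic name, never each other.
class Broker:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subs[topic]:
            handler(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)  # subscriber registers interest
broker.publish("orders", {"id": 1})          # publisher targets the topic only
print(received)  # [{'id': 1}]
```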
I have used services like BigQuery, Dataflow, Pub/Sub, and Cloud Storage in GCP.
BigQuery for data warehousing and analytics
Dataflow for real-time data processing
Pub/Sub for messaging and event ingestion
Cloud Storage for storing data and files
Software Test Engineer | 309 salaries | ₹2.5 L/yr - ₹6.2 L/yr
Senior Software Engineer | 289 salaries | ₹6 L/yr - ₹21.7 L/yr
Software Engineer | 246 salaries | ₹2.8 L/yr - ₹10 L/yr
Technical Lead | 178 salaries | ₹10.1 L/yr - ₹35 L/yr
Software Developer | 143 salaries | ₹3 L/yr - ₹10.2 L/yr
TCS
Wipro
HCLTech
Tech Mahindra