Wipro
I applied via Approached by Company and was interviewed in Nov 2024. There was 1 interview round.
I applied via Approached by Company and was interviewed in May 2024. There was 1 interview round.
Spark is a distributed computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Spark is built around the concept of Resilient Distributed Datasets (RDDs) which are immutable distributed collections of objects.
It supports various programming languages like Java, Scala, Python, and R.
Spark provides high-level APIs like Spark SQL for structured data processing.
Optimizing Spark jobs involves tuning configurations, partitioning data, caching, and using efficient transformations.
Tune Spark configurations for memory, cores, and parallelism
Partition data to distribute workload evenly
Cache intermediate results to avoid recomputation
Use efficient transformations like map, filter, and reduce
Avoid shuffling data unnecessarily
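The tuning knobs above can be sketched as a spark-submit invocation; the executor counts, sizes, and job file below are illustrative values, not recommendations from the original answer:

```shell
# Hypothetical spark-submit showing config tuning for memory, cores, and parallelism;
# my_job.py and all numeric values are placeholders to adapt per workload.
spark-submit \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.default.parallelism=80 \
  my_job.py
```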
SQL query to find the second highest salary of employees in each department
Use a subquery to rank the salaries within each department
Filter the results to only include the second highest salary for each department
Join the result with the employee table to get additional information if needed
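A runnable sketch of the ranking approach, assuming a hypothetical employees(dept, name, salary) table and using SQLite's DENSE_RANK window function in place of the unspecified interview engine:

```python
import sqlite3

# Build a small in-memory table; the schema and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Eng", "A", 100), ("Eng", "B", 90), ("Eng", "C", 90),
     ("HR", "D", 60), ("HR", "E", 50)],
)

# Rank salaries within each department, then keep only rank 2.
query = """
SELECT dept, name, salary
FROM (
    SELECT dept, name, salary,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
    FROM employees
)
WHERE rnk = 2
ORDER BY dept, name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

DENSE_RANK (rather than ROW_NUMBER) keeps ties: both B and C above share the second-highest salary in Eng.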
SQL query to find users who purchased 3 consecutive months in a year
Use a self join on the table to compare purchase months for each user
Group by user and year, then filter for counts of 3 consecutive months
Example: SELECT DISTINCT p1.user_id FROM purchases p1 JOIN purchases p2 ON p2.user_id = p1.user_id AND p2.year = p1.year AND p2.month = p1.month + 1 JOIN purchases p3 ON p3.user_id = p1.user_id AND p3.year = p1.year AND p3.month = p1.month + 2
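The self-join can be checked end to end with a runnable sketch; the purchases(user_id, year, month) schema and sample rows are assumptions, and SQLite stands in for the unspecified engine:

```python
import sqlite3

# Hypothetical purchases table with one row per user purchase month.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, year INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 2023, 1), (1, 2023, 2), (1, 2023, 3),    # three consecutive months
     (2, 2023, 1), (2, 2023, 3), (2, 2023, 5),    # gaps: does not qualify
     (3, 2023, 11), (3, 2023, 12), (3, 2024, 1)], # run crosses a year boundary
)

# For each purchase, require purchases by the same user in the next
# two months of the same year.
query = """
SELECT DISTINCT p1.user_id
FROM purchases p1
JOIN purchases p2
  ON p2.user_id = p1.user_id AND p2.year = p1.year AND p2.month = p1.month + 1
JOIN purchases p3
  ON p3.user_id = p1.user_id AND p3.year = p1.year AND p3.month = p1.month + 2
ORDER BY p1.user_id
"""
users = [r[0] for r in conn.execute(query).fetchall()]
print(users)
```

Only user 1 qualifies: user 3's Nov-Dec-Jan run spans two years, which the same-year condition excludes.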
Kafka is used as a message broker to ingest data into Spark Streaming for real-time processing.
Kafka acts as a buffer between data producers and Spark Streaming to handle high throughput of data
Spark Streaming can consume data from Kafka topics in micro-batches for real-time processing
Kafka provides fault-tolerance and scalability for streaming data processing in Spark
posted on 2 Aug 2024
Currently developing a real-time data processing pipeline for a financial services company.
Designing and implementing data ingestion processes using Apache Kafka
Building data processing workflows with Apache Spark
Optimizing data storage and retrieval with Apache Hadoop
Collaborating with data scientists to integrate machine learning models into the pipeline
Group data by column 'A', calculate mean of column 'B' and sum values in column 'C' for each group.
Use groupby() function in pandas to group data by column 'A'
Apply mean() function on column 'B' and sum() function on column 'C' for each group
Example: df.groupby('A').agg({'B':'mean', 'C':'sum'})
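The one-liner above can be run against a small illustrative frame (the column names A, B, C follow the question; the data is made up):

```python
import pandas as pd

# Toy frame: group key in A, numeric values in B and C.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "y"],
    "B": [1.0, 3.0, 2.0, 4.0, 6.0],
    "C": [10, 20, 30, 40, 50],
})

# Mean of B and sum of C, computed once per group of A.
result = df.groupby("A").agg({"B": "mean", "C": "sum"})
print(result)
```

The result is indexed by the group key A, with one aggregated column per entry in the agg dict.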
deepcopy() creates a new object with completely independent copies of nested objects, while copy() creates a shallow copy.
deepcopy() creates a new object and recursively copies all nested objects, while copy() creates a shallow copy of the top-level object only.
Use deepcopy() when you need to create a deep copy of an object with nested structures, to avoid any references to the original object.
Use copy() when you only need a top-level copy and sharing references to nested objects is acceptable.
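The difference shows up as soon as a nested object is mutated; a minimal stdlib sketch:

```python
import copy

original = {"nums": [1, 2, 3]}

shallow = copy.copy(original)    # top-level copy; the inner list is shared
deep = copy.deepcopy(original)   # recursively copies nested objects too

original["nums"].append(4)

print(shallow["nums"])  # shares the inner list, so it sees the append
print(deep["nums"])     # fully independent, unchanged
```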
Python decorators are functions that modify the behavior of other functions. They are commonly used for adding functionality to existing functions without modifying their code.
Decorators are defined using the @ symbol followed by the decorator function name.
They can be used to measure the execution time of a function by wrapping the function with a timer decorator.
Example: def timer(func): def wrapper(*args, **kwargs): ... — wrapper records a start time, calls func, prints the elapsed time, and returns func's result; timer returns wrapper.
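A complete runnable version of that timer decorator (the decorator and function names are illustrative):

```python
import functools
import time

# Timing decorator: wraps a function, measures its runtime, preserves metadata.
def timer(func):
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper

@timer
def add(a, b):
    return a + b

total = add(2, 3)
print(total)
```

functools.wraps is optional but keeps introspection intact, so add.__name__ stays "add" instead of "wrapper".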
I applied via Naukri.com and was interviewed in Jun 2024. There were 3 interview rounds.
I applied via Recruitment Consultant and was interviewed in May 2024. There were 2 interview rounds.
SQL scripts to write; I was also asked to design a data model of my choice in the Telecom domain.
SQL architecture refers to the structure and components of a SQL database system.
SQL architecture includes components like storage engine, query processor, and buffer manager.
The storage engine manages data storage and retrieval, while the query processor processes SQL queries.
The buffer manager handles caching and memory management to optimize performance.
Examples of SQL architectures include MySQL, Oracle, and SQL Server.
I applied via Company Website and was interviewed in Apr 2024. There were 3 interview rounds.
Tests your algorithmic thinking and problem-solving skills.
I applied via Naukri.com and was interviewed in Mar 2024. There was 1 interview round.
Data Stage is an ETL tool by IBM, while Informatica is a popular ETL tool by Informatica Corporation.
Data Stage is developed by IBM, while Informatica is developed by Informatica Corporation.
Data Stage is known for its parallel processing capabilities, while Informatica is known for its ease of use and flexibility.
Data Stage has a graphical interface for designing jobs, while Informatica uses a more traditional workflow-based approach.
posted on 9 Sep 2024
Experienced Oracle DBA with 5+ years of hands-on experience in managing databases, optimizing performance, and ensuring data security.
5+ years of experience as an Oracle DBA
Proficient in database management, performance optimization, and data security
Skilled in troubleshooting and resolving database issues
Strong knowledge of Oracle database architecture and SQL
Certified Oracle Database Administrator (OCA/OCP)
| Role | Salaries reported | Salary range |
| Project Engineer | 32.7k | ₹1.8 L/yr - ₹8.3 L/yr |
| Senior Software Engineer | 23k | ₹5.8 L/yr - ₹22.8 L/yr |
| Senior Associate | 21.2k | ₹0.8 L/yr - ₹5.5 L/yr |
| Senior Project Engineer | 20.5k | ₹5 L/yr - ₹19.5 L/yr |
| Technical Lead | 18.6k | ₹8.2 L/yr - ₹36.5 L/yr |