Wipro
I was approached by the company and interviewed in May 2024. There was 1 interview round.
Spark is a distributed computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Spark is built around the concept of Resilient Distributed Datasets (RDDs) which are immutable distributed collections of objects.
It supports various programming languages like Java, Scala, Python, and R.
Spark provides high-level APIs like Spark SQL for structured data processing.
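A minimal PySpark sketch of these points, assuming a local pyspark installation; it shows a low-level RDD transformation next to the higher-level Spark SQL API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-basics").master("local[*]").getOrCreate()

# Low-level API: an RDD is an immutable, partitioned collection of objects.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)   # transformation (lazy)
print(squares.collect())             # action triggers execution

# High-level API: Spark SQL over a DataFrame.
df = spark.createDataFrame([("alice", 30), ("bob", 25)], ["name", "age"])
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 26").show()

spark.stop()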
Optimizing Spark jobs involves tuning configurations, partitioning data, caching, and using efficient transformations.
Tune Spark configurations for memory, cores, and parallelism
Partition data to distribute workload evenly
Cache intermediate results to avoid recomputation
Use efficient transformations like map, filter, and reduce
Avoid shuffling data unnecessarily
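A hedged PySpark sketch of the tuning steps listed above; the configuration values, input path, and column names are illustrative assumptions, not recommendations:

from pyspark.sql import SparkSession

# Tune memory, cores, and shuffle parallelism (values are illustrative).
spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.executor.memory", "4g")
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

df = spark.read.parquet("/data/events")      # hypothetical input path

# Partition data to spread the workload evenly before wide operations.
df = df.repartition(200, "customer_id")      # hypothetical key column

# Cache an intermediate result that is reused several times.
filtered = df.filter(df.amount > 0).cache()
filtered.count()                             # materialize the cache

# Prefer a single deliberate shuffle (the groupBy) over repeated ones.
summary = filtered.groupBy("customer_id").sum("amount")
summary.write.mode("overwrite").parquet("/data/summary")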
SQL query to find the second highest salary of employees in each department
Use a subquery to rank the salaries within each department
Filter the results to only include the second highest salary for each department
Join the result with the employee table to get additional information if needed
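One hedged way to write this, using a DENSE_RANK window function in a subquery and run here through Spark SQL; the employees view and its columns (emp_id, name, department_id, salary) are assumed names:

# Assumes a SparkSession `spark` and a registered `employees` view.
second_highest = spark.sql("""
    SELECT emp_id, name, department_id, salary
    FROM (
        SELECT e.*,
               DENSE_RANK() OVER (PARTITION BY department_id
                                  ORDER BY salary DESC) AS rnk
        FROM employees e
    ) ranked
    WHERE rnk = 2
""")
second_highest.show()

Because the window runs over full employee rows, the extra join back to the employee table is only needed if the subquery selects just the key columns.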
SQL query to find users who purchased 3 consecutive months in a year
Use a self join on the table to compare purchase months for each user
Group the distinct purchase months by user and year and look for runs of 3 consecutive months (a window-function sketch follows below)
Example (two self joins): SELECT DISTINCT p1.user_id FROM purchases p1 JOIN purchases p2 ON p2.user_id = p1.user_id AND p2.month = p1.month + 1 AND YEAR(p2.purchase_date) = YEAR(p1.purchase_date) JOIN purchases p3 ON p3.user_id = p1.user_id AND p3.month = p1.month + 2 AND YEAR(p3.purchase_date) = YEAR(p1.purchase_date)
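An alternative, hedged sketch using window functions (the gaps-and-islands pattern): within a year, consecutive months keep a constant value of month minus row number, so runs of 3 or more can be counted. Assumes a SparkSession `spark` and a purchases view with user_id and purchase_date:

consecutive = spark.sql("""
    WITH months AS (
        SELECT DISTINCT user_id,
               YEAR(purchase_date)  AS yr,
               MONTH(purchase_date) AS mth
        FROM purchases
    ),
    grouped AS (
        SELECT user_id, yr, mth,
               mth - ROW_NUMBER() OVER (PARTITION BY user_id, yr
                                        ORDER BY mth) AS grp
        FROM months
    )
    SELECT DISTINCT user_id
    FROM grouped
    GROUP BY user_id, yr, grp
    HAVING COUNT(*) >= 3
""")
consecutive.show()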
Kafka is used as a message broker to ingest data into Spark Streaming for real-time processing.
Kafka acts as a buffer between data producers and Spark Streaming to handle high throughput of data
Spark Streaming can consume data from Kafka topics in micro-batches for real-time processing
Kafka provides fault-tolerance and scalability for streaming data processing in Spark
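A hedged sketch of the Kafka-to-Spark side of this, using the Structured Streaming Kafka source (which processes data in micro-batches by default and uses checkpointing for fault tolerance). It assumes the Spark Kafka connector package is on the classpath; the broker address, topic, and paths are made up:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read a Kafka topic as a streaming DataFrame.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "purchases")                   # hypothetical topic
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the payload to a string.
parsed = events.select(col("value").cast("string").alias("payload"))

# Write each micro-batch out; the checkpoint enables recovery on failure.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/stream-out")                  # hypothetical path
         .option("checkpointLocation", "/data/checkpoints")
         .start())
query.awaitTermination()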
I was approached by the company and interviewed in Nov 2024. There was 1 interview round.
posted on 2 Aug 2024
Currently working on developing a real-time data processing pipeline for a financial services company.
Designing and implementing data ingestion processes using Apache Kafka
Building data processing workflows with Apache Spark
Optimizing data storage and retrieval with Apache Hadoop
Collaborating with data scientists to integrate machine learning models into the pipeline
Group data by column 'A', calculate mean of column 'B' and sum values in column 'C' for each group.
Use groupby() function in pandas to group data by column 'A'
Apply mean() function on column 'B' and sum() function on column 'C' for each group
Example: df.groupby('A').agg({'B':'mean', 'C':'sum'})
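A small runnable version of that example with made-up data:

import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [10, 20, 30, 50],
    "C": [1, 2, 3, 4],
})

result = df.groupby("A").agg({"B": "mean", "C": "sum"})
print(result)
# Group 'x' -> B=15.0, C=3 ; group 'y' -> B=40.0, C=7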
deepcopy() creates a new object with completely independent copies of nested objects, while copy() creates a shallow copy.
deepcopy() creates a new object and recursively copies all nested objects, while copy() creates a shallow copy of the top-level object only.
Use deepcopy() when you need to create a deep copy of an object with nested structures, to avoid any references to the original object.
Use copy() when a shallow copy is enough and it is acceptable for nested objects to be shared with the original.
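A short illustration of the difference using only the standard library:

import copy

original = {"config": {"retries": 3}}

shallow = copy.copy(original)        # top-level copy; nested dict is shared
deep = copy.deepcopy(original)       # fully independent copy

original["config"]["retries"] = 5
print(shallow["config"]["retries"])  # 5 -> shallow copy sees the change
print(deep["config"]["retries"])     # 3 -> deep copy is unaffected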
Python decorators are functions that modify the behavior of other functions. They are commonly used for adding functionality to existing functions without modifying their code.
Decorators are defined using the @ symbol followed by the decorator function name.
They can be used to measure the execution time of a function by wrapping the function with a timer decorator.
Example: a timer decorator wraps the function, records the start and end times around the call, and prints the elapsed time (a completed sketch follows below).
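A completed, runnable sketch of that timer decorator; the function being timed (slow_sum) is a made-up example:

import time
import functools

def timer(func):
    @functools.wraps(func)              # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f}s")
        return result
    return wrapper

@timer
def slow_sum(n):                        # hypothetical function to time
    return sum(range(n))

slow_sum(1_000_000)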
I applied via a recruitment consultant and was interviewed in Nov 2024. There were 2 interview rounds.
Various data warehousing techniques include dimensional modeling, star schema, snowflake schema, and data vault.
Dimensional modeling involves organizing data into facts and dimensions to facilitate easy querying and analysis.
Star schema is a type of dimensional modeling where a central fact table is connected to multiple dimension tables.
Snowflake schema is an extension of the star schema where dimension tables are normalized into multiple related tables.
My analytics work has helped the organization make data-driven decisions, improve operational efficiency, and identify new opportunities for growth.
Developed data models and algorithms to optimize business processes
Generated insights from large datasets to drive strategic decision-making
Identified trends and patterns to improve customer experience and retention
Implemented data governance policies to ensure data quality
I would respond in various situations by remaining calm, assessing the situation, and providing a thoughtful and strategic solution.
Remain calm and composed
Assess the situation thoroughly
Provide a thoughtful and strategic solution
Communicate effectively with all parties involved
Both career and team are important, but ultimately career growth should be prioritized.
Career growth is essential for personal development and achieving professional goals.
A strong team can support career growth by providing mentorship, collaboration, and opportunities for learning.
Balancing career and team dynamics is key to long-term success in any role.
I applied via Naukri.com and was interviewed in Jun 2024. There were 3 interview rounds.
I have used HUDI and Iceberg in my previous project for managing large-scale data lakes efficiently.
Implemented HUDI for incremental data ingestion and managing large datasets in real-time
Utilized Iceberg for efficient table management and data versioning
Integrated HUDI and Iceberg with Apache Spark for processing and querying data
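A hedged sketch of what such an integration can look like in PySpark; the paths, table names, and the specific Hudi write options shown are assumptions and depend on the Hudi/Iceberg versions and catalog configuration in use:

from pyspark.sql import SparkSession

# Assumes a SparkSession already configured with the Hudi and Iceberg
# runtime jars and an Iceberg catalog named `lake` (all hypothetical).
spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()
df = spark.read.parquet("/raw/orders")          # hypothetical input

# Upsert-style ingestion into a Hudi table.
(df.write.format("hudi")
   .option("hoodie.table.name", "orders_hudi")
   .option("hoodie.datasource.write.recordkey.field", "order_id")
   .option("hoodie.datasource.write.precombine.field", "updated_at")
   .mode("append")
   .save("/lake/orders_hudi"))

# Versioned table management with Iceberg.
df.writeTo("lake.db.orders_iceberg").using("iceberg").createOrReplace()
spark.sql("SELECT COUNT(*) FROM lake.db.orders_iceberg").show()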
posted on 28 Oct 2024
DBA stands for Database Administrator; the role involves managing and maintaining databases to ensure data integrity and security.
DBA is responsible for installing, configuring, and upgrading database software.
They monitor database performance and troubleshoot issues.
DBA designs and implements backup and recovery strategies to prevent data loss.
They also manage user access and security permissions within the database.
Maintaining the database involves regular monitoring, performance tuning, applying patches, and ensuring backups are taken regularly.
Regularly monitor database performance and usage
Perform routine maintenance tasks such as applying patches and updates
Take regular backups to ensure data integrity and disaster recovery
Implement security measures to protect the database from unauthorized access
Optimize database performance through regular tuning
Join is used to combine rows from two or more tables based on a related column, while lookup is used to retrieve data from a reference table based on a matching key.
Join combines rows from multiple tables based on a related column
Lookup retrieves data from a reference table based on a matching key
Join can result in duplicate rows if there are multiple matches, while lookup returns only the first matching row
Join is used when data from multiple tables is needed together in the output.
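A small pandas illustration of the contrast (the answer itself is tool-agnostic); the tables and columns are made up:

import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust_id": [10, 20, 10]})
customers = pd.DataFrame({"cust_id": [10, 20], "name": ["Asha", "Ravi"]})

# Join: combine rows from both tables on the related column.
joined = orders.merge(customers, on="cust_id", how="left")

# Lookup: retrieve a single attribute from a reference table by key.
lookup = customers.set_index("cust_id")["name"]
orders["name"] = orders["cust_id"].map(lookup)

print(joined)
print(orders)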
Fact table contains quantitative data and measures, while dimension table contains descriptive attributes.
Fact table contains numerical data that can be aggregated (e.g. sales revenue, quantity sold)
Dimension table contains descriptive attributes for analysis (e.g. product name, customer details)
Fact table is typically normalized, while dimension table is denormalized for faster queries
Fact table is usually much larger in size than dimension tables
Use sed command to display the line before a specific pattern
Use 'sed -n '/pattern/{g;1!p;};h' file.txt' to display the line before the pattern
Replace 'pattern' with the specific pattern you are looking for
This works because h saves each line to sed's hold space; when a line matches the pattern, g swaps in the previously saved line and 1!p prints it (the 1! guard skips a match on the very first line)
I applied via a recruitment consultant and was interviewed in May 2024. There were 2 interview rounds.
SQL scripts to write; I was also asked to design a data model of my choice in the telecom domain.
Wipro salaries by role:
Project Engineer | 32.7k salaries | ₹1.8 L/yr - ₹8.3 L/yr
Senior Software Engineer | 23.1k salaries | ₹5.8 L/yr - ₹22.5 L/yr
Senior Associate | 21.3k salaries | ₹0.8 L/yr - ₹5.5 L/yr
Senior Project Engineer | 20.5k salaries | ₹5 L/yr - ₹19.5 L/yr
Technical Lead | 18.6k salaries | ₹8.2 L/yr - ₹36.5 L/yr