i
Wipro
Filter interviews by
I was interviewed in Sep 2023.
OSI/TCP/IP models are networking models that define how data is transmitted over a network. LAN/WAN refer to local and wide area networks. IP addresses are unique identifiers for devices on a network, while MAC addresses are unique identifiers for network interfaces.
OSI model is a conceptual framework that standardizes the functions of a telecommunication or computing system into seven layers.
TCP/IP model is a protocol...
Ethernet is a type of networking technology commonly used for connecting devices in a local area network (LAN).
Ethernet is a widely used networking technology for connecting devices within a LAN.
It uses a system of cables, switches, and routers to transmit data between devices.
Ethernet allows for fast and reliable communication between devices, making it ideal for businesses and homes.
Examples of Ethernet standards inc...
IPSec VPN is a secure network protocol used to encrypt and authenticate data traffic over the internet.
IPSec VPN stands for Internet Protocol Security Virtual Private Network.
It provides secure communication by encrypting data traffic between two endpoints.
There are two modes of IPSec VPN: Transport mode and Tunnel mode.
Transport mode encrypts only the data payload, while Tunnel mode encrypts the entire packet.
Examples...
I applied via Naukri.com and was interviewed before May 2023. There were 3 interview rounds.
Python and Pyspark questions
I applied via Approached by Company and was interviewed before May 2023. There were 2 interview rounds.
Simple assignment with mcq and a couple of coding questions
I applied via Recruitment Consulltant
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
I am a Senior Data Engineer with experience in building scalable data pipelines and optimizing data processing workflows.
Experience in designing and implementing ETL processes using tools like Apache Spark and Airflow
Proficient in working with large datasets and optimizing query performance
Strong background in data modeling and database design
Worked on projects involving real-time data processing and streaming analytic
Decorators in Python are functions that modify the behavior of other functions or methods.
Decorators are defined using the @decorator_name syntax before a function definition.
They can be used to add functionality to existing functions without modifying their code.
Decorators can be used for logging, timing, authentication, and more.
Example: @staticmethod decorator in Python is used to define a static method in a class.
SQL query to group by employee ID and combine first name and last name with a space
Use the GROUP BY clause to group by employee ID
Use the CONCAT function to combine first name and last name with a space
Select employee ID, CONCAT(first_name, ' ', last_name) AS full_name
Constructors in Python are special methods used for initializing objects. They are called automatically when a new instance of a class is created.
Constructors are defined using the __init__() method in a class.
They are used to initialize instance variables of a class.
Example: class Person: def __init__(self, name, age): self.name = name self.age = age person1 = Person('Alice', 30)
Indexing in SQL is a technique used to improve the performance of queries by creating a data structure that allows for faster retrieval of data.
Indexes are created on columns in a database table to speed up the retrieval of rows that match a certain condition in a WHERE clause.
Indexes can be created using CREATE INDEX statement in SQL.
Types of indexes include clustered indexes, non-clustered indexes, unique indexes, an...
Spark works well with Parquet files due to its columnar storage format, efficient compression, and ability to push down filters.
Parquet files are columnar storage format, which aligns well with Spark's processing model of working on columns rather than rows.
Parquet files support efficient compression, reducing storage space and improving read performance in Spark.
Spark can push down filters to Parquet files, allowing f...
I applied via Recruitment Consulltant and was interviewed in Nov 2024. There were 2 interview rounds.
Different types of joins available in Databricks include inner join, outer join, left join, right join, and cross join.
Inner join: Returns only the rows that have matching values in both tables.
Outer join: Returns all rows when there is a match in either table.
Left join: Returns all rows from the left table and the matched rows from the right table.
Right join: Returns all rows from the right table and the matched rows ...
Implementing fault tolerance in a data pipeline involves redundancy, monitoring, and error handling.
Use redundant components to ensure continuous data flow
Implement monitoring tools to detect failures and bottlenecks
Set up automated alerts for immediate response to issues
Design error handling mechanisms to gracefully handle failures
Use checkpoints and retries to ensure data integrity
AutoLoader is a feature in data engineering that automatically loads data from various sources into a data warehouse or database.
Automates the process of loading data from different sources
Reduces manual effort and human error
Can be scheduled to run at specific intervals
Examples: Apache Nifi, AWS Glue
To connect to different services in Azure, you can use Azure SDKs, REST APIs, Azure Portal, Azure CLI, and Azure PowerShell.
Use Azure SDKs for programming languages like Python, Java, C#, etc.
Utilize REST APIs to interact with Azure services programmatically.
Access and manage services through the Azure Portal.
Leverage Azure CLI for command-line interface interactions.
Automate tasks using Azure PowerShell scripts.
Linked Services are connections to external data sources or destinations in Azure Data Factory.
Linked Services define the connection information needed to connect to external data sources or destinations.
They can be used in Data Factory pipelines to read from or write to external systems.
Examples of Linked Services include Azure Blob Storage, Azure SQL Database, and Amazon S3.
I applied via Recruitment Consulltant and was interviewed in Nov 2024. There was 1 interview round.
Bigtable is a NoSQL database for real-time analytics, while BigQuery is a fully managed data warehouse for running SQL queries.
Bigtable is a NoSQL database designed for real-time analytics and high throughput, while BigQuery is a fully managed data warehouse for running SQL queries.
Bigtable is used for storing large amounts of semi-structured data, while BigQuery is used for analyzing structured data using SQL queries.
...
To remove duplicate rows from BigQuery, use the DISTINCT keyword. To find the month of a given date, use the EXTRACT function.
To remove duplicate rows, use SELECT DISTINCT * FROM table_name;
To find the month of a given date, use SELECT EXTRACT(MONTH FROM date_column) AS month_name FROM table_name;
Make sure to replace 'table_name' and 'date_column' with the appropriate values in your query.
The operator used in Composer to move data from GCS to BigQuery is the GCS to BigQuery operator.
The GCS to BigQuery operator is used in Apache Airflow, which is the underlying technology of Composer.
This operator allows you to transfer data from Google Cloud Storage (GCS) to BigQuery.
You can specify the source and destination parameters in the operator to define the data transfer process.
Code to square each element in the input array.
Iterate through the input array and square each element.
Store the squared values in a new array to get the desired output.
Dataflow is a fully managed stream and batch processing service, while Dataproc is a managed Apache Spark and Hadoop service.
Dataflow is a serverless data processing service that automatically scales to handle your data, while Dataproc is a managed Spark and Hadoop service that requires you to provision and manage clusters.
Dataflow is designed for both batch and stream processing, allowing you to process data in real-t...
BigQuery architecture includes storage, execution, and optimization components for efficient query processing.
BigQuery stores data in Capacitor storage system for fast access.
Query execution is distributed across multiple nodes for parallel processing.
Query optimization techniques include partitioning tables, clustering tables, and using query cache.
Using partitioned tables can help eliminate scanning unnecessary data.
...
RDD vs dataframe vs dataset in PySpark
RDD (Resilient Distributed Dataset) is the basic abstraction in PySpark, representing a distributed collection of objects
Dataframe is a distributed collection of data organized into named columns, similar to a table in a relational database
Dataset is a distributed collection of data with the ability to use custom classes for type safety and user-defined functions
Dataframes and Data...
Aptitude test involved with quantative aptitude, logical reasoning and reading comprehensions.
I have strong skills in data processing, ETL, data modeling, and programming languages like Python and SQL.
Proficient in data processing and ETL techniques
Strong knowledge of data modeling and database design
Experience with programming languages like Python and SQL
Familiarity with big data technologies such as Hadoop and Spark
Yes, I am open to relocating for the right opportunity.
I am willing to relocate for the right job opportunity.
I have experience moving for previous roles.
I am flexible and adaptable to new locations.
I am excited about the possibility of exploring a new city or country.
I applied via Walk-in and was interviewed in Nov 2024. There were 2 interview rounds.
Online aptitude test with maths and statistics and data analysis
based on 5 reviews
Rating in categories
Project Engineer
32.6k
salaries
| ₹1.8 L/yr - ₹8.3 L/yr |
Senior Software Engineer
22.9k
salaries
| ₹5.8 L/yr - ₹22.5 L/yr |
Senior Associate
21.1k
salaries
| ₹0.8 L/yr - ₹5.5 L/yr |
Senior Project Engineer
20.6k
salaries
| ₹5 L/yr - ₹19.1 L/yr |
Technical Lead
18.6k
salaries
| ₹8.2 L/yr - ₹36.5 L/yr |
TCS
Infosys
Tesla
Amazon