Wipro
A data migration project involves planning, executing, and validating the transfer of data between systems.
1. Assess the current data landscape: Understand the source and target systems, data types, and volume.
2. Define migration strategy: Choose between big bang or phased migration based on project needs.
3. Data mapping: Create a detailed mapping of how data fields in the source correspond to those in the target.
...
Merge two unsorted arrays into a single sorted array.
Sort each input array first, since a two-pointer merge only produces sorted output from sorted input
Create a new array to store the merged result
Iterate through both sorted arrays, comparing front elements and appending the smaller one
Handle remaining elements in either array after the other is fully processed
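The steps above can be sketched in Python (a minimal sketch; the helper name `merge_unsorted` is illustrative):

```python
def merge_unsorted(a, b):
    """Merge two unsorted lists into one sorted list."""
    # A two-pointer merge requires sorted input, so sort each list first.
    a, b = sorted(a), sorted(b)
    merged = []
    i = j = 0
    # Compare the front of each list and append the smaller element.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # One list is exhausted; append whatever remains in the other.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged

print(merge_unsorted([3, 1, 2], [5, 4]))  # [1, 2, 3, 4, 5]
```

Sorting both inputs costs O(n log n + m log m); the merge itself is linear.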
Apache Spark is an open-source distributed computing system for big data processing and analytics.
Supports in-memory data processing, which speeds up analytics tasks.
Used for batch processing, stream processing, machine learning, and graph processing.
Integrates with Hadoop, allowing it to process data stored in HDFS.
Commonly used in data lakes and data warehouses for ETL processes.
Example: Analyzing large datasets...
Self-hosted Integration Runtime (IR) in Azure Data Factory enables data integration across on-premises and cloud environments.
Self-hosted IR allows data movement between on-premises data sources and cloud services.
It can connect to various data stores like SQL Server, Oracle, and file systems.
Example: You can use self-hosted IR to copy data from an on-premises SQL Server to Azure Blob Storage.
It requires installat...
ADF questions refer to Azure Data Factory questions which are related to data integration and data transformation processes.
ADF questions are related to Azure Data Factory, a cloud-based data integration service.
These questions may involve data pipelines, data flows, activities, triggers, and data movement.
Candidates may be asked about their experience with designing, monitoring, and managing data pipelines in ADF...
External tables reference data stored outside the database, while internal tables store data within the database.
External tables are defined on data that is stored outside the database, such as in HDFS or S3.
Internal (managed) tables store data in the database's own managed location, such as Hive's warehouse directory.
External tables do not delete data when dropped, while internal tables do.
Internal tables are managed by th...
PySpark is a Python API for Apache Spark, a powerful open-source distributed computing system.
PySpark is used for processing large datasets with distributed computing.
It provides high-level APIs in Python for Spark programming.
PySpark allows seamless integration with Python libraries like Pandas and NumPy.
Example: PySpark can be used for data processing, machine learning, and real-time analytics.
Spark is a distributed computing framework that provides in-memory processing capabilities for big data analytics.
Spark has a master-worker architecture: a central coordinator (the driver) schedules tasks on distributed executors running on worker nodes.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various programming languages like Scala, Java, Python, and R...
Migrating from Hive to BigQuery involves exporting data from Hive, transforming it into a compatible format, and importing it into BigQuery.
Export data from Hive using tools like Sqoop or Apache NiFi
Transform the data into a compatible format like Avro or Parquet
Import the transformed data into BigQuery using tools like Dataflow or the BigQuery Data Transfer Service
Executor memory is the amount of memory allocated to each executor in a Spark application.
Executor memory is specified using the 'spark.executor.memory' configuration property.
It determines how much memory each executor can use to process tasks.
It is important to properly configure executor memory to avoid out-of-memory errors or inefficient resource utilization.
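As a sketch, executor memory is typically set when submitting the application (the values and file name here are illustrative, not a recommendation):

```shell
# Allocate 4 GiB of heap to each executor, plus off-heap overhead
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=512m \
  my_job.py
```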
I applied via Naukri.com and was interviewed in Dec 2024. There were 2 interview rounds.
Python coding and SQL questions.
Optimization techniques are methods used to improve the efficiency and performance of data processing.
Use indexing to speed up data retrieval
Implement caching to reduce redundant computations
Utilize parallel processing for faster execution
Optimize algorithms for better performance
Use data partitioning to distribute workload evenly
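As one concrete illustration of the caching point above, Python's `functools.lru_cache` memoizes repeated computations (a minimal sketch, not tied to any particular pipeline):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion made efficient by caching repeated subproblems."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, computed in linear time thanks to the cache
```

Without the cache, the same call would recompute overlapping subproblems exponentially many times.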
Apache Spark is a fast and general-purpose cluster computing system.
Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
It can be used for a wide range of applications such as batch processing, real-time stream processing, machine learning, and graph processing.
Spark provides high-level APIs in Java, Sc...
It was more about coding.
I appeared for an interview in Apr 2025, where I was asked the following questions.
Different types of rank include dense rank, regular rank, and percent rank, each serving unique purposes in data analysis.
Dense Rank: Assigns ranks without gaps; e.g., values 10, 10, 20 get ranks 1, 1, 2.
Regular Rank: Assigns ranks with gaps; e.g., values 10, 10, 20 get ranks 1, 1, 3.
Percent Rank: Expresses rank as a fraction, computed as (rank − 1) / (rows − 1); e.g., values 10, 10, 20 get percent ranks 0, 0, and 1.
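A minimal pure-Python sketch of the three definitions (the function name `ranks` is illustrative; in SQL these correspond to the RANK, DENSE_RANK, and PERCENT_RANK window functions):

```python
def ranks(values):
    """Return (regular, dense, percent) ranks for the sorted values."""
    ordered = sorted(values)
    regular, dense, percent = [], [], []
    for v in ordered:
        # Regular rank: 1 + count of strictly smaller values (gaps after ties).
        r = 1 + sum(1 for x in ordered if x < v)
        # Dense rank: 1 + count of distinct smaller values (no gaps).
        d = 1 + len({x for x in ordered if x < v})
        regular.append(r)
        dense.append(d)
        # Percent rank: (rank - 1) / (rows - 1).
        percent.append((r - 1) / (len(ordered) - 1))
    return regular, dense, percent

print(ranks([10, 10, 20]))  # ([1, 1, 3], [1, 1, 2], [0.0, 0.0, 1.0])
```

The output matches the examples above: ties share a rank, regular rank then skips to 3, dense rank continues at 2, and percent rank spans 0 to 1.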
I applied via Recruitment Consultant and was interviewed in May 2024. There was 1 interview round.
Quant, reasoning, English, and coding.
Project Engineer (33.4k salaries): ₹3.5 L/yr - ₹8.2 L/yr
Senior Software Engineer (23.1k salaries): ₹6.2 L/yr - ₹19 L/yr
Senior Associate (21.8k salaries): ₹1.8 L/yr - ₹5.5 L/yr
Technical Lead (20.1k salaries): ₹16.5 L/yr - ₹30 L/yr
Senior Project Engineer (18.7k salaries): ₹6.4 L/yr - ₹18.6 L/yr