Wipro
Executor memory is the amount of memory allocated to each executor in a Spark application.
Executor memory is specified using the 'spark.executor.memory' configuration property.
It determines how much memory each executor can use to process tasks.
It is important to properly configure executor memory to avoid out-of-memory errors or inefficient resource utilization.
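A minimal sketch of setting this property when building a SparkSession; the '4g' value and app name are illustrative assumptions, not recommendations:

```python
from pyspark.sql import SparkSession

# Give each executor 4 GB of memory; the value is illustrative and should be
# tuned to the cluster's node sizes and the workload.
spark = (
    SparkSession.builder
    .appName("executor-memory-example")
    .config("spark.executor.memory", "4g")
    .getOrCreate()
)
```

The same setting can also be passed on the command line, e.g. spark-submit --executor-memory 4g, which is often required when the cluster manager launches executors before the application code runs.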
RDD is a low-level abstraction in Spark representing distributed data, while DataFrames are higher-level structured APIs for working with data.
RDD is an immutable distributed collection of objects, while a DataFrame is a distributed collection of data organized into named columns.
RDDs are more suitable for unstructured data and low-level transformations, while DataFrames provide a more user-friendly API for structured da...
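A minimal sketch contrasting the two APIs on the same toy data (names and ages are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()

# RDD: a low-level collection of arbitrary objects; you work with raw tuples.
rdd = spark.sparkContext.parallelize([("alice", 30), ("bob", 15)])
adult_names = rdd.filter(lambda row: row[1] >= 18).map(lambda row: row[0])

# DataFrame: the same data with named columns and a declarative, optimized API.
df = spark.createDataFrame([("alice", 30), ("bob", 15)], ["name", "age"])
adult_names_df = df.filter(df["age"] >= 18).select("name")

print(adult_names.collect())
adult_names_df.show()
```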
I was approached by the company and interviewed in May 2024. There was 1 interview round.
Migrating from Hive to BigQuery involves exporting data from Hive, transforming it into a compatible format, and importing it into BigQuery.
Export data from Hive using tools like Sqoop or Apache NiFi
Transform the data into a compatible format like Avro or Parquet
Import the transformed data into BigQuery using tools like Dataflow or the BigQuery Data Transfer Service
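A minimal sketch of the final load step, assuming the Hive table has already been exported to Parquet files in a Cloud Storage bucket; the bucket, project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical GCS path holding Parquet files exported from Hive, and a
# hypothetical target table in BigQuery.
source_uri = "gs://example-bucket/hive-export/orders/*.parquet"
table_id = "example-project.analytics.orders"

job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)

load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
load_job.result()  # wait for the load job to finish
print(client.get_table(table_id).num_rows, "rows loaded")
```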
To add a new column to data in PySpark, use the 'withColumn' method. To read data from a CSV file, use the 'spark.read.csv' method.
To add a new column to data in PySpark, use the 'withColumn' method
Example: df.withColumn('new_column', df['existing_column'] * 2)
To read data from a CSV file, use the 'spark.read.csv' method
Example: df = spark.read.csv('file.csv', header=True, inferSchema=True)
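Putting the two together, a minimal runnable sketch; the file name and column names are assumed for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-with-column").getOrCreate()

# Read a CSV file with a header row, letting Spark infer column types.
df = spark.read.csv("file.csv", header=True, inferSchema=True)

# Add a derived column computed from an existing one.
df = df.withColumn("new_column", df["existing_column"] * 2)
df.show()
```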
Broadcast variables and accumulators are used in Spark for efficient data sharing and aggregation across tasks.
Broadcast variables are used to efficiently distribute large read-only data to all tasks in a Spark job.
Accumulators are used for aggregating values from all tasks in a Spark job to a shared variable.
Broadcast variables help in reducing data transfer costs and improving performance.
Accumulators are used for tasks like co...
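A minimal sketch of both in PySpark; the lookup table and country codes are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-accumulator").getOrCreate()
sc = spark.sparkContext

# Broadcast: ship a small read-only lookup table to every executor once.
country_lookup = sc.broadcast({"IN": "India", "US": "United States"})

# Accumulator: count records with an unknown code across all tasks.
unknown_count = sc.accumulator(0)

def resolve(code):
    if code in country_lookup.value:
        return country_lookup.value[code]
    unknown_count.add(1)
    return "Unknown"

resolved = sc.parallelize(["IN", "US", "XX"]).map(resolve).collect()
print(resolved, "unknown codes:", unknown_count.value)
```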
Coalesce and repartition are operations in Spark used to control the number of partitions in a DataFrame.
Coalesce reduces the number of partitions without shuffling data, while repartition reshuffles data to create a specified number of partitions.
Coalesce is more efficient when reducing partitions, as it minimizes data movement.
Repartition is useful for evenly distributing data across a specified number of partitions.
...
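A minimal sketch contrasting the two operations (the partition counts are arbitrary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("coalesce-vs-repartition").getOrCreate()

df = spark.range(1_000_000)  # simple example DataFrame

# coalesce: shrink to fewer partitions without a full shuffle,
# e.g. before writing a small number of output files.
fewer = df.coalesce(4)

# repartition: full shuffle that redistributes rows evenly
# across the requested number of partitions.
even = df.repartition(16)

print(fewer.rdd.getNumPartitions(), even.rdd.getNumPartitions())
```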
I applied via Job Fair and was interviewed before Oct 2023. There was 1 interview round.
Spark is a distributed computing framework that provides in-memory processing capabilities for big data analytics.
Spark has a master-worker architecture: a driver program acts as the central coordinator, a cluster manager (such as the standalone Spark Master, YARN, or Kubernetes) allocates resources, and executors on worker nodes run the tasks.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various programming languages like Scala, Java, Python, and R for ...
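A minimal sketch of how an application exercises that architecture: the driver builds the job and splits it into tasks that executors run in parallel ("local[*]" here is only a stand-in for a real cluster manager URL):

```python
from pyspark.sql import SparkSession

# The SparkSession lives in the driver process; the master URL decides where
# the executors run.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("architecture-sketch")
    .getOrCreate()
)

# The driver turns this into tasks, one per partition, which executors execute.
rdd = spark.sparkContext.parallelize(range(100), numSlices=4)
print(rdd.map(lambda x: x * x).sum())
```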
Python code to check if a string is a palindrome or not.
Define a function that takes a string as input.
Use string slicing to reverse the input string.
Compare the reversed string with the original string to check for palindrome.
Return True if the string is a palindrome, False otherwise.
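A small implementation following the steps above:

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards."""
    reversed_text = text[::-1]   # reverse via string slicing
    return text == reversed_text

print(is_palindrome("madam"))  # True
print(is_palindrome("spark"))  # False
```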
Two coding questions were asked, of moderate difficulty.
Designation | Salaries reported | Salary range
Project Engineer | 32.7k | ₹1.8 L/yr - ₹8.3 L/yr
Senior Software Engineer | 23.1k | ₹5.8 L/yr - ₹22.5 L/yr
Senior Associate | 21.3k | ₹0.8 L/yr - ₹5.5 L/yr
Senior Project Engineer | 20.5k | ₹5 L/yr - ₹19.5 L/yr
Technical Lead | 18.6k | ₹8.2 L/yr - ₹36.5 L/yr