Networkz Systems Interview Questions and Answers
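Q1. What are the different cluster managers available in Spark?
Apache Spark supports several cluster managers: Standalone, Hadoop YARN, Kubernetes, and Apache Mesos.
Spark has no built-in default cluster manager; the master is chosen at launch (for example with the --master option). YARN is the usual choice on Hadoop-based clusters.
Kubernetes runs Spark drivers and executors in containers and has been supported natively since Spark 2.3.
Mesos is a general-purpose cluster manager that can run Spark alongside other frameworks; Spark's Mesos support is deprecated as of Spark 3.2.
Standalone is a simple cluster manager that comes bundled with Spark and is the easiest to set up, which makes it a common choice for testing and development.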
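Each cluster manager is selected through the master URL passed to Spark. The sketch below lists the URL formats in plain Python; the helper name master_url and the example hosts and ports are placeholders made up for illustration, and the Kubernetes entry is included because Spark also supports it.

```python
# Master URL formats accepted by Spark (e.g. spark-submit --master <url>).
# Hosts and ports are placeholders to be filled in by the caller.
MASTER_URL_FORMATS = {
    "local": "local[*]",                          # no cluster manager, all local cores
    "standalone": "spark://{host}:{port}",        # Spark's bundled manager
    "yarn": "yarn",                               # cluster located via Hadoop config
    "mesos": "mesos://{host}:{port}",
    "kubernetes": "k8s://https://{host}:{port}",  # Kubernetes API server
}


def master_url(manager, host=None, port=None):
    """Build a master URL for the given cluster manager (hypothetical helper)."""
    template = MASTER_URL_FORMATS[manager]
    if "{host}" in template:
        if host is None or port is None:
            raise ValueError(f"{manager} needs a host and a port")
        return template.format(host=host, port=port)
    return template


print(master_url("yarn"))
print(master_url("standalone", "master.example.com", 7077))
```

The resulting string is what would go into --master or SparkConf.setMaster.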
Q2. What are the features of Apache Spark?
Apache Spark is a fast and general-purpose cluster computing system.
Distributed computing engine
In-memory processing
Supports multiple languages (Scala, Java, Python, R, and SQL)
Machine learning and graph processing libraries
Near-real-time stream processing (micro-batches)
Fault-tolerant
Scalable
Q3. How is Spark different from MapReduce?
Spark is usually faster than MapReduce because it keeps intermediate data in memory and plans each job as a DAG.
Spark processes data in memory where possible, while MapReduce writes intermediate results to disk between phases.
Spark uses a DAG (Directed Acyclic Graph) execution model with any number of stages, while MapReduce has fixed Map and Reduce phases.
Spark supports near-real-time stream processing (in micro-batches), while MapReduce is batch-oriented.
Spark offers higher-level abstractions and APIs in several languages, while MapReduce jobs are written mainly in Java (other languages need Hadoop Streaming).
Spark has built-in libraries for SQL, streaming,...
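To make the Map and Reduce phases concrete, here is a plain-Python sketch of a word count split into map, shuffle, and reduce steps. It runs in one process and is only a model: the function names are made up for illustration and are not Hadoop or Spark APIs. In Hadoop MapReduce the output of each phase goes to disk, whereas Spark can keep such intermediate results in memory.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


lines = ["spark is fast", "mapreduce is batch oriented", "spark is general"]
counts = word_count(lines)
print(counts["spark"], counts["is"])  # 2 3
```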
Q4. What is the difference between RDD and coalesce?
An RDD is a distributed collection of data, while coalesce is a method that reduces the number of partitions in an RDD.
An RDD is immutable, so coalesce does not change it; it returns a new RDD with fewer partitions.
RDDs are used for parallel processing, while coalesce is used to cut the partition count, by default without a full shuffle.
An RDD can be created from Hadoop InputFormats, while coalesce is a method called on an existing RDD.
Example: rdd.coalesce(1) merges all partitions into a single partition.
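The effect of coalesce on partitions can be modelled in plain Python. This is a sketch under simplifying assumptions, not Spark's implementation: partitions are plain lists, adjacent partitions are merged whole, and data locality (which Spark does take into account) is ignored. In PySpark the corresponding calls would be rdd.coalesce(n) and rdd.glom().collect() to inspect the partitions.

```python
def coalesce(partitions, num_partitions):
    """Merge whole partitions into at most num_partitions groups.

    Records are never redistributed one by one, which mirrors the
    no-shuffle behaviour of coalesce. The input is left unchanged and a
    new list of partitions is returned.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions >= len(partitions):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        target = i * num_partitions // len(partitions)  # adjacent groups
        merged[target].extend(part)
    return merged


partitions = [[1, 2], [3], [4, 5], [6]]
print(coalesce(partitions, 2))  # [[1, 2, 3], [4, 5, 6]]
print(coalesce(partitions, 1))  # [[1, 2, 3, 4, 5, 6]]
```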
Q5. What is React, and why is it used in modern apps?
React is a JavaScript library for building user interfaces, known for its efficiency and flexibility.
React allows for the creation of reusable UI components, making development faster and more efficient.
It uses a virtual DOM to improve performance by only updating the necessary parts of the UI.
React is popular for single-page applications and dynamic web interfaces.
It is widely used in modern web development due to its declarative and component-based approach.
React Native all...
Q6. What are RDDs in PySpark?
RDD stands for Resilient Distributed Dataset. In PySpark, RDDs are fault-tolerant collections of elements that can be processed in parallel.
RDDs are the fundamental low-level data structure in PySpark.
They are immutable and can be cached in memory for faster processing.
RDDs can be created from the Hadoop Distributed File System (HDFS), the local file system, an in-memory collection, or by transforming existing RDDs.
Transformations such as map, filter, and reduceByKey are lazy: they return a new RDD and run only when an action needs the result.
Actions like count, collect, and saveA...
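The split between transformations and actions can be illustrated without a cluster. The class below is a toy model in plain Python, not the PySpark API: MiniRDD and its internals are hypothetical names, and only map, filter, collect, and count are modelled. Each transformation returns a new object that records what to do, and nothing is computed until an action is called.

```python
class MiniRDD:
    """Toy, single-process model of an immutable, lazily evaluated dataset."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # input data, never modified
        self._lineage = tuple(lineage)  # recorded transformations

    # Transformations: record the step and return a NEW MiniRDD.
    def map(self, fn):
        return MiniRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._source, self._lineage + (("filter", pred),))

    # Actions: replay the recorded lineage and produce a result.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())


calls = []


def traced_square(x):
    calls.append(x)  # lets us observe when the function actually runs
    return x * x


numbers = MiniRDD(range(1, 6))
even_squares = numbers.map(traced_square).filter(lambda x: x % 2 == 0)
print(calls)                   # [] because nothing has run yet
print(even_squares.collect())  # [4, 16]
```

Replaying the recorded steps from the source is also the idea behind RDD fault tolerance: a lost partition can be recomputed from its lineage.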
Q7. What are props and the virtual DOM?
Props are used to pass data from parent to child components in React. The virtual DOM is a lightweight in-memory copy of the actual DOM used for efficient updates.
Props are read-only and cannot be modified by the child component.
With the virtual DOM, changes are first made to the lightweight copy, which is compared with the previous version before anything is applied to the real DOM.
The virtual DOM improves performance by minimizing the number of updates made to the actual DOM.
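The idea of comparing two lightweight trees and applying only the differences can be sketched in a few lines. The example below is a simplified model written in Python, not React's reconciliation algorithm: the node layout, the helper h, and the patch tuples are assumptions made for illustration, and children are matched by position only (React also uses keys).

```python
def h(tag, props=None, children=None):
    """Build a virtual node as a plain dict (hypothetical helper)."""
    return {"tag": tag, "props": props or {}, "children": children or []}


def diff(old, new, path="root"):
    """Return the list of patches needed to turn tree `old` into `new`."""
    if old is None:
        return [("create", path, new)]
    if new is None:
        return [("remove", path)]
    if isinstance(old, str) or isinstance(new, str):  # text nodes
        return [] if old == new else [("replace", path, new)]
    if old["tag"] != new["tag"]:
        return [("replace", path, new)]
    patches = []
    if old["props"] != new["props"]:
        patches.append(("set_props", path, new["props"]))
    old_kids, new_kids = old["children"], new["children"]
    for i in range(max(len(old_kids), len(new_kids))):
        o = old_kids[i] if i < len(old_kids) else None
        n = new_kids[i] if i < len(new_kids) else None
        patches.extend(diff(o, n, f"{path}/{i}"))
    return patches


old_tree = h("ul", {}, [h("li", {}, ["Spark"]), h("li", {}, ["Hadoop"])])
new_tree = h("ul", {}, [h("li", {}, ["Spark"]),
                        h("li", {"class": "done"}, ["Hadoop 3"])])
patches = diff(old_tree, new_tree)
print(patches)
```

Only the second list item produces patches; the unchanged first item is left alone, which is the saving described above.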