
PwC

3.4
based on 8.5k Reviews

The Espee Global School Interview Questions and Answers

Updated 18 Jul 2024

Q1. If streaming data arrives from Kafka and is processed with Spark, how will you handle fault tolerance?

Ans.

Implement fault tolerance by using checkpointing, replication, and monitoring mechanisms.

  • Enable checkpointing in Spark Streaming to save the state of the computation periodically to a reliable storage like HDFS or S3.

  • Use replication in Kafka to ensure that data is not lost in case of node failures.

  • Monitor the health of the Kafka and Spark clusters using tools like Prometheus and Grafana to detect and address issues proactively.
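
The checkpoint-and-resume idea from the first bullet can be illustrated without a cluster. What follows is a toy sketch in plain Python, not Spark's or Kafka's API: the `process_stream` function, the JSON checkpoint layout, and the simulated failure are all assumptions made for illustration. It stores the next offset and the running state together in one file, written atomically, so a restarted job resumes from the last checkpoint instead of starting over.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return (next_offset, running_total) from the checkpoint, or a fresh start."""
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        data = json.load(f)
    return data["next_offset"], data["running_total"]


def save_checkpoint(path, next_offset, running_total):
    """Write the checkpoint atomically: write a temp file, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset, "running_total": running_total}, f)
    os.replace(tmp_path, path)


def process_stream(records, checkpoint_path, fail_at=None):
    """Sum the records, checkpointing after each one; optionally crash at an offset."""
    offset, total = load_checkpoint(checkpoint_path)
    while offset < len(records):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError("simulated worker failure")
        total += records[offset]
        offset += 1
        save_checkpoint(checkpoint_path, offset, total)
    return total


# Hypothetical records, standing in for a replayable Kafka partition.
records = [5, 10, 15, 20]
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    process_stream(records, checkpoint_path, fail_at=2)  # crashes before offset 2
except RuntimeError:
    pass

# The restarted job picks up at the checkpointed offset with the saved state.
result = process_stream(records, checkpoint_path)
print(result)
```

Saving the offset and the state in the same atomic write is the design choice that matters here: if they were saved separately, a crash between the two writes could count a record twice or skip it.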


Q2. What are the core components of Spark?

Ans.

Core components of Spark include Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.

  • Spark Core: foundation of the Spark platform, provides basic functionality for distributed data processing

  • Spark SQL: module for working with structured data using SQL and DataFrame API

  • Spark Streaming: extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams

  • MLlib: machine learning library for Spark that provides scalabl...


Q3. What is Apache Spark?

Ans.

Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

  • Apache Spark is designed for speed and ease of use in processing large amounts of data.

  • For some in-memory workloads it has been reported to run up to 100x faster than Hadoop MapReduce, or up to 10x faster on disk; actual speedups depend on the workload.

  • Spark provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.

  • It al...


Q4. What is the Hive architecture?

Ans.

Hive is a data warehousing infrastructure built on top of Hadoop for querying and analyzing large datasets.

  • Hive uses a language called HiveQL which is similar to SQL for querying data stored in Hadoop.

  • It organizes data into tables, partitions, and buckets to optimize queries and improve performance.

  • Hive metastore stores metadata about tables, columns, partitions, and their locations.

  • Hive queries are converted into MapReduce jobs to process data in parallel across...


Q5. What is vectorization in ?

Ans.

Vectorization is the practice of expressing a computation as operations on whole arrays, so that a CPU or GPU can process many elements per instruction instead of one at a time.

  • Vectorization allows for parallel processing of data, improving computational efficiency.

  • It involves performing operations on entire arrays or matrices at once, rather than on individual elements.

  • Examples include using libraries like NumPy in Python to perform vectorized operations on arrays.

  • Vectorization is commonly used in machine learning and data analysis ...
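
As a small illustration of the NumPy bullet above, the sketch below computes the same per-item totals twice: once with an explicit Python loop and once with a single array expression. The `prices` and `quantities` arrays are made-up sample data, and NumPy is assumed to be installed.

```python
import numpy as np

# Made-up sample data for illustration.
prices = np.array([10.0, 20.0, 30.0, 40.0])
quantities = np.array([1, 2, 3, 4])

# Element-by-element loop: one Python-level multiplication per item.
loop_totals = []
for p, q in zip(prices, quantities):
    loop_totals.append(p * q)

# Vectorized form: one array expression, evaluated in NumPy's compiled code.
vector_totals = prices * quantities

print(vector_totals.sum())
```

Both forms produce the same values; the vectorized one avoids per-element interpreter overhead, which is where most of the speedup on large arrays tends to come from.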


Q6. What is a partition in Hive?

Ans.

Partition in Hive is a way to organize data in a table into multiple directories based on the values of one or more columns.

  • Partitions help in improving query performance by allowing Hive to only read the relevant data directories.

  • Partitions are defined when creating a table in Hive using the PARTITIONED BY clause.

  • Example: CREATE TABLE table_name (column1 INT, column2 STRING) PARTITIONED BY (column3 STRING);
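
To make the directory layout concrete, here is a toy sketch in plain Python that imitates it on the local filesystem. It is not Hive code: the sample rows, the `data.csv` file name, and the `read_partition` helper are assumptions for illustration. Only the `column3=<value>` directory naming mirrors how Hive lays out partitions of the table in the example above.

```python
import os
import tempfile

# Hypothetical rows for a table (column1, column2) partitioned by column3.
rows = [
    (1, "a", "2024-01"),
    (2, "b", "2024-01"),
    (3, "c", "2024-02"),
]

table_dir = os.path.join(tempfile.mkdtemp(), "table_name")

# Write each row under a directory named after its partition value,
# mirroring the column3=<value> layout.
for column1, column2, column3 in rows:
    partition_dir = os.path.join(table_dir, f"column3={column3}")
    os.makedirs(partition_dir, exist_ok=True)
    with open(os.path.join(partition_dir, "data.csv"), "a") as f:
        f.write(f"{column1},{column2}\n")


def read_partition(table_dir, column3):
    """Read only the directory for one partition value (partition pruning)."""
    path = os.path.join(table_dir, f"column3={column3}", "data.csv")
    with open(path) as f:
        return [line.strip().split(",") for line in f]


partitions = sorted(os.listdir(table_dir))
january = read_partition(table_dir, "2024-01")
print(partitions)
print(january)
```

A query that filters on `column3` only needs to open the matching directory, which is the pruning benefit described in the first bullet.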


Q7. What are functions in SQL?

Ans.

Functions in SQL are built-in operations that can be used to manipulate data or perform calculations within a database.

  • Functions in SQL can be used to perform operations on data, such as mathematical calculations, string manipulation, date/time functions, and more.

  • Examples of SQL functions include SUM(), AVG(), CONCAT(), UPPER(), LOWER(), and DATE_FORMAT(); names and availability vary by database, and DATE_FORMAT(), for instance, is not available in every engine.

  • Functions can be used in SELECT statements, WHERE clauses, ORDER BY clauses, and more to manipulate data as ...
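
The sketch below shows aggregate and scalar functions in use, run through Python's built-in `sqlite3` module purely for convenience. The `employees` table and its rows are made up for illustration, and SQLite is an assumption, not the database the question refers to; other engines support a different set of functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("asha", "data", 100.0), ("ravi", "data", 200.0), ("meena", "ops", 150.0)],
)

# Aggregate functions collapse many rows into one value per group.
aggregates = conn.execute(
    "SELECT dept, SUM(salary), AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

# Scalar functions transform each row; LENGTH() is also used in the WHERE clause.
names = conn.execute(
    "SELECT UPPER(name) FROM employees WHERE LENGTH(name) > 4 ORDER BY name"
).fetchall()

print(aggregates)
print(names)
```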


Q8. Explain RANK, DENSE_RANK, and ROW_NUMBER.

Ans.

RANK, DENSE_RANK, and ROW_NUMBER are SQL window functions that number rows according to a specified order; they differ in how they treat ties.

  • RANK gives tied rows the same rank and then skips the following ranks, leaving gaps (for example 1, 1, 3).

  • DENSE_RANK gives tied rows the same rank and continues with the next integer, leaving no gaps (for example 1, 1, 2).

  • ROW_NUMBER assigns a unique sequential integer to every row, even when rows tie; the order among tied rows is arbitrary unless the ORDER BY breaks the tie.
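
The three functions are easiest to compare side by side on data that contains a tie. The sketch below uses Python's built-in `sqlite3` module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions); the `scores` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],  # a and b tie on score
)

rows = conn.execute(
    """
    SELECT name,
           score,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
```

Adding `name` to the ROW_NUMBER ordering is a deliberate choice here: it breaks the tie between the two rows with score 90, so the numbering is deterministic.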
