Top 20 Hive Interview Questions and Answers

Updated 14 Nov 2024

Q1. What are internal and external tables in hive?

Ans.

Internal and external tables in Hive are two types of tables used to store data in Hive.

  • Internal tables store data in a default location managed by Hive

  • External tables store data in a location specified by the user

  • Internal tables are deleted when the table is dropped, while external tables retain data in the specified location

  • Example: CREATE TABLE internal_table (col1 INT, col2 STRING) STORED AS ORC;

  • Example: CREATE EXTERNAL TABLE external_table (col1 INT, col2 STRING) LOCATIO...read more

Add your answer
Frequently asked in

Q2. How do you eliminate null values in HIVE

Ans.

Null values in HIVE can be eliminated using the COALESCE function.

  • Use the COALESCE function to replace null values with a default value.

  • Syntax: COALESCE(column_name, default_value)

  • Example: SELECT COALESCE(name, 'Unknown') FROM table_name;

Add your answer
Frequently asked in

Q3. Difference between Hive and Sql

Ans.

Hive is a data warehousing tool for Hadoop while SQL is a language used to manage relational databases.

  • Hive is used for big data processing while SQL is used for relational databases.

  • Hive uses Hadoop Distributed File System (HDFS) while SQL uses traditional storage systems.

  • Hive queries are written in Hive Query Language (HQL) while SQL queries are written in Structured Query Language.

  • Hive is optimized for batch processing while SQL is optimized for transaction processing.

  • Hive...read more

Add your answer

Q4. What is partition in hive?

Ans.

Partition in Hive is a way to organize data in a table into multiple directories based on the values of one or more columns.

  • Partitions help in improving query performance by allowing Hive to only read the relevant data directories.

  • Partitions are defined when creating a table in Hive using the PARTITIONED BY clause.

  • Example: CREATE TABLE table_name (column1 INT, column2 STRING) PARTITIONED BY (column3 STRING);

Add your answer
Frequently asked in
Are these interview questions helpful?

Q5. What are the analytical function does HIVE support

Ans.

Hive supports various analytical functions for data processing and analysis.

  • Hive supports aggregation functions like SUM, COUNT, AVG, MIN, MAX, etc.

  • It also supports window functions like ROW_NUMBER, RANK, LAG, LEAD, etc.

  • Hive provides statistical functions such as CORR, COVAR_POP, STDDEV, etc.

  • It supports date and time functions like YEAR, MONTH, DAY, HOUR, etc.

  • Hive also offers string manipulation functions like CONCAT, SUBSTRING, LENGTH, etc.

Add your answer
Frequently asked in

Q6. What are ACID properties in HIVE table

Ans.

ACID properties ensure data consistency and reliability in HIVE tables.

  • ACID stands for Atomicity, Consistency, Isolation, and Durability.

  • Atomicity ensures that a transaction is treated as a single unit of work.

  • Consistency ensures that the data remains in a valid state before and after the transaction.

  • Isolation ensures that concurrent transactions do not interfere with each other.

  • Durability ensures that once a transaction is committed, it is permanent and can survive system fa...read more

Add your answer
Frequently asked in
Share interview questions and help millions of jobseekers 🌟

Q7. How you do data validation in hive

Ans.

Data validation in Hive involves using built-in functions and custom scripts to ensure data accuracy and consistency.

  • Use built-in functions like IS NULL, IS NOT NULL, and COALESCE to check for missing or null values

  • Use regular expressions and pattern matching to validate data formats

  • Write custom scripts to perform more complex data validation tasks

  • Perform data profiling to identify potential data quality issues

  • Use data quality tools like Apache Nifi or Talend to automate data...read more

Add your answer
Frequently asked in

Q8. How to troubleshoot hive slowness

Ans.

To troubleshoot hive slowness, check for resource contention, optimize queries, and monitor system performance.

  • Check for resource contention such as CPU, memory, and disk usage

  • Optimize queries by reducing data scanned and avoiding unnecessary joins

  • Monitor system performance using tools like Ganglia or Ambari

  • Consider partitioning tables to improve query performance

  • Use compression to reduce disk I/O and network traffic

Add your answer

Hive Jobs

Data Engineer-Data Platforms 6-10 years
IBM India Pvt. Limited
4.0
Hyderabad / Secunderabad
Data Engineer-Data Platforms-AWS 2-6 years
IBM India Pvt. Limited
4.0
Pune
Data Engineer-Data Platforms 6-10 years
IBM India Pvt. Limited
4.0
Mysuru / Mysore

Q9. Add data into a partitioned hive table

Ans.

To add data into a partitioned hive table, you can use the INSERT INTO statement with the PARTITION clause.

  • Use INSERT INTO statement to add data into the table.

  • Specify the partition column values using the PARTITION clause.

  • Example: INSERT INTO table_name PARTITION (partition_column=value) VALUES (data);

Add your answer
Frequently asked in

Q10. explain details of hive meta store

Ans.

Hive meta store stores metadata about Hive tables, partitions, columns, and storage location.

  • Hive meta store is a central repository that stores metadata information about Hive tables, partitions, columns, and storage location.

  • It stores this metadata in a relational database like MySQL, Derby, or PostgreSQL.

  • The metadata includes information such as table names, column names, data types, file formats, and storage location.

  • Hive uses the meta store to retrieve metadata informati...read more

Add your answer
Frequently asked in

Q11. Hive types of tables and difference between them

Ans.

Hive has two types of tables - Managed and External. Managed tables are managed by Hive, while External tables are managed outside of Hive.

  • Managed tables are created using 'CREATE TABLE' command and data is stored in Hive's warehouse directory

  • External tables are created using 'CREATE EXTERNAL TABLE' command and data is stored outside of Hive's warehouse directory

  • Managed tables are deleted when the table is dropped, while External tables are not

  • Managed tables have full control...read more

Add your answer

Q12. Hive vs external table?

Ans.

Hive tables store data in HDFS while external tables reference data stored outside HDFS.

  • Hive tables store data in HDFS, while external tables reference data stored outside HDFS.

  • External tables are useful when data needs to be shared across different Hive instances or when data is already stored in a different location.

  • Hive tables are managed by Hive and are deleted when the table is dropped, while external tables are not deleted when the table is dropped.

  • Example: Creating a H...read more

Add your answer
Frequently asked in

Q13. Hive External vs managed

Ans.

Hive External vs managed

  • Hive External tables store data outside of the Hive warehouse directory

  • Managed tables store data in the Hive warehouse directory

  • External tables can be used to access data from different storage systems

  • Managed tables are easier to manage as Hive takes care of data storage and metadata

  • External tables require manual management of data and metadata

Add your answer
Frequently asked in

Q14. Hive table loads on incremental process

Ans.

Hive table loads can be done incrementally by using partitioning or by using timestamp columns.

  • Partition the table based on a column like date or timestamp to load data incrementally

  • Use dynamic partition inserts to add new data to specific partitions

  • Utilize timestamp columns to filter and load only new data

  • Consider using tools like Apache NiFi or Apache Sqoop for efficient incremental data loading

Add your answer
Frequently asked in

Q15. Duplicate Records elimination in hive table

Ans.

Use DISTINCT keyword in SELECT query to eliminate duplicate records in a Hive table

  • Use SELECT DISTINCT * FROM table_name to retrieve unique records

  • Consider using GROUP BY clause with appropriate columns to eliminate duplicates

  • Utilize the ROW_NUMBER() window function to assign a unique row number to each record and filter out duplicates

Add your answer
Frequently asked in

Q16. What are external internal tables in hive?

Ans.

External and internal tables in Hive are two types of tables used to store data in Hive.

  • External tables store data outside of Hive's warehouse directory, while internal tables store data within the warehouse directory

  • External tables do not delete data when the table is dropped, while internal tables delete data

  • External tables are useful for sharing data between different Hive instances or with external systems

  • Internal tables are managed by Hive, while external tables are mana...read more

Add your answer
Frequently asked in

Q17. What is hive in big data

Ans.

Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

  • Hive uses a SQL-like language called HiveQL to query and manage large datasets stored in Hadoop

  • It allows users to write complex queries to analyze and process data

  • Hive organizes data into tables, partitions, and buckets for efficient querying

  • It is commonly used for data warehousing, data analysis, and data processing tasks

Add your answer

Q18. What are the tables in hive

Ans.

Hive tables are used to store structured data in Hive, similar to tables in a traditional database.

  • Hive tables are created using the CREATE TABLE statement.

  • Tables can be partitioned based on one or more columns.

  • External tables in Hive store data outside of the default location in HDFS.

  • Managed tables store data in the default location in HDFS.

  • Tables can be queried using SQL-like syntax in HiveQL.

Add your answer

Q19. Architecture of hive,types of hive table, file formats in hive, dynamic partition in hive

Ans.

Hive architecture, table types, file formats, and dynamic partitioning.

  • Hive architecture consists of metastore, driver, compiler, and execution engine.

  • Hive tables can be of two types: managed tables and external tables.

  • File formats supported by Hive include text, sequence, ORC, and Parquet.

  • Dynamic partitioning allows automatic creation of partitions based on data.

Add your answer

Q20. Advantages and disadvantages of Hive?

Ans.

Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

  • Advantages: SQL-like query language for querying large datasets, optimized for OLAP workloads, supports partitioning and bucketing for efficient queries.

  • Disadvantages: Slower performance compared to traditional databases for OLTP workloads, limited support for complex queries and transactions.

  • Example: Hive can be used to analyze large volumes of log data to ext...read more

Add your answer
Frequently asked in
Interview Tips & Stories
Ace your next interview with expert advice and inspiring stories

Interview Questions of Hive Related Designations

Interview experiences of popular companies

3.7
 • 10.5k Interviews
3.7
 • 5.7k Interviews
3.5
 • 3.9k Interviews
View all
Hive Interview Questions
Share an Interview
Stay ahead in your career. Get AmbitionBox app
qr-code
Helping over 1 Crore job seekers every month in choosing their right fit company
70 Lakh+

Reviews

5 Lakh+

Interviews

4 Crore+

Salaries

1 Cr+

Users/Month

Contribute to help millions

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.

Follow us
  • Youtube
  • Instagram
  • LinkedIn
  • Facebook
  • Twitter