IBM Interview Questions and Answers
Q1. Difference between bucket and partitioning in hive
Bucketing is a way of organizing data files into multiple files based on a hash function, while partitioning is dividing data into different directories based on the column values.
Bucketing is used for evenly distributing data across files for better query performance.
Partitioning is used for organizing data based on specific column values for easier data retrieval.
Example: Bucketing can be used to evenly distribute sales data into 4 files, while partitioning can organize the...read more
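The difference in layout can be sketched in plain Python. This is only an illustration of the idea, not Hive code: the sales rows, the helper names `bucket_for` and `partition_dir`, and the use of CRC32 as the hash are all assumptions made for the example (Hive applies its own hash function to the bucketing column).

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # mirrors the "4 files" in the example above


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket index using a stable hash.

    CRC32 is a stand-in here; Hive uses its own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partition_dir(row: dict, column: str) -> str:
    """Build a Hive-style partition directory name such as 'country=US'."""
    return f"{column}={row[column]}"


# Hypothetical sales rows, invented for this sketch.
sales = [
    {"order_id": "o1", "country": "US", "amount": 10},
    {"order_id": "o2", "country": "US", "amount": 25},
    {"order_id": "o3", "country": "DE", "amount": 7},
    {"order_id": "o4", "country": "DE", "amount": 12},
    {"order_id": "o5", "country": "US", "amount": 3},
]

# Partitioning picks the directory from the column VALUE;
# bucketing picks the file inside it from a HASH of another column.
layout = defaultdict(list)
for row in sales:
    path = f"{partition_dir(row, 'country')}/bucket_{bucket_for(row['order_id'])}"
    layout[path].append(row["order_id"])

for path in sorted(layout):
    print(path, layout[path])
```

A query that filters on `country` only needs to look inside one directory, while the number of bucket files per directory stays fixed no matter how many distinct `order_id` values exist.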
Q2. What are the tables in Hive
Hive tables are used to store structured data in Hive, similar to tables in a traditional database.
Hive tables are created using the CREATE TABLE statement.
Tables can be partitioned based on one or more columns.
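For external tables, Hive manages only the metadata; the data lives at a location you specify (often outside the warehouse directory), and dropping the table leaves the data files in place.
For managed tables, Hive manages both the metadata and the data, which is stored by default under the Hive warehouse directory in HDFS; dropping the table deletes the data.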
Tables can be queried using SQL-like syntax in HiveQL.
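Q3. Types of read mode in Spark
Spark supports three read modes for handling malformed records: PERMISSIVE (the default), DROPMALFORMED, and FAILFAST.
PERMISSIVE mode - loads every record; malformed fields are set to null, and the raw text of the bad record can be kept in a corrupt-record column (named _corrupt_record by default)
DROPMALFORMED mode - drops corrupted records during reading
FAILFAST mode - throws an exception as soon as a corrupted record is encountered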
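In Spark itself the mode is chosen with the reader option `mode` (for example `spark.read.option("mode", "FAILFAST")`). The sketch below does not use Spark; it is a small pure-Python imitation of the three behaviours, and the functions `parse_line` and `read_records` as well as the sample lines are hypothetical names made up for the example. As a simplification, the permissive branch nulls every field of a bad record rather than only the fields that failed to parse.

```python
MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")


def parse_line(line: str) -> dict:
    """Parse 'name,age' into a dict; raise ValueError on malformed input."""
    parts = line.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 2 fields, got {len(parts)}")
    name, age = parts
    return {"name": name, "age": int(age)}  # int() raises ValueError on bad input


def read_records(lines, mode="PERMISSIVE"):
    """Imitate Spark's read modes on a list of text lines (not Spark's API)."""
    mode = mode.upper()
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for line in lines:
        try:
            row = parse_line(line)
            if mode == "PERMISSIVE":
                row["_corrupt_record"] = None  # good rows carry no raw text
        except ValueError as exc:
            if mode == "FAILFAST":
                # Stop at the first bad record.
                raise ValueError(f"malformed record: {line!r}") from exc
            if mode == "DROPMALFORMED":
                continue  # silently skip the bad record
            # PERMISSIVE: keep the row, null the fields, keep the raw text.
            row = {"name": None, "age": None, "_corrupt_record": line}
        rows.append(row)
    return rows


# Hypothetical input with one malformed line.
lines = ["alice,30", "bob,not_a_number", "carol,41"]

permissive = read_records(lines, "PERMISSIVE")
dropped = read_records(lines, "DROPMALFORMED")
print(len(permissive), len(dropped))
```

The key point is that PERMISSIVE never loses a row: the bad record is still present, just with null fields and its original text preserved for later inspection.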
Q4. JDK, JVM, JRE difference
JDK is a development kit, JRE is a runtime environment, and JVM is a virtual machine that executes Java bytecode.
JDK (Java Development Kit) is a software development kit used to develop Java applications.
JRE (Java Runtime Environment) is a software package that provides the libraries and components necessary for running Java applications.
JVM (Java Virtual Machine) is an abstract machine that provides the runtime environment in which Java bytecode can be executed.
JDK includes ...
Q5. JVM GC description
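JVM garbage collection (GC) automatically reclaims heap memory held by objects the program can no longer reach.
JVM GC stands for Java Virtual Machine Garbage Collection
It finds objects that are no longer reachable from live references and frees their memory, so developers do not deallocate memory manually
Several GC algorithms are available, such as Serial, Parallel, CMS (removed in JDK 14), and G1 (the default since JDK 9)
GC can pause application threads (stop-the-world pauses), which affects latency and throughput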
Q6. Generics in Kotlin
Generics in Kotlin allow you to write flexible and reusable code by defining classes, functions, and interfaces with type parameters.
Type parameters are declared in angle brackets <>: after the name for a class or interface (class Box<T>), and before the name for a function (fun <T> identity(x: T): T).
You can specify the type parameter when creating an instance of a generic class or calling a generic function.
Generics help in writing type-safe code and avoid the need for casting.
Example: List<String> is a generic type representing a list of strings.
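Q7. Partitions in Spark
A partition in Spark is a chunk of a dataset that can be processed in parallel with the other chunks.
Partitions are the basic units of parallelism in Spark; each partition is processed by one task
Data in an RDD is divided into partitions, which are processed in parallel across executors
The number of partitions can be changed with repartition(), which performs a full shuffle and can increase or decrease the count, or coalesce(), which is typically used to decrease the count without a full shuffle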
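The idea can be imitated without Spark. The following is a minimal sketch under stated assumptions: `split_into_partitions`, `coalesce`, and `parallel_sum` are hypothetical helpers written for this example (they are not Spark's API), and a thread pool stands in for Spark's executors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions chunks (round-robin)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def coalesce(partitions, num_partitions):
    """Reduce the partition count by merging whole partitions together.

    Elements are never redistributed one by one, which is the intuition
    behind coalesce() avoiding a full shuffle.
    """
    if num_partitions >= len(partitions):
        return partitions
    merged = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        merged[i % num_partitions].extend(part)
    return merged


def parallel_sum(partitions):
    """Sum each partition in its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return sum(pool.map(sum, partitions))


data = list(range(1, 101))
partitions = split_into_partitions(data, 8)  # 8 units of parallel work
total = parallel_sum(partitions)
fewer = coalesce(partitions, 2)              # same data, 2 larger partitions
print(len(partitions), len(fewer), total)
```

More partitions allow more tasks to run at once, while fewer, larger partitions reduce per-task overhead; the result of the computation is the same either way.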