Hadoop Developer Interview Questions and Answers
Q1. How do you ingest a CSV file into a Spark DataFrame and write it to a Hive table?
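Ingest a CSV file into a Spark DataFrame and write it to a Hive table.
Create a SparkSession with Hive support enabled (SparkSession.builder.enableHiveSupport())
Read the CSV file with spark.read.csv(), which returns a DataFrame (set options such as header and inferSchema as needed)
Optionally create the Hive table first with spark.sql("CREATE TABLE ...")
Write the DataFrame to the Hive table with df.write.saveAsTable(), which also creates the table if it does not already exist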
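A minimal PySpark sketch of these steps, assuming Spark was built with Hive support and a metastore is configured. The function names `build_session` and `ingest_csv_to_hive`, the CSV options, and the overwrite mode are illustrative choices, not requirements. `pyspark` is imported inside `build_session()` so the ingestion helper can be imported and tested without a cluster.

```python
def build_session(app_name="csv-to-hive"):
    """Create a SparkSession with Hive support (requires pyspark)."""
    # Imported here so this module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName(app_name)
            .enableHiveSupport()  # lets saveAsTable/spark.sql use the Hive metastore
            .getOrCreate())


def ingest_csv_to_hive(spark, csv_path, table_name):
    """Read a CSV file into a DataFrame and save it as a Hive table."""
    df = (spark.read
          .option("header", "true")       # first line holds the column names
          .option("inferSchema", "true")  # sample the data to guess column types
          .csv(csv_path))
    # saveAsTable creates the table in the metastore if it does not exist;
    # mode("overwrite") replaces an existing table with the same name.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

For example, `ingest_csv_to_hive(build_session(), "/data/sales.csv", "default.sales")` (hypothetical path and table name). Passing the session in, rather than creating it inside the function, keeps the ingestion logic easy to test with a mock.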
Q2. Explain the architecture of Spark. What is lazy evaluation? What is the difference between the repartition and coalesce functions?
Spark architecture, lazy evaluation, repartition vs coalesce
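Spark's architecture consists of a driver program, a cluster manager, and worker nodes that run executors; the driver turns the program into tasks, and the executors allocated by the cluster manager run them
Lazy evaluation means Spark does not execute transformations until an action (such as count() or collect()) is called, which lets it optimize the whole chain of transformations before running it
repartition() performs a full shuffle to redistribute data across the new partitions, while coalesce() reduces the number of partitions by merging existing ones and avoids a full shuffle
repartition() can increase or decrease the number of partitions, while coalesce() can only decrease it
Repartition is a costly operation while coalesce…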
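Spark's own API needs a cluster, so the following plain-Python model (not Spark code) illustrates the two ideas above. `LazyDataset` records transformations and only runs them when the `collect()` action is called. `repartition` reassigns every record by hash (a full shuffle, as Spark does when you repartition by columns; `repartition(n)` without columns distributes rows round-robin instead), while `coalesce` simply concatenates adjacent existing partitions and cannot increase their number. All names here are hypothetical.

```python
class LazyDataset:
    """Toy model of lazy evaluation: transformations are recorded, not executed."""

    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []  # recorded transformations (the "lineage")

    def map(self, fn):
        # Transformation: returns a new dataset with one more recorded op; nothing runs yet.
        return LazyDataset(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._data, self._ops + [("filter", pred)])

    def collect(self):
        # Action: only now are the recorded transformations applied to the data.
        out = []
        for item in self._data:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


def repartition(partitions, n):
    """Full shuffle: every record is reassigned to one of n new partitions by hash."""
    new = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            new[hash(record) % n].append(record)
    return new


def coalesce(partitions, n):
    """No shuffle: adjacent partitions are concatenated; the count can only shrink."""
    if n >= len(partitions):
        return [list(p) for p in partitions]  # cannot increase the partition count
    size, extra = divmod(len(partitions), n)
    merged, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        merged.append([r for p in partitions[start:end] for r in p])
        start = end
    return merged
```

For example, `coalesce([[1, 2], [3], [4, 5], [6]], 2)` keeps each record next to its old neighbours (`[[1, 2, 3], [4, 5, 6]]`), whereas `repartition` touches every record, which is why it costs more.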
Q3. What is MapReduce? What are the advantages of Spark over Hadoop MapReduce?
MapReduce is a programming model and software framework for processing large amounts of data in parallel on a cluster.
MapReduce is used for distributed processing of big data
It consists of two phases: Map and Reduce
Map phase processes input data and produces intermediate key-value pairs
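Between the two phases, the framework shuffles and sorts the intermediate pairs so that all values for a key reach the same reducer
Reduce phase takes the grouped output of the Map phase and combines the values for each key
MapReduce is fault-tolerant and highly scalable
Spark's main advantage over Hadoop MapReduce is that it keeps intermediate data in memory instead of writing it to disk between jobs, which makes iterative and interactive workloads much faster; it also offers higher-level APIs (DataFrames, Spark SQL, streaming, MLlib)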
Example: Word count program in MapReduce
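The word count example can be sketched as a single-machine Python simulation of the three stages (map, shuffle and sort, reduce). This is not Hadoop's Java Mapper/Reducer API; the function names are hypothetical and only mirror what each phase does.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle and sort: group all values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


print(word_count(["the quick fox", "the lazy dog"]))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In real Hadoop, many mappers and reducers run these stages in parallel on different nodes, and the shuffle moves data over the network.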
Q4. What are managed and external tables in Hive?
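Managed tables store their data in Hive's warehouse directory by default, while external tables point to data at a location you specify.
Hive manages both the metadata and the data of a managed table; for an external table it manages only the metadata, and the data files are owned outside Hive.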
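Dropping a managed table deletes both its metadata and its data; dropping an external table deletes only the metadata and leaves the data files in place.
Use managed tables when Hive should control the data's lifecycle (for example, intermediate tables), and external tables when the data is shared with other tools or must survive a DROP TABLE.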
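As a sketch, the two table types differ in their DDL. Here the HiveQL is run through `spark.sql()` from PySpark, assuming a session with Hive support; the same statements also work in Beeline. The table names, columns, and the `LOCATION` path are hypothetical.

```python
# Managed table: Hive owns the files under its warehouse directory.
MANAGED_DDL = """
CREATE TABLE IF NOT EXISTS sales_managed (id INT, amount DOUBLE)
STORED AS ORC
"""

# External table: EXTERNAL + LOCATION means Hive records only the metadata;
# the files under LOCATION are not owned by Hive.
EXTERNAL_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/external/sales'
"""


def create_tables(spark):
    """Create one managed and one external table using an existing Hive-enabled session."""
    for ddl in (MANAGED_DDL, EXTERNAL_DDL):
        spark.sql(ddl)
```

After `DROP TABLE sales_managed`, its warehouse files are removed; after `DROP TABLE sales_external`, the files under `/data/external/sales` are still there.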
Q5. What is the role of the boundary query in Sqoop?
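The boundary query in Sqoop supplies the minimum and maximum values of the split column, which Sqoop uses to divide an import among mappers; it can also limit the import to a specific range of values.
By default, Sqoop runs SELECT MIN(col), MAX(col) on the --split-by column to find these bounds
The --boundary-query option replaces that default with your own query, which must return exactly two values (the lower and upper bound)
It is useful when the default MIN/MAX query is expensive on a large table, or when you only need a subset of the data
For example, importing only the rows of a database table whose values in a particular column fall within a specific range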
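A Python sketch that assembles a `sqoop import` command using `--boundary-query`. The flags (`--connect`, `--table`, `--split-by`, `--boundary-query`, `--target-dir`, `--num-mappers`) are standard Sqoop import options; the JDBC URL, table, column, query, and directory are hypothetical, and actually running the command requires the `sqoop` CLI on `PATH`.

```python
import shlex
import subprocess


def sqoop_import_cmd(jdbc_url, table, split_by, boundary_query, target_dir, mappers=4):
    """Build the argument list for a Sqoop import that uses a custom boundary query."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--split-by", split_by,              # column whose range is divided among mappers
        "--boundary-query", boundary_query,  # must return exactly (min, max)
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def run_sqoop(cmd):
    """Print and run the command; raises CalledProcessError if Sqoop fails."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `sqoop_import_cmd("jdbc:mysql://db/shop", "orders", "id", "SELECT MIN(id), MAX(id) FROM orders WHERE id >= 1000", "/user/etl/orders")`. Because each mapper's query is bounded by these values, rows whose `id` falls outside the returned range are not imported; to filter on other columns, Sqoop's `--where` option is the more direct tool.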
Q6. Explain Hive's architecture, the types of Hive tables, file formats in Hive, and dynamic partitioning in Hive.
Hive architecture, table types, file formats, and dynamic partitioning.
Hive's architecture consists of the metastore, driver, compiler (including the optimizer), and execution engine, accessed through clients such as Beeline via HiveServer2.
Hive tables can be of two types: managed tables and external tables.
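File formats supported by Hive include TextFile, SequenceFile, RCFile, ORC, Parquet, and Avro.
Dynamic partitioning lets Hive create partitions automatically from the values of the partition column in the inserted data, instead of naming each partition in the INSERT statement.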
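A sketch of a dynamic-partition load, again issuing HiveQL through `spark.sql()` on a Hive-enabled session. The two `SET` properties are the standard Hive switches for dynamic partitioning; the table names `sales_by_country` and `staging_sales` and their columns are hypothetical.

```python
DYNAMIC_PARTITION_STATEMENTS = [
    # Enable dynamic partitions and allow every partition column to be dynamic.
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict",
    """CREATE TABLE IF NOT EXISTS sales_by_country (id INT, amount DOUBLE)
       PARTITIONED BY (country STRING)
       STORED AS PARQUET""",
    # The partition column goes last in the SELECT list; Hive creates one
    # partition directory (country=<value>) per distinct value it sees.
    """INSERT OVERWRITE TABLE sales_by_country PARTITION (country)
       SELECT id, amount, country FROM staging_sales""",
]


def load_with_dynamic_partitions(spark):
    """Run the statements above in order on an existing Hive-enabled session."""
    for stmt in DYNAMIC_PARTITION_STATEMENTS:
        spark.sql(stmt)
```

With static partitioning you would instead write `PARTITION (country='US')` and run one insert per value.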
Q7. What is the top command in shell scripting?
top is a Linux utility that displays the system's processes in real time.
Displays the processes running on the system
Updates the list of processes in real time
Provides information on CPU usage, memory usage, and process IDs
Can be used to monitor system performance and identify resource-intensive processes
In scripts, run it in batch mode (top -b -n 1) so it prints a single snapshot and exits instead of running interactively
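A Python sketch of using `top` from a script: run one batch-mode snapshot and pick out the busiest processes. It assumes the procps-ng `top` found on most Linux distributions, with its default columns (including `PID`, `%CPU`, and `COMMAND`) and a `.` decimal separator; other implementations, such as the one on macOS, use different flags and output. The function names are hypothetical.

```python
import subprocess


def top_snapshot():
    """Run top once in batch mode (-b) for a single iteration (-n 1) and return its text."""
    result = subprocess.run(["top", "-b", "-n", "1"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def top_cpu_processes(top_output, n=5):
    """Return (pid, %cpu, command) for the n busiest processes in top's batch output."""
    lines = top_output.splitlines()
    # The process table starts at the header line whose first field is "PID".
    header_idx = next(i for i, line in enumerate(lines) if line.split()[:1] == ["PID"])
    header = lines[header_idx].split()
    pid_col = header.index("PID")
    cpu_col = header.index("%CPU")
    cmd_col = header.index("COMMAND")
    rows = []
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) < len(header):
            continue  # skip blank or truncated lines
        rows.append((int(fields[pid_col]), float(fields[cpu_col]), " ".join(fields[cmd_col:])))
    rows.sort(key=lambda row: row[1], reverse=True)  # highest CPU first
    return rows[:n]
```

Usage: `top_cpu_processes(top_snapshot(), n=3)`. Looking up columns by header name, rather than fixed positions, keeps the parser working if the column layout is customised.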
Q8. Joins and window functions in Spark, partitioning vs coalesce, performance optimization techniques
The question covers joins, window functions, partitioning vs coalesce, and performance optimization techniques in Spark.
Spark supports several join strategies, including broadcast hash join, shuffle hash join, and sort-merge join (the default for large equi-joins).
Window functions in Spark perform calculations across a group of rows related to the current row (for example row_number, rank, or running totals) without collapsing those rows into one.
Partitioning in Spark can be done by columns or keys, and it affects the performance of operations such as joins and aggregations.
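The plain-Python model below (not Spark code) shows what two of these techniques do. `broadcast_hash_join` builds a hash table from the small side, which is the part Spark would broadcast to every executor, and streams the large side through it, so the large side never needs to be shuffled. `row_number` numbers rows within each partition without collapsing them, like a window function. The function names and row format (dicts) are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large, small, key):
    """Inner join: hash the small side once, then probe it for every large-side row."""
    lookup = defaultdict(list)
    for row in small:
        lookup[row[key]].append(row)
    # Rows from `large` with no match are dropped (inner-join semantics).
    return [{**big, **match} for big in large for match in lookup.get(big[key], [])]


def row_number(rows, partition_by, order_by, descending=False):
    """Window function: number rows 1..n within each partition, ordered by order_by."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)
    result = []
    for part in partitions.values():
        ordered = sorted(part, key=lambda r: r[order_by], reverse=descending)
        for i, row in enumerate(ordered, start=1):
            result.append({**row, "row_number": i})  # every input row is kept
    return result
```

In PySpark the equivalents are `large_df.join(broadcast(small_df), "key")` with `pyspark.sql.functions.broadcast`, and `F.row_number().over(Window.partitionBy("cust").orderBy(F.desc("amt")))`; here the column names are hypothetical.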