Hadoop Developer


Hadoop Developer Interview Questions and Answers

Updated 25 Jun 2022


Q1. How do you ingest a CSV file into a Spark DataFrame and write it to a Hive table?

Ans.

Ingest a CSV file into a Spark DataFrame and write it to a Hive table.

  • Create a SparkSession with Hive support enabled (enableHiveSupport())

  • Read the CSV file with spark.read.csv(), which returns a DataFrame directly

  • Use options such as header and inferSchema, or supply an explicit schema

  • Create the target Hive database or table if needed via spark.sql()

  • Write the DataFrame with df.write.saveAsTable()

Q2. Architecture of Spark. What is lazy evaluation? Difference between the repartition and coalesce functions?

Ans.

Spark architecture, lazy evaluation, repartition vs coalesce

  • Spark architecture consists of a driver program, cluster manager, and worker nodes

  • Lazy evaluation is a feature of Spark where transformations are not executed until an action is called

  • Repartition function shuffles data across partitions while coalesce reduces the number of partitions

  • Repartition can increase or decrease the number of partitions while coalesce only decreases

  • Repartition is a costly operation because it always performs a full shuffle, while coalesce is cheaper since it merges existing partitions and avoids a full shuffle

Q3. What is MapReduce? Advantages of Spark over Hadoop

Ans.

MapReduce is a programming model and software framework for processing large amounts of data in parallel on a cluster.

  • MapReduce is used for distributed processing of big data

  • It consists of two phases: Map and Reduce

  • Map phase processes input data and produces intermediate key-value pairs

  • Reduce phase takes the output of the Map phase and combines the values for each key

  • MapReduce is fault-tolerant and highly scalable

  • Example: word count, where the Map phase emits (word, 1) pairs and the Reduce phase sums the counts per word

  • Spark's main advantages over Hadoop MapReduce: in-memory processing instead of writing intermediate results to disk, a richer API (SQL, streaming, machine learning), and better support for iterative workloads
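The two phases can be illustrated with a pure-Python word count that mimics the Map and Reduce steps; the real framework additionally shuffles and sorts the intermediate pairs by key between the phases.

```python
from collections import defaultdict


def map_phase(text):
    # Map: emit an intermediate (word, 1) pair for every word in the input
    return [(word, 1) for word in text.split()]


def reduce_phase(pairs):
    # Shuffle/Reduce: group the pairs by key, then sum the counts per word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


pairs = map_phase("the quick fox the fox")
result = reduce_phase(pairs)
# result == {"the": 2, "quick": 1, "fox": 2}
```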

Q4. What are managed tables and external tables in Hive?

Ans.

Managed tables are stored in Hive's warehouse directory and fully owned by Hive; external tables point to data stored elsewhere.

  • Both kinds are defined through Hive, but Hive owns the data of a managed table, while an external table references data at a user-specified location.

  • Managed tables live in Hive's warehouse directory; external tables use the LOCATION clause to point at an existing path.

  • Dropping a managed table deletes both the metadata and the data; dropping an external table deletes only the metadata.

  • Use managed tables when Hive should control the data's full lifecycle, and external tables when the data is shared with other tools or must outlive the table definition.

  • Example: CREATE EXTERNAL TABLE logs (line STRING) LOCATION '/data/logs';


Q5. What is the role of the boundary query in Sqoop?

Ans.

The boundary query tells Sqoop the minimum and maximum values of the split-by column, which it uses to divide the import among parallel mappers.

  • By default Sqoop runs SELECT MIN(col), MAX(col) on the split-by column to compute the split boundaries

  • The --boundary-query option overrides that default with a custom query

  • It is useful when the default MIN/MAX query is slow on a large table, or when you only need a subset of the data

  • For example, supplying a boundary query that returns a narrower range imports only the rows whose split-by column falls within that range
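A sketch of the option in use; the connection string, table, column, and range are all hypothetical, and running it requires a live database and Hadoop cluster.

```shell
# Hypothetical connection details; needs a running database and Hadoop cluster.
# By default Sqoop would compute split boundaries with:
#   SELECT MIN(id), MAX(id) FROM orders
# --boundary-query overrides that, here fixing the range handed to the 4 mappers.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user \
  --table orders \
  --split-by id \
  --boundary-query "SELECT 1000, 2000" \
  --num-mappers 4 \
  --target-dir /data/orders_subset
```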

Q6. Architecture of Hive, types of Hive tables, file formats in Hive, dynamic partitioning in Hive

Ans.

Hive architecture, table types, file formats, and dynamic partitioning.

  • Hive architecture consists of metastore, driver, compiler, and execution engine.

  • Hive tables can be of two types: managed tables and external tables.

  • File formats supported by Hive include text, sequence, ORC, and Parquet.

  • Dynamic partitioning allows automatic creation of partitions based on data.


Q7. What is the top command in shell scripting?

Ans.

The top command is a Linux utility that displays the system's processes in real time.

  • Displays the processes running on the system

  • Updates the list of processes in real-time

  • Provides information on CPU usage, memory usage, and process IDs

  • Can be used to monitor system performance and identify resource-intensive processes
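In scripts, top's batch mode is the usual form; the flags below are from procps-ng top, the implementation on most Linux distributions.

```shell
# Batch mode (-b) prints one snapshot (-n 1) instead of the interactive screen,
# which makes top usable inside scripts and log collectors
top -b -n 1 | head -n 5

# Sort the snapshot by resident memory instead of CPU (procps-ng option)
top -b -n 1 -o %MEM | head -n 12
```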

Q8. Joins and window functions in Spark, partition vs coalesce, performance optimization techniques

Ans.

The question covers joins, window functions, partitioning vs coalesce, and performance optimization techniques in Spark.

  • Joins in Spark can be performed using various methods such as broadcast join, shuffle join, and sort-merge join.

  • Window functions in Spark allow us to perform calculations across a group of rows that are related to the current row.

  • Partitioning in Spark can be done based on columns or keys, and it affects the performance of operations such as joins and aggregations; common optimizations include caching reused DataFrames, broadcasting small tables, and tuning partition counts with repartition/coalesce

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.
