AstaGuru Auction House Interview Questions and Answers

Updated 5 Feb 2024

Q1. Difference between partitioning and bucketing. Types of joins in Spark. Optimization techniques in Spark. Broadcast variable and broadcast join. Difference between ORC and Parquet. Difference between RDD and Datafr...

Ans.

Explaining partitioning, bucketing, joins, optimization, broadcast variables, ORC vs Parquet, RDD vs Dataframe, project architecture and responsibilities for Big Data Engineer role.

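  • Partitioning splits data into separate groups (directories on disk) by the distinct values of a column, so queries that filter on that column can skip irrelevant data; bucketing distributes rows into a fixed number of buckets by hashing a column, which suits high-cardinality join keys.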

  • Types of joins in Spark include inner join, cross join, left outer join, right outer join, full outer join, left semi join, and left anti join.

  • Optimization techniques in Spark include caching, re...
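
The partitioning versus bucketing distinction above can be sketched in plain Python, without Spark. This is only an illustration of the idea: the sample rows, the function names partition_by and bucket_by, and the use of CRC32 as the hash are all assumptions made for this sketch (Hive and Spark use their own hash functions and file layouts).

```python
import zlib
from collections import defaultdict

# Illustrative rows only; not taken from the interview.
rows = [
    {"user_id": 101, "country": "IN", "amount": 250},
    {"user_id": 102, "country": "US", "amount": 400},
    {"user_id": 103, "country": "IN", "amount": 120},
    {"user_id": 101, "country": "IN", "amount": 75},
    {"user_id": 104, "country": "UK", "amount": 310},
]


def partition_by(rows, column):
    """One group per distinct value, like one directory per partition value."""
    parts = defaultdict(list)
    for row in rows:
        parts[f"{column}={row[column]}"].append(row)
    return dict(parts)


def bucket_by(rows, column, num_buckets):
    """A fixed number of groups, chosen by hashing the column value."""
    buckets = defaultdict(list)
    for row in rows:
        key = str(row[column]).encode("utf-8")
        # CRC32 is a stand-in hash for this sketch.
        buckets[zlib.crc32(key) % num_buckets].append(row)
    return dict(buckets)


partitions = partition_by(rows, "country")
buckets = bucket_by(rows, "user_id", num_buckets=4)
```

The number of partitions grows with the number of distinct values in the column, while the number of buckets is fixed up front; rows with the same user_id always land in the same bucket.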


Q2. Second round: how to handle upserts in Spark

Ans.

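Plain Spark DataFrames have no built-in merge() function; upserts are usually handled through a table format that supports MERGE, such as Delta Lake.

  • Use MERGE INTO in Spark SQL or the DeltaTable merge API on a Delta Lake table; Apache Hudi and Apache Iceberg offer comparable upsert support

  • Specify the join condition on the primary key column(s) to identify matching rows

  • Specify the column(s) to update when a row matches

  • Specify the column(s) to insert when no row matches

  • Example (Delta Lake Python API; path and df2 are placeholders): DeltaTable.forPath(spark, path).alias('t').merge(df2.alias('s'), 't.id = s.id').whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()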
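
The matched / not-matched logic behind an upsert can be shown in plain Python over lists of dicts. This is not Spark or Delta Lake code, only a sketch of the semantics; the upsert function name and the sample records are assumptions made for illustration.

```python
def upsert(target, source, key):
    """Merge source rows into target rows on a key column."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            # When matched: update the existing row with the new values.
            merged[row[key]].update(row)
        else:
            # When not matched: insert the row as new.
            merged[row[key]] = dict(row)
    return sorted(merged.values(), key=lambda r: r[key])


# Illustrative data only.
target = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
]
source = [
    {"id": 2, "city": "Mumbai"},   # existing key, so this is an update
    {"id": 3, "city": "Indore"},   # new key, so this is an insert
]

result = upsert(target, source, "id")
```

Row 1 is left untouched, row 2 is updated, and row 3 is inserted, which is the same outcome a MERGE with update and insert clauses is meant to produce.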


Q3. SQL questions: remove duplicate records; find the 5th highest salary department-wise

Ans.

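Remove duplicate records and find the 5th highest salary department-wise using SQL.

  • Use the DISTINCT keyword to drop exact duplicate rows, or ROW_NUMBER() partitioned by the duplicate-defining columns and keep only row number 1.

  • GROUP BY with ORDER BY and LIMIT cannot return the Nth value per group; use a window function instead.

  • Use PARTITION BY department to rank salaries within each department.

  • Use ORDER BY salary DESC inside the window so the highest salary gets rank 1.

  • Use DENSE_RANK() and filter on rank = 5 in an outer query to get the 5th highest distinct salary in each department.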
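
A window-function approach to both parts of the question can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the employee table, its columns, and the sample rows are made up for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")

# Illustrative data only; the second a5 row is an exact duplicate.
rows = [
    ("a1", "eng", 100), ("a2", "eng", 90), ("a3", "eng", 80),
    ("a4", "eng", 70), ("a5", "eng", 60), ("a6", "eng", 50),
    ("a5", "eng", 60),
    ("b1", "ops", 40), ("b2", "ops", 30),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# Part 1: drop exact duplicate rows.
deduped = conn.execute(
    "SELECT DISTINCT name, department, salary FROM employee"
).fetchall()

# Part 2: 5th highest distinct salary per department.
fifth_highest = conn.execute("""
    SELECT department, salary
    FROM (
        SELECT DISTINCT department, salary,
               DENSE_RANK() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rnk
        FROM employee
    ) AS ranked
    WHERE rnk = 5
""").fetchall()
```

Departments with fewer than five distinct salaries (ops in this sample) simply return no row. DENSE_RANK is used rather than ROW_NUMBER so that tied salaries share a rank.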


Q4. Spark memory optimisation techniques

Ans.

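Common ways to reduce memory pressure in Spark jobs:

  • Use broadcast variables for small read-only lookup data, so each executor holds one copy instead of one copy being shipped with every task

  • Use persist() or cache() only for datasets that are reused, choose a storage level that can spill to disk when memory is tight, and call unpersist() when done

  • Use sensible partitioning to reduce shuffling and keep individual partitions small enough to fit in executor memory

  • Use off-heap memory (spark.memory.offHeap.enabled and spark.memory.offHeap.size) to reduce garbage collection overhead

  • Tune memory settings such as spark.driver.memory and spark.executor.memory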
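
The memory settings named above are ordinary Spark configuration keys and can be passed to spark-submit as --conf pairs. The sketch below only builds the argument list in Python; the values are placeholder assumptions, not recommendations, and the to_submit_args helper is hypothetical.

```python
# Placeholder values; suitable sizes depend on the cluster and workload.
memory_conf = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "8g",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
}


def to_submit_args(conf):
    """Render a config dict as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(memory_conf)
```

The resulting list can be appended to a spark-submit command line ahead of the application file.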


Q5. Hadoop serialisation techniques.

Ans.

Hadoop serialisation techniques are used to convert data into a format that can be stored and processed in Hadoop.

  • Hadoop uses the Writable interface for serialisation and deserialisation of data

  • Avro, Thrift, and Protocol Buffers are popular serialisation frameworks used with Hadoop

  • Serialisation can be customised using custom Writable classes or external libraries

  • Serialisation affects Hadoop performance because a compact binary format reduces the data written to disk and moved across the network between tasks


Q6. Java Collection vs Collections

Ans.

Collection is an interface, while Collections is a utility class; both live in java.util.

  • Collection is the root interface of the collection hierarchy and defines the common operations for storing and manipulating groups of objects.

  • Collections is a utility class that provides static methods for working with collections.

  • Both are part of the Java Collections Framework: Collection defines the contract, and Collections supplies helper algorithms that operate on it.

  • Examples of Collection subinterfaces include List, Set, and Queue (Map belongs to the framework but does not extend Collection), while examples of methods in Collections include sort, reve...

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.