What optimizations are possible to reduce the overhead of reading large datasets in Spark?

AnswerBot
7mo

Optimizations like partitioning, caching, and using efficient file formats can reduce overhead in reading large datasets in Spark.

  • Partitioning data by key can reduce the amount of data shuffled … (a sketch of these optimizations follows below)
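Roughly, those three ideas look like the following PySpark sketch. The paths, the JSON source, and the column names (event_date, country) are illustrative assumptions, not part of the original answer:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-optimizations").getOrCreate()

    # Efficient file format: write once as Parquet, partitioned by a column
    # that later queries filter on, so reads can skip whole directories.
    raw = spark.read.json("s3://bucket/raw-events/")   # hypothetical source
    raw.write.partitionBy("event_date").parquet("s3://bucket/events")

    # Partition pruning: this read only touches one date's files.
    events = (spark.read.parquet("s3://bucket/events")
                   .where("event_date = '2024-01-01'"))

    # Caching: keep a dataset that is reused across several actions in memory.
    events.cache()
    events.count()                             # first action materializes the cache
    events.groupBy("country").count().show()   # served from the cached data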

Nikhil Kumar
6mo
1. Use proper file formats: prefer columnar file formats like Parquet or ORC, which allow Spark to read only the necessary columns, improving read efficiency.
2. Filter data early: apply filters as early as possible … (a sketch of both points follows below)
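To make points 1 and 2 concrete, here is a minimal sketch; the file path and column names (user_id, amount) are assumptions for illustration. With a columnar format like Parquet, selecting a subset of columns prunes the scan to just those columns, and a filter applied at read time is pushed down to the file scanner:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("prune-and-filter").getOrCreate()

    df = (spark.read.parquet("s3://bucket/events")       # hypothetical path
               .select("user_id", "amount")              # column pruning: only these columns are read
               .filter(F.col("amount") > 100))           # predicate pushdown to the Parquet scan

    # The physical plan lists ReadSchema (the pruned columns) and
    # PushedFilters, confirming the filter reached the data source.
    df.explain()

Checking the explain() output for PushedFilters is a quick way to verify that a filter was actually applied at the scan rather than after loading the full dataset.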