What optimizations are possible to reduce the overhead of reading large datasets in Spark?
AnswerBot
7mo
Optimizations like partitioning, caching, and using efficient file formats can reduce the overhead of reading large datasets in Spark.
Partitioning data based on a key column can reduce the amount of data shuffled across the cluster, and lets Spark skip partitions that don't match a filter (partition pruning).
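A minimal PySpark sketch of these three ideas, assuming hypothetical paths (`/data/raw/events`, `/data/curated/events`) and a hypothetical `event_date` column:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-optimizations").getOrCreate()

# Write data partitioned by a key column so later reads can skip
# whole directories that don't match a filter (partition pruning).
events = spark.read.json("/data/raw/events")  # hypothetical source path
events.write.partitionBy("event_date").parquet("/data/curated/events")

# Reading with a filter on the partition column touches only the
# matching partition directories instead of scanning the full dataset.
recent = (spark.read.parquet("/data/curated/events")
               .where("event_date = '2024-01-01'"))

# Cache a DataFrame that several downstream actions will reuse,
# so it is read from storage only once.
recent.cache()
recent.count()  # first action materializes the cache
```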
Nikhil Kumar
6mo
1. Use proper file formats: Prefer columnar file formats like Parquet or ORC, which allow Spark to read only the necessary columns, improving read efficiency.
2. Filter data early: Apply filters as early as possible so predicates can be pushed down to the data source and less data is read into memory. A sketch of both points follows below.
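A short sketch of column pruning and predicate pushdown, assuming a hypothetical Parquet path and column names (`order_id`, `customer_id`, `amount`):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pruning-pushdown").getOrCreate()

# Select only the needed columns and filter before any wide
# transformation, so Spark can push the projection and predicate
# down into the Parquet file scan.
orders = (spark.read.parquet("/data/curated/orders")         # columnar source
               .select("order_id", "customer_id", "amount")  # column pruning
               .where(F.col("amount") > 100))                # filter early

# explain() prints the physical plan; the scan node's PushedFilters
# and pruned ReadSchema confirm the work happens at the source.
orders.explain()
```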