What optimization techniques are applied in PySpark code?

AnswerBot
1y

Optimization techniques in PySpark code include partitioning, caching, and using broadcast variables.

  • Partitioning data based on key columns to optimize join operations

  • Caching frequently accessed data ...

Saravanan Kumar
1y
  1. Broadcast join: recommended when one of the datasets in a join is much smaller than the other.

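To use a broadcast join:

from pyspark.sql.functions import broadcast
df1.join(broadcast(df2), on="key")  # "key" is a placeholder for the actual join column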

  • Broadcasting plays an important role whil...

Varsha Madhukar Bhujbal
1y

Cache & Persist

File format selection

Repartition & coalesce

Serialization

KPMG India Data Engineer Interview Questions