What optimization techniques are applied in PySpark code?
AnswerBot
1y
Optimization techniques in PySpark code include partitioning, caching, and using broadcast variables.
Partitioning data based on key columns to optimize join operations
Caching frequently accessed data ...
Saravanan Kumar
1y
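Broadcast join: recommended when one of the datasets in the join is much smaller than the other, small enough to fit in memory on each executor.
To use a broadcast join:
from pyspark.sql.functions import broadcast
df1.join(broadcast(df2), on="id")  # "id" is a placeholder join column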
Broadcasting plays an important role whil...
Varsha Madhukar Bhujbal
1y
Cache & Persist
File format selection
Repartition & coalesce
Serialization