1) How to handle data skewness in Spark?
AnswerBot
1y
Data skewness in Spark can be handled by partitioning, bucketing, or using salting techniques.
Partitioning the data on a key column can distribute the data evenly across the nodes, provided the key's values are themselves evenly distributed.
Bucketing can ...
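To make the salting idea concrete, here is a minimal sketch in plain Python rather than Spark itself. It models a hash partitioner with `zlib.crc32` and shows how adding a random suffix to a hot key spreads its rows over several partitions. The partition count, salt count, key names, and helper functions are all assumptions made up for illustration; they are not taken from the answer above.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count, for illustration only
NUM_SALTS = 8       # assumed number of distinct salt values


def partition_of(key: str) -> int:
    """Deterministic stand-in for a hash partitioner (hypothetical helper)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Number of rows that land in each partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Made-up skewed input: one hot key accounts for 90% of the rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

# Salting: append a random suffix so the hot key becomes several distinct keys.
rng = random.Random(0)
salted_keys = [f"{k}_{rng.randrange(NUM_SALTS)}" for k in keys]

plain_sizes = partition_sizes(keys)
salted_sizes = partition_sizes(salted_keys)

# Two-stage aggregation: count per salted key, then strip the salt and combine.
partial_counts = Counter(salted_keys)
final_counts = Counter()
for salted_key, count in partial_counts.items():
    base_key = salted_key.rsplit("_", 1)[0]
    final_counts[base_key] += count

print("largest partition without salting:", max(plain_sizes))
print("largest partition with salting:   ", max(salted_sizes))
```

The second stage matters: after salting, an aggregation has to be finished by removing the salt and combining the partial results, and a join needs the other side replicated once per salt value so that every salted key still finds its match.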
Amol Vitthal Khade
1y
If one executor on a worker node ends up with a disproportionate share of the load after the data is shuffled, we call that data skew.
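One rough way to put a number on this is to compare the largest partition with the average partition. The sketch below is plain Python with made-up row counts; `skew_ratio` is a hypothetical helper, not a Spark function, and what ratio counts as "skewed" is a judgment call.

```python
from statistics import mean


def skew_ratio(partition_sizes):
    """Largest partition size divided by the mean partition size."""
    avg = mean(partition_sizes)
    return max(partition_sizes) / avg if avg else 0.0


# Made-up row counts per partition after a shuffle.
balanced = [1000, 1010, 990, 1000]
skewed = [100, 120, 3700, 80]

print("balanced ratio:", skew_ratio(balanced))
print("skewed ratio:  ", skew_ratio(skewed))
```

A ratio close to 1 means the work is spread evenly; a ratio several times larger means one task is likely to run long after the others have finished.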
Boddu SatishKumar
1y
1. Repartition by Column(s)
The first solution is to logically re-partition your data based on the transformations in your script. In short, if you’re grouping or joining, partitioning by the groupBy/j...
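The idea of repartitioning by the grouping column can be sketched in plain Python as follows. This is only a model of the behaviour, not Spark code: `repartition_by_key` and `local_totals` are hypothetical helpers, the rows are made up, and `zlib.crc32` stands in for whatever hash the engine uses. In PySpark the corresponding call would be `DataFrame.repartition` with the column name.

```python
import zlib
from collections import defaultdict


def repartition_by_key(rows, key, num_partitions):
    """Send every row with the same key value to the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        pid = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[pid].append(row)
    return dict(partitions)


def local_totals(partition, key, value):
    """Aggregate within one partition only, with no data from other partitions."""
    totals = defaultdict(int)
    for row in partition:
        totals[row[key]] += row[value]
    return dict(totals)


# Made-up rows for illustration.
rows = [
    {"country": "IN", "amount": 10},
    {"country": "US", "amount": 5},
    {"country": "IN", "amount": 7},
    {"country": "DE", "amount": 3},
    {"country": "US", "amount": 1},
]

partitions = repartition_by_key(rows, "country", num_partitions=4)
per_partition = {
    pid: local_totals(part, "country", "amount")
    for pid, part in partitions.items()
}

# Each key lives in exactly one partition, so the local totals are already final.
merged = {}
for totals in per_partition.values():
    merged.update(totals)
print(merged)
```

Because all rows for a key sit together, the later grouping needs no further shuffle. This does not help when a single key holds most of the rows, since that key still lands in one partition.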
Sanket Kailas Gorane
1y
We handle skewness via:
1) Log transform
2) Square root transform
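These two transforms act on the statistical distribution of a numeric column. A minimal sketch in plain Python, using made-up right-skewed values and a hypothetical `skewness` helper (the population moment coefficient), shows both transforms pulling a long right tail in:

```python
import math


def skewness(values):
    """Population skewness: third central moment over the 1.5 power of the variance."""
    n = len(values)
    avg = sum(values) / n
    m2 = sum((v - avg) ** 2 for v in values) / n
    m3 = sum((v - avg) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up positive values with one large outlier (right-skewed).
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 100]

log_values = [math.log(v) for v in raw]    # requires strictly positive values
sqrt_values = [math.sqrt(v) for v in raw]  # requires non-negative values

print("raw: ", skewness(raw))
print("sqrt:", skewness(sqrt_values))
print("log: ", skewness(log_values))
```

On this sample the log transform reduces skewness more than the square root does, which is the usual pattern for heavy right tails.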
Tejaswini Kotkar
2y
We can drop the tables, including backup tables, associated with that database to reduce skewness.