1) How to handle data skewness in Spark?

AnswerBot
1y

Data skewness in Spark can be handled by partitioning, bucketing, or using salting techniques.

  • Partitioning the data on a well-distributed key column can spread it more evenly across the nodes.

  • Bucketing can ...
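To make the salting idea concrete, here is a minimal sketch in plain Python rather than Spark itself. It simulates a hash partitioner with `zlib.crc32` (an assumption for illustration, not Spark's actual hash function), and the key names, the `#` separator, and the salt count of 16 are all hypothetical. It appends a random salt to each key so one hot key is spread over several partitions, then aggregates in two stages so the final counts per original key stay the same.

```python
import random
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    # Deterministic stand-in for a hash partitioner (not Spark's real one).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def max_partition_size(keys, num_partitions):
    # Size of the most heavily loaded partition.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return max(counts.values())


def add_salt(key, num_salts, rng):
    # Append a random suffix so one logical key becomes up to num_salts keys.
    return f"{key}#{rng.randrange(num_salts)}"


rng = random.Random(42)

# Hypothetical skewed data: one key holds 900 of the 1000 rows.
keys = ["hot_customer"] * 900 + [f"customer_{i}" for i in range(100)]
salted = [add_salt(k, 16, rng) for k in keys]

before = max_partition_size(keys, 8)    # hot key lands in a single partition
after = max_partition_size(salted, 8)   # hot key is spread across partitions

# Stage 1: aggregate per salted key (this is the work that is now spread out).
partial = Counter(salted)

# Stage 2: strip the salt and combine the partial results per original key.
final = Counter()
for salted_key, n in partial.items():
    original_key = salted_key.rsplit("#", 1)[0]
    final[original_key] += n
```

The second stage matters: without it, the result would be grouped by the salted keys instead of the original ones.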

Amol Vitthal Khade
1y

If one executor on a worker node receives a disproportionately large share of the data after a shuffle, we call it data skew.
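One rough way to see this is to count rows per partition and compare the largest partition with the average. The sketch below is plain Python, not Spark; the `zlib.crc32` partitioner, the function names, and the sample keys are assumptions made up for illustration.

```python
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    # Deterministic stand-in for a hash partitioner (not Spark's real one).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Number of rows that end up in each partition after a "shuffle" by key.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    # Largest partition relative to the mean size; 1.0 means perfectly even.
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Hypothetical data: one "hot" key accounts for 900 of the 1000 rows.
keys = ["hot_customer"] * 900 + [f"customer_{i}" for i in range(100)]

sizes = partition_sizes(keys, 4)
ratio = skew_ratio(sizes)
```

All rows for the hot key hash to the same partition, so that partition holds at least 900 rows while the mean is 250, and the task processing it does most of the work.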

Boddu SatishKumar
1y

1. Repartition by Column(s)

The first solution is to logically re-partition your data based on the transformations in your script. In short, if you’re grouping or joining, partitioning by the groupBy/j...
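As a rough illustration of repartitioning by the grouping column, the plain-Python sketch below places every row with the same key in the same partition, so each partition can be aggregated on its own. It is a simulation, not Spark code: the `country` and `amount` columns, the helper names, and the `zlib.crc32` partitioner are hypothetical. In PySpark the comparable call would be along the lines of `df.repartition("country")`.

```python
import zlib
from collections import defaultdict


def repartition_by_column(rows, column, num_partitions):
    # Send each row to a partition chosen by hashing the given column.
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[column]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def sum_by_key_within_partition(partition, key_col, value_col):
    # Local aggregation: only looks at rows inside one partition.
    totals = defaultdict(int)
    for row in partition:
        totals[row[key_col]] += row[value_col]
    return dict(totals)


# Hypothetical input rows.
rows = [
    {"country": "IN", "amount": 10},
    {"country": "US", "amount": 5},
    {"country": "IN", "amount": 7},
    {"country": "DE", "amount": 3},
    {"country": "US", "amount": 1},
]

partitions = repartition_by_column(rows, "country", 3)
per_partition = [
    sum_by_key_within_partition(p, "country", "amount") for p in partitions
]

# Each key lives in exactly one partition, so merging never overwrites a total.
merged = {k: v for result in per_partition for k, v in result.items()}
```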

Sanket Kailas Gorane
1y

We handle skewness via

1) log transform

2) square root transform

Tejaswini Kotkar
2y

We can drop the tables, including backup tables, associated with that database to reduce skewness.

IBM Data Engineer Interview Questions
Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.