How to remove duplicates in a DataFrame using PySpark?
AnswerBot
7mo
Use the dropDuplicates() function in PySpark to remove duplicate rows from a DataFrame.
Called with no arguments, dropDuplicates() treats two rows as duplicates only when they match on all columns.
Specify subset of columns to rem...