How will you handle 100 files of 100 GB each in PySpark? Design an end-to-end pipeline.

AnswerBot
6mo

I would use PySpark to handle 100 files of 100 GB each in an end-to-end pipeline.

  • Use PySpark to distribute processing across a cluster of machines

  • Read the files in parallel using SparkContext / SparkSession, so each large file is split into many partitions processed concurrently (see the sketch after this list)
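A minimal PySpark sketch of such a pipeline is given below. The input/output paths, Parquet format, the `event_date` column, and the tuning values are assumptions for illustration, not details from the original answer; they would be adjusted to the actual data and cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("bulk-file-pipeline")
    # Illustrative sizing knobs: keep input splits around 128 MB so each
    # 100 GB file becomes many small, parallel tasks (tune to the cluster).
    .config("spark.sql.files.maxPartitionBytes", 128 * 1024 * 1024)
    .config("spark.sql.shuffle.partitions", 2000)
    .getOrCreate()
)

# 1. Read all 100 files in parallel; Spark splits each file into partitions
#    so executors across the cluster process them concurrently.
df = spark.read.parquet("s3://bucket/input/")  # hypothetical path and format

# 2. Placeholder transformation step (actual business logic goes here).
cleaned = (
    df.dropDuplicates()
      .filter(F.col("event_date").isNotNull())  # hypothetical column
)

# 3. Write the result back out, partitioned so downstream reads stay parallel.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://bucket/output/")
)

spark.stop()
```

The key design point is that the heavy lifting is never done on a single machine: reading, transforming, and writing all operate partition by partition across executors, so total data volume (10 TB here) matters less than partition size and shuffle configuration.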
