Data Scientist - Python/PySpark (4-8 yrs)
Whizz HR
Flexible timing
Job Description:
Whizz HR is looking for an experienced Data Scientist with strong Python and PySpark skills. The responsibilities and qualifications are outlined below.
Key Responsibilities:
- Build and maintain scalable data pipelines and data assets using best practices.
- Develop, optimize, and refactor code in Python and PySpark, using established Python frameworks and libraries and adhering to best practices.
- Enhance the performance and efficiency of Spark SQL and PySpark code (a brief sketch follows this list).
- Work with AWS services such as S3, EC2, Lambda, Redshift, and CloudFormation, and articulate the benefits of each service (a Boto3 sketch also follows this list).
- Modernize and clean up legacy codebases for better readability, maintainability, and performance.
- Implement unit tests, adopt TDD practices, and resolve complex bugs, including performance, concurrency, and logic flaws.
- Utilize tools like Git for code versioning and manage artifacts with JFrog Artifactory.
- Work closely with cross-functional teams to understand data requirements and deliver robust solutions.
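To give a flavor of the optimization work mentioned above, here is a minimal, hypothetical PySpark sketch: a shuffle-heavy join replaced with a broadcast join, plus caching of a DataFrame reused downstream. The bucket, table, and column names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table on S3.
orders = spark.read.parquet("s3://example-bucket/orders/")        # large
countries = spark.read.parquet("s3://example-bucket/countries/")  # small

# Broadcasting the small dimension avoids a full shuffle of the large table.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Cache a DataFrame that feeds several downstream aggregations.
enriched.cache()

daily_revenue = (
    enriched.groupBy("order_date", "country_name")
    .agg(F.sum("amount").alias("revenue"))
)
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/daily_revenue/"
)
```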
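On the AWS side, a small Boto3 sketch of the kind of scripting involved, assuming credentials from the standard AWS chain; the bucket, prefix, and Lambda function name are hypothetical.

```python
import boto3

# Hypothetical bucket/prefix; credentials come from the standard AWS chain.
s3 = boto3.client("s3")

# List objects under a prefix, paginating so large prefixes are handled.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-bucket", Prefix="orders/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])

# Invoke a hypothetical Lambda asynchronously to kick off a downstream refresh.
lam = boto3.client("lambda")
lam.invoke(FunctionName="refresh-daily-revenue", InvocationType="Event")
```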
Required Skills & Qualifications:
- 3+ years of experience building data pipelines and data assets.
- 4+ years of hands-on experience in Python, PySpark, and Boto3, with a deep understanding of Python frameworks and libraries.
- In-depth understanding of AWS services such as S3, EC2, Lambda, Redshift, and CloudFormation, and the ability to leverage their benefits.
- Proven ability to optimize Spark SQL and PySpark code for performance and scalability.
- Experience modernizing and improving legacy codebases.
- Strong experience in unit testing, TDD, and debugging complex issues (see the test sketch after this list).
- Proficiency with Git and JFrog Artifactory for version control and artifact management.
- Experience with data visualization tools and libraries.
- Familiarity with big data technologies such as Hadoop or Kafka.
- Knowledge of CI/CD pipelines and DevOps practices.
- Certifications in AWS or related technologies.
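To illustrate the unit-testing expectation above, a minimal pytest sketch for a PySpark transformation; the function under test and its columns are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_revenue(df):
    # Hypothetical transformation under test: revenue = quantity * unit_price.
    return df.withColumn("revenue", F.col("quantity") * F.col("unit_price"))


@pytest.fixture(scope="session")
def spark():
    # Local single-threaded session keeps the test suite fast and hermetic.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_revenue(spark):
    df = spark.createDataFrame([(2, 10.0), (3, 5.0)], ["quantity", "unit_price"])
    result = add_revenue(df).select("revenue").collect()
    assert [row.revenue for row in result] == [20.0, 15.0]
```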
Functional Areas: Other