Must be strong with Python for ML pipelines, specifically with PyTorch and scikit-learn
AWS is required, including building pipelines within it
Should have a background in LLMs (LangChain, agents, extensive prompt engineering)
The "strong additional requirements" below are required.
Responsibilities:
Ingesting, structuring and analyzing a wide range of unstructured data sources
Designing, maintaining and orchestrating data pipelines in an AWS environment for production processing and training flows
Continuously evaluating, analyzing, testing and improving the quality, privacy and performance of our data systems
Contributing across the product, from front-end UX and product design to API/systems architecture and ML processing/training
Minimum Qualifications:
3+ years of experience ingesting, analyzing and structuring a wide variety of data sources
Significant experience building and maintaining data pipelines in a production environment