- The Document Platforms and AI Team has delivered breakthrough AI-powered solutions that have significantly influenced business outcomes over the past three years.
- In this role, you will contribute to the next generation of AI-driven products, enhancing and optimizing existing solutions while developing new capabilities.
What's in it for you:
- Be part of a global enterprise and build AI solutions at scale.
- Work alongside a highly skilled and technically strong team.
- Contribute to solving high-complexity, high-impact challenges in data transformation and machine learning.
Responsibilities:
- Build production-ready data acquisition and transformation pipelines, from ideation to deployment.
- Be a hands-on problem solver and developer, helping to extend and manage the data platforms.
- Apply best practices in data modeling and in building ETL pipelines (streaming and batch) using cloud-native solutions.
- Model development: Design, develop, and evaluate state-of-the-art machine learning models for information extraction, leveraging techniques from NLP, computer vision (where applicable), and other relevant domains.
- Data preprocessing and feature engineering: Develop robust pipelines for data cleaning, preprocessing, and feature engineering to prepare data for model training.
- Model training and evaluation: Train, tune, and evaluate machine learning models, ensuring high accuracy, efficiency, and scalability.
- Deployment and monitoring: Deploy and maintain machine learning models in a production environment, monitoring their performance and ensuring their reliability.
- Research and innovation: Stay up to date with the latest advancements in machine learning and NLP, and explore new techniques and technologies to improve the extraction process.
- Collaboration: Work closely with product managers, data scientists, and other engineers to understand project requirements and deliver effective solutions.
- Code quality and best practices: Ensure high code quality and adherence to software development best practices.
- Communication: Effectively communicate technical concepts and project updates to both technical and non-technical audiences.
What We're Looking For:
- 4+ years of professional software experience, with a strong focus on Machine Learning, Natural Language Processing (NLP) for information extraction, and MLOps.
- Expertise in Python and related NLP libraries (e.g., spaCy, NLTK, Transformers, Hugging Face).
- Experience with Apache Spark or other distributed computing frameworks for large-scale data processing.
- AWS/GCP cloud expertise, particularly in deploying and scaling ML pipelines for NLP tasks.
- Solid understanding of the machine learning model lifecycle, including data preprocessing, feature engineering, model training, evaluation, deployment, and monitoring, specifically for information extraction models.
- Experience with CI/CD pipelines for ML models, including automated testing and deployment.
- Docker and Kubernetes experience for containerization and orchestration.
- OOP design patterns, Test-Driven Development, and enterprise system design.
- SQL (any variant; bonus points for a big data variant).
- Linux (e.g., the bash toolset and other utilities).
- Version control experience with Git, GitHub, or Azure DevOps.
- Excellent problem-solving, code review, and debugging skills.
- Software craftsmanship, adherence to Agile principles, and pride in writing good code.
- Techniques for communicating change to non-technical audiences.
Nice to have:
- Core Java 17+ (preferably Java 21+) and the associated toolchain
- Apache Avro
- Apache Kafka
- Other JVM-based languages, e.g. Kotlin, Scala
Employment Type: Full Time, Permanent