StockX - MLOps Engineer - AWS Cloud Platform (5-9 yrs)
posted 12hr ago
Flexible timing
Job Description :
We are seeking a versatile and skilled MLOps Engineer with expertise in DevOps, CloudOps (preferably AWS Cloud), and foundational knowledge of Data Engineering and Software Engineering. The ideal candidate should have 4-5 years of relevant experience, a solid understanding of deploying, managing, and scaling machine learning pipelines in production, and experience in backend software development and API design.
Responsibilities :
- Design, build, and maintain end-to-end machine learning pipelines, including model training, validation, deployment, monitoring, and updating.
- Implement CI/CD pipelines tailored for machine learning workflows.
- Enhance ML Pipeline Efficiency: Directly improve the robustness, scalability, and performance of our TensorFlow and Kubeflow pipelines.
- Focus on optimizing model training and serving processes, minimizing downtime, and automating routine tasks to increase operational efficiency.
- Drive Platform Scaling and Innovation: Take a leadership role in expanding our ML platform's capacity to manage larger data volumes and more complex models.
- Research and integrate cutting-edge technologies to develop scalable architectures and elevate system performance and efficiency through continuous enhancements.
- Establish MLOps Excellence: Design and implement robust MLOps frameworks that streamline the integration, continuous deployment, and monitoring of ML models.
- Set up comprehensive CI/CD pipelines to automate testing and create monitoring tools to proactively track model performance and detect issues.
- Foster Cross-Functional Collaboration: Partner with data scientists, software engineers, and product teams to transform business requirements into scalable and dependable machine learning solutions.
- Bridge the gap between model development and deployment, ensuring models are production-ready and align with performance standards.
- Overcome Production Challenges: Proactively monitor, troubleshoot, and resolve issues affecting model performance, data pipeline integrity, and system efficiency.
- Identify root causes and implement strategic solutions to ensure the ongoing stability and performance of our ML infrastructure.
- Streamline cloud infrastructure (AWS preferred), automate deployments with IaC tools, and ensure scalable, reliable, and high-performing ML workflows.
- Deploy ML models with containerization and orchestration, and establish monitoring systems for performance, data drift, and system health.
- Develop scalable backend solutions, integrate ML models via APIs, and ensure clean code practices, with optional frontend expertise.
- Collaborate across teams to align ML solutions with business goals and document processes for reproducibility and excellence.
- Support data engineering by integrating pipelines, optimizing storage, and preprocessing datasets for ML workloads.
Requirements :
- Educational and Technical Foundation: Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
- You should have solid experience in maintaining and scaling machine learning pipelines using TensorFlow and Kubeflow.
- Knowledge of ML Engineering, including model architecture design and hyperparameter tuning.
- Advanced MLOps Proficiency: At least 3 years of experience in ML Engineering, with expertise in deploying models and managing ML workflows, and familiarity with MLflow, TFX, or Airflow.
- Strategic Problem Solver with Collaborative Spirit: You excel at solving complex problems at scale and have a proven ability to work effectively within collaborative, fast-paced, cross-functional teams.
- You're adept at communicating technical concepts across various stakeholder groups, ensuring alignment and understanding.
- Innovative Tech Enthusiast with Cloud Expertise: Your proactive approach drives you to continually seek improvements in ML development and deployment processes.
- You have a strong knowledge of cloud platforms, particularly AWS, and experience with containerization tools like Docker and Kubernetes. Hands-on experience with AWS Cloud (EC2, S3, RDS, Lambda, SageMaker, etc.).
- Strong programming skills in Python, with experience in frameworks like TensorFlow, PyTorch, and scikit-learn. LLM, LangChain, or agent experience is a plus.
- Understanding of embeddings/vector databases and feature stores.
- Knowledge of monitoring tools like Prometheus, Grafana, or CloudWatch.
- Experience with other cloud platforms (GCP, Azure) is a plus.
- Understanding of model explainability, fairness, and ethical considerations in AI/ML.
- Experience with version control tools like Git and ML model versioning tools like MLflow or DVC.
Functional Areas: Other