Senior Machine Learning Engineer - PyTorch/TensorFlow (10-15 yrs)
Bluebyte Technologies
posted 17d ago
Key Responsibilities :
Machine Learning Model Development :
- Lead the design, development, and implementation of machine learning models across various domains (such as computer vision, NLP, time-series forecasting, etc.).
- Build end-to-end ML pipelines for model training, evaluation, and deployment in production environments.
- Conduct performance analysis, optimization, and tuning of machine learning models.
- Collaborate with data scientists and domain experts to translate business requirements into data-driven solutions.
MLOps Implementation :
- Drive the implementation and optimization of MLOps pipelines for continuous integration, delivery, and monitoring of ML models in production.
- Automate workflows, build reproducible experiments, and enhance model versioning using tools like MLflow, Kubeflow, or TFX.
- Deploy ML models to production environments, ensuring they are scalable, reliable, and performant.
- Collaborate with the data engineering team to ensure seamless data flows from storage to model ingestion.
DevOps and Infrastructure Management :
- Collaborate with the DevOps team to design and implement scalable, cloud-native architectures for ML workloads using platforms like AWS, GCP, or Azure.
- Automate the setup, deployment, and management of ML environments in the cloud, leveraging tools like Docker, Kubernetes, Terraform, and CI/CD pipelines.
- Manage the infrastructure necessary for scaling machine learning workloads efficiently (e.g., GPU provisioning, distributed computing frameworks).
- Ensure smooth integration of ML models with the existing cloud infrastructure, improving model performance and resource efficiency.
Collaboration & Mentorship :
- Work closely with cross-functional teams, including data scientists, software engineers, product managers, and business stakeholders, to deliver impactful machine learning solutions.
- Provide mentorship and guidance to junior engineers, fostering best practices in MLOps, DevOps, and machine learning engineering.
- Advocate for strong software engineering practices within the ML team to ensure quality, scalability, and maintainability of solutions.
Model Monitoring and Maintenance :
- Implement monitoring solutions to track the performance of deployed ML models and alert on anomalies or drift.
- Lead the effort to continuously evaluate, retrain, and update models as new data becomes available or business requirements evolve.
Qualifications :
Required :
- 10+ years of experience in software engineering, with at least 5 years dedicated to machine learning engineering and 5 years of MLOps/DevOps experience.
- Strong proficiency in Python and machine learning frameworks such as TensorFlow, PyTorch, Scikit-learn, or XGBoost.
- Deep experience with cloud platforms (AWS, GCP, or Azure) and containerization technologies like Docker and Kubernetes.
- Experience with building and maintaining CI/CD pipelines for ML systems, using tools such as Jenkins, GitLab CI, or CircleCI.
- Expertise in managing and optimizing ML models for production environments, including deployment and monitoring.
- Solid understanding of DevOps practices, including infrastructure-as-code, version control, and automated testing.
- Experience with distributed computing tools such as Apache Spark, Dask, or Ray.
- Familiarity with data storage solutions such as SQL and NoSQL databases, object storage (e.g., S3), and data warehouses (e.g., BigQuery).
Preferred :
- Experience with advanced MLOps tools like MLflow, Kubeflow, or TensorFlow Extended (TFX).
- Familiarity with model deployment frameworks such as Seldon or TensorFlow Serving.
- Knowledge of model interpretability techniques and tools (e.g., SHAP, LIME).
- Experience in agile development methodologies.
- Advanced degree (Master's or PhD) in Computer Science, Data Science, Machine Learning, or a related field.
Skills and Competencies :
Technical :
- Strong understanding of ML lifecycle management, including data pipelines, model training, testing, validation, deployment, and monitoring.
- Knowledge of cloud infrastructure design, container orchestration, and microservices architecture.
- Proficiency in source control systems (e.g., Git) and Agile development processes.
Problem Solving :
- Ability to design solutions for complex, real-world problems with a focus on automation, scalability, and performance.
- Strong troubleshooting skills in distributed systems and ML workflows.
Leadership & Communication :
- Proven experience in leading cross-functional teams and providing mentorship to junior engineers.
- Excellent communication skills, with the ability to convey technical concepts to non-technical stakeholders.
Functional Areas: Other