TrueFan - DevOps Engineer - Cloud Infrastructure (2-4 yrs)
TrueFan
Posted 1 month ago
Flexible timing
Key Responsibilities :
Model Training/Deployment Pipelines and Monitoring :
- Design, implement, and maintain scalable and automated pipelines for deploying deep neural network models.
- Monitor and manage production models, ensuring high availability, low latency, and stable performance.
- Automate workflows for data preprocessing (face alignment, feature extraction, audio analysis), model retraining, and video generation.
- Implement logging, tracking, and monitoring systems to ensure data integrity and visibility into the model lifecycle.
Infrastructure Management :
- Build and manage cloud-based infrastructure (AWS, GCP, or Azure) for efficient model training, deployment, and data storage.
- Collaborate with DevOps to manage containerization (Docker, Kubernetes) and ensure robust CI/CD pipelines using GitHub and Jenkins for model delivery.
- Monitor resource usage for GPU/CPU-intensive tasks such as video processing, model inference, and training using Prometheus, Grafana, Alertmanager, and the ELK stack.
Collaboration :
- Work closely with ML engineers to integrate models into production pipelines.
- Provide tools and frameworks for rapid experimentation and model versioning.
Required Skills :
- Basic Python
- Strong experience with cloud platforms (AWS, GCP, Azure) and cloud-based machine learning services.
- Expert knowledge of containerization technologies (Docker, Kubernetes) and infrastructure-as-code (Terraform, CloudFormation).
- Understanding of deploying both synchronous and asynchronous APIs using Flask, Django, Celery, Redis, RabbitMQ, and Kafka.
- Experience deploying and scaling AI/ML models in production.
- Familiarity with deep learning frameworks (TensorFlow, PyTorch).
- Familiarity with video processing tools like FFmpeg and Dlib for handling dynamic frame data.
- Basic understanding of ML models
Preferred Qualifications :
- Experience in image and video-based deep learning tasks.
- Familiarity with media streaming and video processing pipelines for real-time generation.
- Experience with real-time inference and deploying models in latency-sensitive environments.
- Strong problem-solving skills with a focus on optimising machine learning model infrastructure for scalability and performance.
Functional Areas: Software/Testing/Networking
2-3 Yrs
Gurgaon / Gurugram