Madhees

4.1

based on 27 Reviews

7 Madhees Jobs

Director - Site Reliability Engineering (10-12 yrs)

10-12 years

Posted 19 days ago

Job Role Insights

Fixed timing

Key skills for the job

Cloud Computing, Kubernetes, Site Reliability Engineering, Jenkins, Release Management, Docker

Job Description

Director of SRE for a partner company in the product development space. We are looking for someone who has handled end-to-end SRE along with DevOps.

Qualifications:

- Bachelor's or Master's degree in Computer Science, Data Engineering, AI/ML, or a related field.

- 10+ years of experience in software release management, with at least 3-5 years in SRE or DevOps environments, preferably in AI or data-driven applications.

- Proven experience building and managing both release management and SRE teams in complex, multi-product environments.

- Strong knowledge of AI/ML operations (MLOps), data pipeline management, and cloud-based AI product deployments.

- Expertise in release management tools (Jenkins, GitLab, Git, Jira) and SRE tools such as Prometheus, Grafana, Datadog, or similar monitoring systems.

- Experience with cloud platforms (AWS, GCP, Azure), containerization (Kubernetes, Docker), and infrastructure automation tools (Terraform, Ansible).

- Excellent problem-solving, organizational, and leadership skills, with a strong track record of driving continuous improvement in both release and operational reliability processes.

Preferred Qualifications:

- Experience deploying and maintaining large-scale AI/ML models in production environments, including monitoring, retraining, and operationalization.

- Familiarity with ITIL, MLOps, or DevOps frameworks and best practices.

- Knowledge of cloud-based AI/ML services (e.g., AWS SageMaker) and ML frameworks such as TensorFlow and PyTorch.

- Demonstrated ability to manage incident response and root cause analysis in complex software ecosystems.

Responsibilities:

- Build, mentor, and lead a high-performing SRE and release management team.

- Foster a culture of ownership, collaboration, and continuous improvement.

- Define team goals, performance metrics, and career development plans.

- Develop and implement SRE best practices, including monitoring, alerting, capacity planning, and incident response.

- Ensure the reliability, availability, and performance of our production systems.

- Drive the adoption of automation and infrastructure-as-code principles.

- Establish and maintain service level objectives (SLOs) and service level agreements (SLAs).

- Oversee the end-to-end release management process, ensuring smooth and efficient deployments.

- Implement and maintain CI/CD pipelines using tools like Jenkins, GitLab, and Git.

- Promote DevOps principles and practices across the organization.

- Manage and optimize data pipelines and MLOps workflows.

- Manage and optimize cloud infrastructure on platforms like AWS, GCP, or Azure.

- Implement and manage containerization and orchestration using Kubernetes and Docker.

- Utilize infrastructure automation tools like Terraform and Ansible to ensure consistent and scalable deployments.

- Oversee the monitoring and management of large-scale AI/ML models in production.

- Lead incident response and root cause analysis efforts.

- Implement proactive monitoring and alerting systems using tools like Prometheus, Grafana, and Datadog.

- Develop and maintain incident response playbooks and procedures.

- Improve system resilience and minimize downtime.

- Collaborate with development, product, and data science teams to ensure alignment on reliability and release goals.

- Communicate effectively with stakeholders at all levels of the organization.

- Document processes, procedures, and best practices.

Functional Areas: Software/Testing/Networking

What Madhees employees are saying about work life

based on 27 employees

- Strict timing: 73%
- Alternate Saturday off: 57%
- No travel: 72%
- Day Shift: 100%

Compare Madhees with

- HRH Next Services (3.0)
- Data Entry (4.1)
- Magus Customer Dialog (3.6)
- Greet Technologies (2.9)
- Cogenthub (2.8)
- Mas Callnet (3.0)
- Om Innovation Call Services (3.7)
- Selectsys (3.6)
- Frontizo Business Services (3.2)
- Dr ITM (3.5)
- Teleminds Infotech (2.4)
- Back Office (4.1)
- Trayee Business Solutions (3.3)
- Gamma Process Hub (3.7)
- Kserve Bpo (3.6)
- TSR Darashaw (3.4)
- Bristol Healthcare Services (2.8)
- VOIZ (3.0)
- Xtrim Global Solutions (4.6)
- Okay Call Centre (3.5)

Similar Jobs for you

- Site Reliability Engineer at Centific Global Technologies: 10-15 yrs, ₹ 30-45 LPA
- Site Reliability Engineer at Patch Infotech Private Limited: 4-9 yrs, ₹ 15-25 LPA
- Site Reliability Engineer at Okta: 8-10 yrs, ₹ 24-30 LPA
- Site Reliability Engineer at Xebia IT Architects India Pvt Ltd: 5-14 yrs, ₹ 25-45 LPA
- Site Reliability Engineer at Shadow Placements: 6-10 yrs, ₹ 15-32 LPA
- Site Reliability Engineer at Whitefield Careers: 7-10 yrs, ₹ 20-25 LPA
- Site Reliability Engineer at Trantor Software: 7-9 yrs, ₹ 25-30 LPA
- Site Reliability Engineer at IT Firm: 5-8 yrs, ₹ 28-45 LPA
- Senior Site Reliability Engineer at ONE2N CONSULTING PRIVATE LIMITED: 6-9 yrs, ₹ 34-42 LPA
- Site Reliability Engineer at Centific Global Technologies: 12-20 yrs, ₹ 30-50 LPA

7 Madhees Jobs

- Director - Site Reliability Engineering (10-12 yrs)
- Cloud Architect (10-12 yrs)
- Engineering Manager - MERN Stack (13-19 yrs)
- Madhees - Key Account Manager (5-10 yrs)
- Groovy Developer (3-8 yrs)
- Madhees - Principal Architect/Engineer (15-22 yrs)
- Executive - Business Development - Recruitment Services (0-1 yrs)