7 Madhees Jobs

Director - Site Reliability Engineering (10-12 yrs)

10-12 years

Madhees

posted 19d ago

Job Description

Director of SRE for a partner company engaged in product development. Looking for someone who has worked on end-to-end SRE with DevOps.



Qualifications :


- Bachelor's or Master's degree in Computer Science, Data Engineering, AI/ML, or a related field.


- 10+ years of experience in software release management, with at least 3-5 years in SRE or DevOps environments, preferably in AI or data-driven applications.


- Proven experience building and managing both release management and SRE teams in complex, multi-product environments.


- Strong knowledge of AI/ML operations (MLOps), data pipeline management, and cloud-based AI product deployments.


- Expertise in release management tools (Jenkins, GitLab, Git, Jira) and SRE tools such as Prometheus, Grafana, Datadog, or similar monitoring systems.


- Experience with cloud platforms (AWS, GCP, Azure), containerization (Kubernetes, Docker), and infrastructure automation tools (Terraform, Ansible).


- Excellent problem-solving, organizational, and leadership skills, with a strong track record of driving continuous improvement in both release and operational reliability processes.


Preferred Qualifications :


- Experience deploying and maintaining large-scale AI/ML models in production environments, including monitoring, retraining, and operationalization.


- Familiarity with ITIL, MLOps, or DevOps frameworks and best practices.


- Knowledge of cloud-based services and tools specifically designed for AI/ML (e.g., AWS SageMaker, TensorFlow, PyTorch).


- Demonstrated ability to manage incident response and root cause analysis in complex software ecosystems.


Responsibilities :


- Build, mentor, and lead a high-performing SRE and release management team.


- Foster a culture of ownership, collaboration, and continuous improvement.


- Define team goals, performance metrics, and career development plans.


- Develop and implement SRE best practices, including monitoring, alerting, capacity planning, and incident response.


- Ensure the reliability, availability, and performance of our production systems.


- Drive the adoption of automation and infrastructure-as-code principles.


- Establish and maintain service level objectives (SLOs) and service level agreements (SLAs).


- Oversee the end-to-end release management process, ensuring smooth and efficient deployments.


- Implement and maintain CI/CD pipelines using tools like Jenkins, GitLab, and Git.


- Promote DevOps principles and practices across the organization.


- Manage and optimize data pipelines and MLOps workflows.


- Manage and optimize cloud infrastructure on platforms like AWS, GCP, or Azure.


- Implement and manage containerization and orchestration using Kubernetes and Docker.


- Utilize infrastructure automation tools like Terraform and Ansible to ensure consistent and scalable deployments.


- Oversee the monitoring and management of large-scale AI/ML models in production.


- Lead incident response and root cause analysis efforts.


- Implement proactive monitoring and alerting systems using tools like Prometheus, Grafana, and Datadog.


- Develop and maintain incident response playbooks and procedures.


- Improve system resilience and minimize downtime.


- Collaborate with development, product, and data science teams to ensure alignment on reliability and release goals.


- Communicate effectively with stakeholders at all levels of the organization.


- Document processes, procedures, and best practices.



Functional Areas: Software/Testing/Networking

What people at Madhees are saying

What Madhees employees are saying about work life

based on 27 employees
73%
57%
72%
100%
Strict timing
Alternate Saturday off
No travel
Day Shift

Madhees Benefits

Soft Skill Training
Job Training
Free Transport
Gymnasium
Cafeteria
Work From Home +6 more

Compare Madhees with

HRH Next Services: 3.0
Data Entry: 4.1
Magus Customer Dialog: 3.6
Greet Technologies: 2.9
Cogenthub: 2.8
Mas Callnet: 3.0
Om Innovation Call Services: 3.7
Selectsys: 3.6
Frontizo Business Services: 3.2
Dr ITM: 3.5
Teleminds Infotech: 2.4
Back Office: 4.1
Trayee Business Solutions: 3.3
Gamma Process Hub: 3.7
Kserve Bpo: 3.6
TSR Darashaw: 3.4
Bristol Healthcare Services: 2.8
VOIZ: 3.0
Xtrim Global Solutions: 4.6
Okay Call Centre: 3.5

Similar Jobs for you

Site Reliability Engineer at Centific Global Technologies

10-15 Yrs

₹ 30-45 LPA

Site Reliability Engineer at Patch Infotech Private Limited

4-9 Yrs

₹ 15-25 LPA

Site Reliability Engineer at Okta

8-10 Yrs

₹ 24-30 LPA

Site Reliability Engineer at Xebia IT Architects India Pvt Ltd

5-14 Yrs

₹ 25-45 LPA

Site Reliability Engineer at Shadow Placements

6-10 Yrs

₹ 15-32 LPA

Site Reliability Engineer at Whitefield Careers

7-10 Yrs

₹ 20-25 LPA

Site Reliability Engineer at Trantor Software

7-9 Yrs

₹ 25-30 LPA

Site Reliability Engineer at IT Firm

5-8 Yrs

₹ 28-45 LPA

Senior Site Reliability Engineer at ONE2N CONSULTING PRIVATE LIMITED

6-9 Yrs

₹ 34-42 LPA

Site Reliability Engineer at Centific Global Technologies

12-20 Yrs

₹ 30-50 LPA

Director - Site Reliability Engineering (10-12 yrs)

10-12 Yrs

19d ago·via hirist.com

Cloud Architect (10-12 yrs)

10-12 Yrs

19d ago·via hirist.com

Engineering Manager - MERN Stack (13-19 yrs)

13-19 Yrs

19d ago·via hirist.com

Madhees - Key Account Manager (5-10 yrs)

5-10 Yrs

19d ago·via iimjobs.com

Groovy Developer (3-8 yrs)

3-8 Yrs

1mon ago·via hirist.com

Madhees - Principal Architect/Engineer (15-22 yrs)

15-22 Yrs

2mon ago·via iimjobs.com
