Factspan

3.7

based on 116 Reviews

4 Factspan Jobs

Site Reliability Engineer Manager

Factspan Analytics

7-12 years

Bangalore / Bengaluru

1 vacancy

posted 1hr ago

Job Role Insights

Flexible timing

Key skills for the job

Project Management, Kubernetes, Incident Management, Site Reliability Engineering, Docker, Leadership

Job Description

Position: Site Reliability Engineering Manager
Bengaluru, Karnataka

Role Overview
We are looking for an experienced Site Reliability Engineering (SRE) Manager with 8+ years of experience to lead a team of highly skilled SREs in managing, automating, and optimizing our cloud infrastructure on Google Cloud Platform (GCP). The SRE Manager will be responsible for ensuring the reliability, availability, and performance of critical services while driving automation and operational excellence.

As an SRE Manager, you will work closely with development, infrastructure, and security teams to implement scalable, resilient, and high-performance solutions. This role is ideal for someone passionate about reliability engineering, cloud automation, and observability.

Key Responsibilities

Leadership & Team Management

Lead, mentor, and grow a team of Site Reliability Engineers, fostering a culture of innovation, collaboration, and continuous learning.
Define and drive SRE best practices, focusing on reliability, automation, monitoring, and incident response.
Collaborate with development, DevOps, and security teams to align infrastructure and application reliability with business objectives.
Own the SRE roadmap and strategy, ensuring alignment with organizational goals and industry best practices.

Reliability & Performance

Ensure the uptime, availability, and performance of critical applications hosted on GCP.
Implement SLOs (Service Level Objectives), SLIs (Service Level Indicators), and SLAs (Service Level Agreements) to measure system reliability.
Conduct root cause analysis (RCA) for production incidents and drive post-mortems to improve system resilience.
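
As an illustration of how an SLO translates into a measurable error budget, the arithmetic can be sketched in Python. This is a minimal sketch, not the company's method: the function names, the 30-day window, and the request-count SLI are assumptions made for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO permits over a window.

    slo_target is a fraction such as 0.999 for "99.9% available".
    The 30-day default window is an assumption for this example.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, from a good/total SLI.

    Returns 1.0 when nothing has been spent, 0.0 when the budget is exactly
    exhausted, and a negative value when the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # no traffic yet, so no budget has been consumed
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad
```

For example, a 99.9% target over 30 days allows roughly 43 minutes of downtime, and 500 failed requests out of 1,000,000 against that target leaves about half of the budget.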

Automation & CI/CD

Automate infrastructure management using Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.
Improve CI/CD pipelines using GitOps methodologies to enable faster and more reliable deployments.
Champion self-healing architectures to minimize manual intervention.

Observability & Incident Management

Implement and enhance monitoring, logging, and alerting using tools like Prometheus, Grafana, Stackdriver (Cloud Monitoring), and OpenTelemetry.
Develop on-call rotations, runbooks, and incident management processes to minimize downtime and improve MTTR (Mean Time to Resolution).
Use AI/ML-based anomaly detection for proactive monitoring.
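
To make the MTTR metric concrete, here is a small Python sketch that averages resolution time over a set of incidents. The `Incident` record and its field names are hypothetical, and measuring from detection to resolution is one common convention rather than anything specified in this posting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class Incident:
    # Hypothetical record shape; real incident tools expose different fields.
    detected_at: datetime
    resolved_at: datetime


def mean_time_to_resolution(incidents: Sequence[Incident]) -> timedelta:
    """Average of (resolved_at - detected_at) across the given incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (i.resolved_at - i.detected_at for i in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)
```

Tracking this value per service over time is one way to check whether runbooks and on-call changes are actually shortening incidents.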

Security & Compliance

Ensure security best practices for IAM, networking, and data encryption within GCP.
Conduct security audits and work with compliance teams to ensure adherence to SOC 2, ISO 27001, HIPAA, or other regulatory frameworks.
Implement zero-trust security models and automated compliance policies.

Cost Optimization & Capacity Planning

Optimize cloud costs using GCP cost management tools, rightsizing, and auto-scaling.
Implement capacity planning strategies to balance cost and performance.
Work with finance teams to forecast infrastructure costs and optimize spend.

Required Skills & Qualifications

Technical Skills

Strong expertise in Google Cloud Platform (GCP) services such as GKE, Cloud Run, Cloud Functions, Cloud SQL, BigQuery, and Cloud Spanner.
Hands-on experience with Terraform, Pulumi, or Cloud Deployment Manager for Infrastructure-as-Code (IaC).
Experience with CI/CD tools like GitHub Actions, ArgoCD, Spinnaker, or Jenkins.
Strong knowledge of Kubernetes (GKE) and container orchestration.
Experience with SRE principles such as error budgets, chaos engineering, and observability.
Strong scripting and automation skills in Python.
Experience with monitoring and observability tools (Stackdriver, Datadog, Prometheus, Grafana, New Relic).

Leadership & Soft Skills

Proven experience managing and mentoring SRE teams.
Strong problem-solving skills with the ability to troubleshoot complex production issues.
Ability to work in a fast-paced, DevOps-oriented environment.
Strong communication and stakeholder management skills.
Experience collaborating with cross-functional teams, including engineering, security, and product teams.

Preferred Qualifications

GCP Professional Cloud Architect or GCP Professional DevOps Engineer certification.
Experience with multi-cloud or hybrid cloud environments.
Hands-on experience with serverless computing and event-driven architectures.
Prior experience in high-traffic, distributed systems.

If you are passionate about leveraging technology to drive business innovation, possess excellent problem-solving skills, and thrive in a dynamic environment, we encourage you to apply for this exciting opportunity.

Employment Type: Full Time, Permanent
