Agivant Technologies

4.5

based on 22 Reviews

Site Reliability Engineer - Cloud Platforms (7-12 yrs)

posted 6d ago

Job Role Insights

Flexible timing

Key skills for the job

DevOps, AWS, Cloud Services, Kubernetes, Azure DevOps, Site Reliability Engineering

Job Description :

We are looking for a highly skilled Site Reliability Engineer (SRE) with strong engineering and architectural expertise to design, implement, and manage large-scale, mission-critical infrastructure across multiple data centers and cloud providers.

As an SRE, you will be responsible for architecting and optimizing our global infrastructure, enabling development teams to roll out new features efficiently while maintaining high availability and reliability. You will be hands-on with automation, performance tuning, infrastructure scalability, and cloud-native technologies to ensure a seamless user experience for millions of customers.

Key Responsibilities :

1. Architect and implement highly scalable, fault-tolerant, and distributed systems across multi-cloud (OCI, AWS, GCP) and on-premises environments using modern DevOps and SRE principles.

2. Design and deploy next-generation cloud infrastructure with a strong focus on automation, self-healing systems, and performance optimization.

3. Develop and maintain infrastructure-as-code (IaC) using Terraform and configuration management tools such as Ansible and Puppet for automated provisioning and orchestration.

4. Build and optimize containerized environments using Kubernetes and Docker for seamless deployment and scaling.

5. Drive performance, scalability, and security improvements across our cloud and on-premises infrastructure, ensuring high availability and disaster recovery capabilities.

6. Monitor, troubleshoot, and resolve complex system issues by implementing advanced observability solutions, logging, and real-time monitoring frameworks.

7. Develop and enforce SRE best practices, including SLI/SLO definition, capacity planning, and incident management strategies.

8. Eliminate toil and automate repetitive tasks using scripting languages such as Python, Golang, or Shell scripting to improve operational efficiency.

9. Collaborate closely with engineering, architecture, and security teams to improve system resiliency, optimize application performance, and streamline CI/CD workflows.

10. Lead the transition of legacy systems to modern, cloud-native architectures, advocating for DevOps and infrastructure automation.

11. Participate in 24/7 on-call rotations, ensuring rapid response to critical incidents and driving post-mortem analysis for continuous improvement.

Requirements :

1. 7+ years of hands-on experience in a Site Reliability Engineering (SRE) role, with a strong focus on designing, implementing, and managing cloud-native infrastructure.

2. Proficiency with any cloud platform (preferably OCI): not just operational experience but actual design and implementation expertise.

3. Proven experience in building, deploying, and optimizing infrastructure-as-code (IaC) using Terraform.

4. Strong automation mindset with proficiency in Ansible, Puppet, or other configuration management tools.

5. Hands-on experience with container orchestration using Kubernetes, Docker, and microservices architecture.

6. Advanced scripting and automation skills in Python, Golang, or Shell scripting to eliminate manual operations.

7. Working knowledge of load balancing technologies (HAProxy, Nginx, F5, Varnish, dnsdist) and web servers (Apache, Nginx).

8. Strong understanding of networking, distributed systems, and observability tools (Prometheus, Grafana, ELK stack, Datadog).

9. Experience in designing and implementing highly available, scalable, and secure architectures across cloud and hybrid environments.

10. AWS and/or GCP certifications are a plus but not required.

11. This is not a support-focused role; we are looking for engineers who have built, deployed, and optimized complex distributed systems from the ground up.