Forbes Advisor

3.7

based on 23 Reviews

Staff Engineer- SRE

12-14 years

Chennai

1 vacancy

posted 25d ago

Job Role Insights

Flexible timing

Key skills for the job

Python Cloud Computing Automation Testing Operations Windows System Administration Wellness

Job Description

Responsibilities:

The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.
They work with cross-functional teams to design, build, and maintain systems, and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.
They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
They deploy and manage monitoring tools to gain insights on system health and performance.
They analyze performance, identify bottlenecks, and implement solutions to improve a system's scalability and latency.
They develop scripts and implement tools and automation frameworks to reduce manual intervention in deployment, monitoring, and scaling.
They work with development teams to design and develop observability practices such as logging, metrics, and tracing, with the aim of diagnosing and troubleshooting issues proactively.
They create actionable alerts on monitoring systems to ensure a rapid response to potential production incidents.
They forecast resource needs and provision adequately for current and future demand.
They design and execute chaos experiments to test systems' resilience to failure.
They own, define and implement the Disaster Recovery (DR) processes for systems.
They also conduct planned and unplanned mock DR drills to test for response preparedness during production incidents.
They ensure that security best practices are followed and implemented during design and operations of systems.
They also own and maintain documentation of processes, playbooks, and systems.
They regularly publish KPI reports and other system health updates to the business.

Requirements:

Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience
Must-have - 12+ years of overall IT experience
Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position.
Must-have - 5+ years of AWS Cloud experience, with an AWS certification such as DevOps Engineer, SysOps, or Security.
Must-have - AWS experience - 3+ years of experience using a broad range of AWS technologies (e.g., EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on cloud security best practices.
Must-have - 2+ years of experience with CDN and/or caching systems such as Fastly, Akamai, or CloudFront.
Proven understanding of and strong experience with cloud deployments (AWS, Docker, Kubernetes)
Knowledge of infrastructure-as-code (IaC) and provisioning tools such as Terraform, Chef, and Ansible, and of scripting languages such as Shell, Groovy, and Python
Experience with monitoring systems such as CloudWatch, New Relic, Datadog, Splunk, or the ELK stack.
Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, private link, DNS, ACLs, firewalls, and C2S access points.
Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions or Jenkins
Experience with other tooling such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus, and Nexus IQ
Experience with configuration management tools such as Puppet, Ansible, Chef, or Salt
Scripting Skills: Strong scripting (e.g. Bash & Python) and automation skills.
Operating Systems: Windows and Linux system administration.
Problem Solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues
Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have:

Experience with Terraform/Ansible/Chef/Puppet
Experience with GitHub Actions
Experience with CloudFront, Fastly
Experience overseeing team members who perform these functions
Ability to anticipate problems and future technical needs and take the necessary steps to address them
Experience working primarily with server-side technologies, and comfort with client-side work when required
Enthusiasm for following technology trends and software engineering best practices

Perks:

Day off on the 3rd Friday of every month (one long weekend each month)
Monthly Wellness Reimbursement Program to promote health and well-being
Paid paternity and maternity leave

Employment Type: Full Time, Permanent