HARP Technologies & Services

4.3

based on 23 Reviews

Site Reliability Engineer - Configuration Management Tools (8-15 yrs)

8-15 years

posted 19hr ago

Job Role Insights

Fixed timing

Key skills for the job

DevOps AWS New Relic Site Reliability Engineering Terraform Monitoring Tools

Job Description

Experience : 8+ Years

Location : Mumbai, Chennai (remote for other cities)

Notice period : Immediate to 30 days max

Responsibilities of Senior SRE :

- The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.

- They work with cross-functional teams to design, build and maintain systems, and they troubleshoot issues as they arise. They bridge the gap between development and operations teams.

- They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.

- They deploy and manage monitoring tools to gain insights on system health and performance.

- They analyze performance, identify bottlenecks and implement solutions to improve a system's scalability and reduce its latency.

- They develop scripts and implement tools and automation frameworks to reduce manual intervention in deployment, monitoring and scaling.

- They work with development teams to design and develop observability practices such as logging, metrics and tracing, with the aim of diagnosing and troubleshooting issues proactively.

- They create actionable alerts in monitoring systems to ensure a rapid response to potential production incidents.

- They forecast resource needs and provision adequately for current and future demand.

- They design and execute "chaos experiments" to test a system's resilience to failure.

- They own, define and implement the Disaster Recovery (DR) processes for systems. They also conduct planned and unplanned mock DR drills to test for response preparedness during production incidents.

- They ensure that security best practices are followed and implemented during design and operations of systems.

- They also own and maintain documentation of processes, playbooks, and systems.

- They publish KPI reports and other system health updates on a regular basis to the business.
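
As an illustration of the SLO and uptime work described above, the arithmetic behind an availability target can be sketched in a few lines of Python. This is a minimal sketch and not part of the role's actual tooling: the `Slo` class, its field names, the 99.9% target and the 30-day window are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical helper: an availability target over a fixed window."""
    target: float        # assumed example value: 0.999 ("three nines")
    window_minutes: int  # assumed example value: 30 days = 43,200 minutes

    def __post_init__(self) -> None:
        # A target of exactly 1.0 would leave no error budget to divide by.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    def error_budget_minutes(self) -> float:
        # Total downtime the SLO tolerates over the whole window.
        return (1.0 - self.target) * self.window_minutes

    def budget_remaining(self, downtime_minutes: float) -> float:
        # Fraction of the budget still unspent; a negative value
        # means the SLO has been breached for this window.
        budget = self.error_budget_minutes()
        return (budget - downtime_minutes) / budget


slo = Slo(target=0.999, window_minutes=30 * 24 * 60)
print(round(slo.error_budget_minutes(), 1))  # 43.2
print(round(slo.budget_remaining(10.8), 2))  # 0.75
```

With these assumed numbers, a 99.9% target over 30 days allows about 43.2 minutes of downtime, and 10.8 minutes of recorded downtime leaves 75% of that budget unspent.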

Requirements :

- Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience

- Must-have - 12+ years of overall IT experience

- Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or a similar position.

- Must-have - 5+ years of AWS Cloud experience, with an AWS certification (Certified DevOps Engineer, SysOps, Security, etc.)

- Must-have - AWS experience - 3+ years of experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on cloud security best practices.

- Must-have - 2+ years of experience with CDN and/or cache systems such as Fastly, Akamai, CloudFront, etc.

- Proven understanding of and strong experience with cloud deployments (AWS / Docker / Kubernetes)

- Knowledge of IaC provisioning tools and scripting languages such as Terraform, Chef, Ansible, shell, Groovy, Python, etc.

- Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk and the ELK stack.

- Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, PrivateLink, DNS, ACLs, firewalls, and C2S access points.

- Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions, Jenkins, etc.

- Experience with other tools such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus and Nexus IQ

- Experience with configuration automation tools like Puppet/Ansible/Chef/Salt

- Scripting Skills : Strong scripting (e.g. Bash & Python) and automation skills.

- Operating Systems : Windows and Linux system administration.

- Problem Solving : Ability to analyze and resolve complex infrastructure, resource and application deployment issues.

- Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have :

- Experience with Terraform/Ansible/Chef/Puppet

- Experience with GitHub Actions

- Experience with CloudFront, Fastly