Recro

4.2

based on 36 Reviews

Recro.io - Site Reliability Engineer - CI/CD Pipeline (4-6 yrs)

4-6 years

Posted 2 months ago

Job Role Insights

Flexible timing

Key skills for the job

Digital Marketing, DevOps, AWS, Azure DevOps, Site Reliability Engineering, Terraform

Job Description

We are looking for a talented Site Reliability Engineer (SRE) to join our team and help ensure the reliability, scalability, and performance of our applications and services.

As an SRE, you will play a key role in bridging the gap between development and operations, focusing on automation, infrastructure management, and maintaining system health.

You will be responsible for building and maintaining scalable infrastructure, implementing best practices, and monitoring systems to ensure high availability and performance.

Key Responsibilities:

- Design, develop, and maintain scalable, reliable, and secure infrastructure to support applications and services, ensuring that systems are efficient, fault-tolerant, and optimized for performance.

- Collaborate with development and operations teams to design solutions that meet both business and technical requirements for reliability and scalability.

- Implement Site Reliability Engineering (SRE) best practices to drive operational excellence.

- Focus on high availability, performance optimization, and capacity planning to ensure critical systems run efficiently and scale effectively under demand.

- Help set and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to continuously improve system reliability.

- Collaborate with software engineering, operations, and security teams to improve system reliability, observability, and scalability.

- Work closely with development teams to ensure that systems are designed for maintainability, scalability, and easy troubleshooting.

- Contribute to continuous improvement efforts by providing feedback from a reliability and operations perspective.

- Automate routine operational tasks (e.g., deployment, monitoring, incident response) to reduce manual intervention and improve efficiency across the infrastructure.

- Use tools such as Terraform, Ansible, or similar to automate infrastructure provisioning, scaling, and configuration management.

- Monitor system performance using modern monitoring tools (e.g., Prometheus, Grafana), and implement effective alerting to identify and respond to issues proactively.

- Troubleshoot and resolve incidents to minimize downtime and ensure that services are restored quickly with minimal disruption.

- Participate in the on-call rotation, providing 24/7 support for critical systems when needed.

- Ensure infrastructure and systems comply with relevant security, reliability, and compliance standards.

- Apply security best practices to the infrastructure, ensuring that systems are protected against security threats and vulnerabilities.

- Regularly review and improve the security posture of systems and applications, implementing necessary patches, upgrades, and controls.
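To illustrate the SLO, SLI, and Error Budget responsibility listed above, here is a minimal sketch in Python of the arithmetic behind a request-based availability SLO. The function name `error_budget`, its parameters, and the sample numbers are hypothetical and are not taken from the posting; real teams usually derive these figures from their monitoring stack.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based availability SLO (illustrative helper, not a standard API).

    slo_target      -- desired success ratio, e.g. 0.999 for "three nines"
    total_requests  -- requests served in the SLO window
    failed_requests -- requests that counted as errors in the same window
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests

    # Fraction of the budget already spent (values above 1.0 mean the SLO is breached).
    budget_used = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_used": budget_used,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Assumed sample window: one million requests, 250 failures, 99.9% target.
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With the assumed sample numbers, roughly a quarter of the error budget is spent, which is the kind of signal a team might use when deciding whether to prioritize reliability work over new releases.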

Requirements:

Experience:

- 4-6 years of experience in a Site Reliability Engineer (SRE) or similar role, with a proven track record of maintaining and optimizing large-scale systems.

- Strong experience with cloud platforms, particularly Google Cloud Platform (GCP), as well as other cloud environments such as AWS or Azure.

Technical Skills:

- Expertise in DevOps practices such as CI/CD pipelines, Infrastructure as Code (IaC), and automation tools like Terraform, Ansible, or similar.

- Experience in monitoring and observability with tools such as Prometheus, Grafana, the ELK stack, or equivalent systems to track and visualize infrastructure performance, usage, and issues.

- Proficiency in programming and scripting languages like Python, Bash, or others, with experience in writing scripts for automating tasks, deployments, and workflows.

- Familiarity with Git for version control and experience working with collaborative workflows in a development environment.