Coders Brain

3.3

based on 39 Reviews

DevOps/Site Reliability Engineer - IAC Terraform (5-10 yrs)

Coders Brain Technology Private Limited

5-10 years

posted 12hr ago

Job Role Insights

Flexible timing

Key skills for the job

Software Configuration Management, DevOps, AWS, Cloud Computing, Kubernetes, Incident Management

Job Description

Position : DevOps / Site Reliability Engineer (SRE)

Location : Bangalore

Experience : 5-10 years

Required Skills & Expertise :

1. Cloud Platform :

- AWS (Amazon Web Services) : Experience with services like EC2, S3, RDS, Lambda, VPC, etc.

2. Infrastructure as Code (IaC) :

- Terraform : Expertise in provisioning and managing infrastructure using Terraform scripts.

Continuous Integration and Continuous Deployment (CI/CD) :

- GitLab CI/CD : Proficiency in setting up and managing pipelines and in automating deployments.

3. Container Orchestration :

- Kubernetes : Experience in managing Kubernetes clusters, configuring services, ingress controllers, and setting up autoscaling.

- Pod Sizing : Optimizing pod resources (CPU, memory), ensuring scalability and cost-efficiency.

- Horizontal Pod Autoscaling (HPA) : Setting up and managing HPA based on metrics like CPU or custom application metrics.
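
For candidates who want a concrete picture of the HPA requirement above, the following is a minimal sketch of the replica calculation described in the Kubernetes documentation (desired = ceil(current replicas x current metric / target metric), with no change when the ratio is within a tolerance of 1.0). The function name, the parameter names, and the min/max defaults are illustrative assumptions, not part of this posting or of any Kubernetes API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the documented HPA scaling rule (names are hypothetical)."""
    ratio = current_metric / target_metric
    # If usage is close enough to the target, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up.
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds, as minReplicas/maxReplicas would.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The same rule applies to custom application metrics; only the metric values fed into the ratio change.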

4. Configuration Management & Automation :

- Helm : Experience in deploying applications using Helm charts, including Helm templating for parameterizing Kubernetes resources.

5. Monitoring & Observability :

- Datadog : Hands-on experience in setting up dashboards, monitoring infrastructure, and creating alerts for proactive issue resolution.

SLOs (Service Level Objectives) & SLIs (Service Level Indicators) :

- Experience in defining and tracking SLOs and SLIs to ensure the reliability and performance of services.
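
As an illustration of what tracking an SLO can involve, here is a small sketch of error-budget arithmetic for an availability target. The function names, the 30-day window, and the example numbers are assumptions chosen for illustration; they do not come from this posting or from Datadog.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Downtime (in minutes) an availability SLO allows over the window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target, good_events, total_events):
    """Fraction of the error budget left, using a request-based SLI."""
    allowed_bad = (1.0 - slo_target) * total_events
    if allowed_bad <= 0:
        raise ValueError("a 100% target or empty window leaves no error budget")
    actual_bad = total_events - good_events
    # 1.0 means untouched budget; 0 or below means the SLO is breached.
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # 500 failed requests out of 1,000,000 uses about half the budget.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

Here the SLI is the ratio of good events to total events, and the SLO is the target that ratio must meet over the window.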

6. Incident Management & Runbooks :

- Strong ability to create and maintain runbooks for incident response.

- Ability to troubleshoot, diagnose, and resolve production incidents quickly and with minimal downtime.

Desired Soft Skills :

- Strong communication and collaboration skills for working in cross-functional teams.

- Ability to proactively monitor and improve system reliability and performance.

- Hands-on experience in automating and optimizing infrastructure and deployment processes.

- Experience with Agile/Scrum methodologies is a plus.

Potential Responsibilities :

1. Infrastructure Automation : Build and manage cloud infrastructure using AWS and Terraform, enabling automated and scalable solutions.

2. CI/CD Pipeline Management : Maintain and optimize GitLab CI/CD pipelines for smooth and efficient software delivery.

3. Kubernetes Management : Handle day-to-day operations of Kubernetes clusters, including scaling, updates, and troubleshooting.

4. Monitoring & Alerting : Implement and manage monitoring solutions using Datadog, ensuring the health and availability of services.

5. Reliability Engineering : Work on SLOs, SLIs, and incident response to improve system availability and reliability.

6. Runbook Creation : Develop, maintain, and optimize runbooks to streamline incident management and troubleshooting.