Coders Brain
DevOps/Site Reliability Engineer - IAC Terraform (5-10 yrs)
posted 12hr ago
Flexible timing
Job Description or Key Skills :
Position : DevOps / Site Reliability Engineer (SRE)
Location : Bangalore
Experience : 5-10 years
Required Skills & Expertise :
1. Cloud Platform :
- AWS (Amazon Web Services) : Experience with services like EC2, S3, RDS, Lambda, VPC, etc.
2. Infrastructure as Code (IaC) :
- Terraform : Expertise in provisioning and managing infrastructure using Terraform scripts.
Continuous Integration and Continuous Deployment (CI/CD) :
- GitLab CI/CD : Proficiency in setting up and managing pipelines and automating deployments.
3. Container Orchestration :
- Kubernetes : Experience in managing Kubernetes clusters, configuring services, ingress controllers, and setting up autoscaling.
- Pod Sizing : Optimizing pod resources (CPU, memory) to ensure scalability and cost-efficiency.
- Horizontal Pod Autoscaling (HPA) : Setting up and managing HPA based on metrics like CPU or custom application metrics.
4. Configuration Management & Automation :
- Helm : Experience in deploying applications using Helm charts, including Helm templating for parameterizing Kubernetes resources.
5. Monitoring & Observability :
- Datadog : Hands-on experience in setting up dashboards, monitoring infrastructure, and creating alerts for proactive issue resolution.
SLOs (Service Level Objectives) & SLIs (Service Level Indicators) :
- Experience in defining and tracking SLOs and SLIs to ensure the reliability and performance of services.
6. Incident Management & Runbooks :
- Strong ability to create and maintain runbooks for incident response.
- Troubleshooting, diagnosing, and resolving production incidents quickly and with minimal downtime.
Desired Soft Skills :
- Strong communication and collaboration skills for working in cross-functional teams.
- Ability to proactively monitor and improve system reliability and performance.
- Hands-on experience with automating and optimizing infrastructure and deployment processes.
- Experience with Agile/Scrum methodologies is a plus.
Potential Responsibilities :
1. Infrastructure Automation : Build and manage cloud infrastructure using AWS and Terraform, enabling automated and scalable solutions.
2. CI/CD Pipeline Management : Maintain and optimize GitLab CI/CD pipelines for smooth and efficient software delivery.
3. Kubernetes Management : Handle day-to-day operations of Kubernetes clusters, including scaling, updates, and troubleshooting.
4. Monitoring & Alerting : Implement and manage monitoring solutions using Datadog, ensuring the health and availability of services.
5. Reliability Engineering : Work on SLOs, SLIs, and incident response to improve system availability and reliability.
6. Runbook Creation : Develop, maintain, and optimize runbooks to streamline incident management and troubleshooting.
Functional Areas: Software/Testing/Networking