Site Reliability Engineer - Docker/Kubernetes (5-8 yrs)

5-8 years

IT Firm

Posted 1 month ago

Job Description

We are looking for an experienced Site Reliability Engineer (SRE) to manage production systems and improve their reliability, scalability, and performance.

Key Responsibilities :

- Provide production support and troubleshoot real-time issues.

- Develop and maintain CI/CD pipelines using Jenkins and Git/Bitbucket.

- Manage deployments with Docker and Kubernetes.

- Set up observability tools (Grafana, Prometheus, Instana).

- Automate infrastructure using Terraform and follow SRE practices.

Required Skills :

- Production Support, Docker, Kubernetes

- CI/CD (Jenkins, Git/Bitbucket)

- Observability (Grafana, Prometheus)

- Terraform, TypeScript, Python

- SRE principles

Responsibilities :

System Reliability & Availability : Ensure that the services are highly available, reliable, and scalable in both production and non-production environments.

Incident Management : Lead the investigation and resolution of incidents, identify the root causes, and ensure the recovery of services. You will also contribute to postmortems and implement preventative measures.

Monitoring & Observability : Build and maintain monitoring and alerting systems. Implement metrics, logs, and tracing to ensure transparency into system health and performance.

Automation : Develop and maintain automation tools and systems to reduce manual intervention and improve operational efficiency.

Capacity Planning : Work with the team to forecast capacity needs and implement scaling solutions to ensure our systems are always prepared for increased load.

Performance Optimization : Identify and eliminate bottlenecks and optimize the performance of critical systems.

Collaboration : Work closely with development, QA, and operations teams to ensure smooth deployment and transition of code to production environments.

Security & Compliance : Ensure that security best practices are followed across our infrastructure. Assist with vulnerability management and compliance tasks.

Disaster Recovery : Design and implement disaster recovery and backup strategies to ensure business continuity.

Required Qualifications :

- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.

- 3+ years of experience in Site Reliability Engineering, DevOps, or similar roles.

- Strong experience with cloud platforms (AWS, GCP, Azure).

- Proficiency in infrastructure automation tools (e.g., Terraform, Ansible, Puppet, Chef).

- Expertise in containerization and orchestration tools (Docker, Kubernetes).

- Experience with CI/CD tools and pipelines (e.g., Jenkins, GitLab, CircleCI).

- Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog, New Relic).

- Solid understanding of Linux/Unix systems and networking fundamentals.

- Programming/scripting skills in at least one language (e.g., Python, Go, Bash, Ruby, or Java).

- Strong troubleshooting skills and the ability to debug complex, distributed systems.

- Excellent communication and collaboration skills.

Preferred Qualifications :

- Experience with infrastructure as code (IaC) and configuration management tools.

- Familiarity with microservices architecture.

- Experience with performance tuning and optimization in a large-scale production environment.

- Knowledge of security practices and tools related to cloud infrastructure.

- Understanding of Service-Level Objectives (SLOs) and Service-Level Indicators (SLIs).

What We Offer :


Functional Areas: Software/Testing/Networking
