G-Tech

Rating: 3.3 (based on 11 reviews)

Site Reliability Engineer - Cloud Platform (5-7 yrs)

Experience: 5-7 years

Posted 6 days ago

Job Role Insights

Flexible timing

Key skills for the job

DevOps, AWS, Cloud Computing, Cloud Services, Kubernetes, Azure DevOps

Job Description

Job Title: Site Reliability Engineer

We're Hiring!

Responsibilities:

- Design, implement, and maintain highly available and reliable infrastructure and services.

- Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

- Implement and manage incident response and on-call processes.

- Conduct post-incident reviews and implement corrective actions.

- Monitor and analyze system performance metrics.

- Identify and resolve performance bottlenecks and scalability issues.

- Implement performance tuning strategies and optimizations.

- Conduct capacity planning and forecasting.

- Automate infrastructure provisioning, configuration, and deployment using Infrastructure as Code (IaC) tools (Terraform, CloudFormation, Ansible).

- Develop and maintain automation scripts and tools for system administration and monitoring.

- Implement and manage CI/CD pipelines for automated deployments.

- Utilize and enhance monitoring and logging tools (Prometheus, Grafana, ELK stack, Datadog).

- Lead incident response efforts and coordinate with cross-functional teams.

- Develop and maintain incident response plans and procedures.

- Analyze incident data and identify patterns and trends.

- Implement proactive measures to prevent future incidents.

- Monitor resource utilization and forecast future capacity needs.

- Implement auto-scaling and load balancing strategies.

- Ensure efficient resource allocation and utilization.

- Implement and maintain security best practices and policies.

- Conduct security audits and vulnerability assessments.

- Ensure compliance with industry standards and regulations.

- Collaborate with development, operations, and product teams to ensure smooth service delivery.

- Communicate effectively with team members and stakeholders.

- Participate in design and code reviews.

- Provide technical guidance and mentorship to junior team members.

- Create and maintain detailed documentation of infrastructure, processes, and procedures.

- Share knowledge and best practices with team members.

- Conduct training sessions and workshops.

- Contribute to the development of internal tools and libraries.

- Stay up to date with the latest SRE practices and technologies.

- Research and evaluate new tools and methodologies.

- Identify and implement process improvements.

- Participate in industry events and conferences.

Technical Skills & Qualifications:

- 5+ years of experience as a Site Reliability Engineer or similar role.

- Strong understanding of distributed systems and cloud technologies (AWS, Azure, GCP).

- Proficiency in Infrastructure as Code (IaC) tools (Terraform, CloudFormation, Ansible).

- Experience with containerization and orchestration technologies (Docker, Kubernetes).

- Proficiency in scripting and programming languages (Python, Bash, Go).

- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, Datadog).

- Experience with CI/CD pipelines and tools (Jenkins, GitLab CI, Azure DevOps).

- Strong understanding of networking concepts and protocols.

- Excellent problem-solving and debugging skills.

- Strong communication and interpersonal skills.

- Ability to work independently and as part of a team.

- Bachelor's degree in Computer Science, Software Engineering, or a related field.

Preferred Qualifications:

- Experience with service mesh technologies (Istio, Linkerd).

- Experience with serverless computing (AWS Lambda, Azure Functions, Google Cloud Functions).

- Experience with database administration and performance tuning.

- Experience with security tools and practices.

- Experience with incident management tools (PagerDuty, Opsgenie).

- Experience with configuration management tools (Chef, Puppet).

Functional Areas: Software/Testing/Networking