Grey Orange

3.2

based on 328 Reviews

2 Grey Orange Senior Site Reliability Engineer Jobs

Senior Site Reliability Engineer

5-10 years

Gurgaon / Gurugram

1 vacancy

posted 2mon ago

Job Role Insights

Flexible timing

Key skills for the job

Software Configuration Management, Python, Computer Networking, Linux, System Administration, Incident Management, CCTV Monitoring

Job Description

Shaping the future of omnichannel

Optimizing warehouses and stores through AI-driven software and robotics

GreyOrange is a global leader in AI-driven robotic automation software and hardware, transforming distribution and fulfillment centers worldwide. Our solutions increase productivity, empower growth and scale, mitigate labor challenges, reduce risk and time to market, and create better experiences for customers and employees. Founded in 2012, GreyOrange is headquartered in Atlanta, Georgia, with offices and partners across the Americas, Europe and Asia.

We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization.

The SRE team at GreyOrange is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents so they are resolved quickly, and establishing business-as-usual (BAU) operations. The team also manages and maintains the internal tools and infrastructure used by other development teams.
The Senior SRE will play a crucial role in capacity planning and in ensuring the reliability, scalability, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.

Requirements
Should have 5 to 8 years of experience
Well versed in scripting/programming languages (Python, Bash, PowerShell, etc.) for automating manual work, particularly within cloud environments
Well versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, used to identify and address potential issues, especially in cloud infrastructure
Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes to streamline deployment, monitoring, and management of systems and applications in the cloud
Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments
Solid understanding of SLI, SLO, SLA, and error budget concepts and their implementation; able to provide on-call support and participate in incident management and response activities as needed
Expert at troubleshooting production issues and bugs
Good knowledge of Unix systems, networking, web technologies, and databases.
Incident management experience with production workloads, coupled with effective communication skills
Working knowledge of at least one cloud platform (AWS or GCP)

What you'll do:
Lead reliability engineering projects and drive them to closure.
Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues
Design, build and maintain efficient, reliable, and scalable cloud-based infrastructure and services
Automate processes to reduce toil, and find opportunities to improve the observability and availability of the platform.
Implement and manage observability tools for comprehensive monitoring, alerting, and logging
Own end-to-end availability and performance of different services & tools.
Practice sustainable incident response and blameless postmortems.
Provide on-call support for incident management and participate actively in response activities