Site Reliability Engineer (SRE)

5-9 years

Chennai

1 vacancy

Uplers

posted 1d ago

Job Role Insights

Flexible timing

Job Description


Site Reliability Engineer

Experience: 5+ years
Salary: Competitive
Preferred Notice Period: Within 30 Days
Shift: 10:00AM to 6:00PM IST
Opportunity Type: Remote
Placement Type: Permanent

(*Note: This is a requirement for one of Uplers' Partners)

What do you need for this opportunity?
Must-have skills:
Monitoring tools, automation, Docker, Kubernetes
Good-to-have skills:
Grafana, ELK, Splunk, Datadog

Our Hiring Partner is Looking for:
A Site Reliability Engineer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a team player with a positive attitude and a desire to make a difference, we want to hear from you.

Role Overview Description
SoHo Dragon, Ahmedabad is a growing company and is always on the lookout for new, energized talent to join our team. We deliver only the highest standard of service to our customers, and therefore we only hire professionals that are great all-rounders.

Key Responsibilities:
● System Reliability and Uptime:
Ensure high availability and performance of critical systems, APIs, and infrastructure components.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to maintain system reliability standards.
Develop and maintain error budgets to balance new feature development with reliability.
● Real-Time Monitoring and Alerting:
Implement and maintain comprehensive monitoring and alerting solutions across critical services, such as APIs, data processing pipelines, databases (e.g., Cassandra, Redshift), and cloud infrastructure.
Set up proactive monitoring for API latency, system load, throughput, and error rates to identify issues before they impact customers.
Collaborate with DevOps and Platform & Infra teams to create end-to-end observability for the entire data processing ecosystem.
● Incident Management and Root Cause Analysis:
Act as the first responder to high-severity incidents, taking ownership of incident management and response.
Conduct thorough root cause analysis post-incident, working closely with cross-functional teams to implement long-term resolutions.
Develop incident runbooks and playbooks to streamline incident response and reduce Mean Time to Recovery (MTTR).
● Automation of Toil and Operational Efficiency:
Identify and automate repetitive, manual tasks to minimize operational overhead, particularly within data ingestion, disaggregation, and notification workflows.
Implement self-healing solutions for commonly recurring issues, reducing the need for manual intervention.
Enhance operational efficiency by optimizing resource utilization across infrastructure components like EMR clusters, Redis instances, and SQS queues.
● Capacity Planning and Scalability:
Perform regular capacity planning to ensure our systems can handle future growth and data processing needs, especially during peak usage periods.
Collaborate with the Platform & Infra and DevOps teams to scale infrastructure effectively, ensuring we meet SLAs for data processing and customer response times.
Monitor and optimize infrastructure costs by ensuring efficient resource allocation and cloud utilization.
● Performance Optimization:
Continuously monitor system performance and optimize APIs, databases, and backend services to reduce latency and improve response times.
Address performance bottlenecks in the data processing pipeline to ensure timely aggregation, disaggregation, and notification generation.
Develop strategies to improve the accuracy and quality of data insights provided to customers.
● Documentation and Cross-Team Collaboration:
Document all reliability processes, runbooks, and incident resolution steps to maintain clear, actionable resources for the team.
Work closely with Product Support to ensure that customer-impacting issues are resolved quickly, and with DevOps to streamline the deployment and release processes.
Collaborate on building a culture of reliability and efficiency across the organization.
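
For candidates less familiar with the error-budget responsibility listed above, the arithmetic behind it can be sketched as follows. This is a minimal illustration only: the 99.9% availability SLO, the 30-day window, and the function names are assumptions made for the example, not figures or tooling from this role.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO tolerates over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining_fraction(slo_target: float,
                              downtime_minutes: float,
                              window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # Assumed example: 99.9% SLO, 10.8 minutes of downtime so far this window.
    print(round(error_budget_minutes(0.999), 1))               # total budget in minutes
    print(round(budget_remaining_fraction(0.999, 10.8), 2))    # share of budget left
```

When the remaining fraction approaches zero, a team would typically slow feature releases in favor of reliability work, which is the balance the responsibility describes.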

Key Performance Indicators (KPIs):
● Service Uptime and Availability: Percentage uptime for critical services and systems.
● Mean Time to Recovery (MTTR): Average time to resolve incidents and restore services.
● Incident Frequency: Number of incidents and issues per period, aiming for continuous reduction.
● Error Budget Compliance: Adherence to error budgets without breaching SLOs.
● Automation Coverage: Percentage of manual tasks that have been automated to reduce operational workload.
● Latency and Performance Metrics: API latency (P50, P95, P99) and system throughput for key workflows.
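
Two of the KPIs above, MTTR and latency percentiles, can be computed from raw incident and request data roughly as shown below. This is a sketch under stated assumptions: the function names and sample data are hypothetical, and the nearest-rank percentile method is one common choice, not necessarily the one the hiring partner's monitoring stack uses.

```python
import math
from datetime import datetime


def mttr_minutes(incidents):
    """Mean Time to Recovery: average of (resolved - detected), in minutes.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    """
    durations = [(resolved - detected).total_seconds() / 60
                 for detected, resolved in incidents]
    return sum(durations) / len(durations)


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical incidents lasting 30 and 90 minutes.
    incidents = [
        (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
        (datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 2, 13, 30)),
    ]
    print(mttr_minutes(incidents))

    # Hypothetical API latencies in milliseconds.
    latencies_ms = list(range(1, 101))
    for p in (50, 95, 99):
        print(f"P{p}:", percentile(latencies_ms, p))
```

Tail percentiles such as P95 and P99 are tracked alongside P50 because an average can hide the slow requests that a subset of customers actually experience.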

Qualifications:
● B.Tech/Bachelor's degree in Computer Science or a related field (math, physics, engineering).
● 3-7 years of experience in Site Reliability Engineering (SRE), DevOps, or Infrastructure Engineering.
● Strong experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack).
● Proficiency in automation and scripting (e.g., Python, Bash, Terraform) to manage infrastructure as code and automate repetitive tasks.
● Hands-on experience with cloud platforms (AWS preferred) and knowledge of services like EC2, SQS, EKS, S3, RDS, Redshift, EMR.
● Experience with incident management and root cause analysis methodologies.
● Familiarity with database systems (e.g., Cassandra, Redis, MySQL) and large-scale data processing pipelines.
● Excellent problem-solving skills, with a proactive approach to identifying and resolving issues.
● Proficiency in SQL (basic and advanced) to analyze error and log data, identify patterns that reduce the number of recurring issues, and pinpoint the top opportunities to reduce ticket volume.
● Bonus points for familiarity with a reporting tool such as Tableau, Power BI, or Looker.

How to apply for this opportunity

  1. Register or login on our portal
  2. Click 'Apply', upload your resume, and fill in the required details.
  3. After that, click 'Apply Now' to submit your application.
  4. Get matched and crack a quick interview with our hiring partner.
  5. Land your global dream job and get your exciting career started!


About Our Hiring Partner:
Bidgely is the indispensable innovation partner for UtilityAI. Bidgely’s patented disaggregation technology unlocks opportunities for utilities to optimize shareholder value, personalize customer engagement, and modernize grid operations. With more than 15M homes under contract, Bidgely is the choice of leading utilities. Learn more at bidgely.com.


About Uplers:
Our goal is to make hiring reliable, simple, and fast. Our role will be to help all our talents find and apply for relevant opportunities and progress in their career. We will support any grievances or challenges you may face during the engagement. You will also be assigned to a dedicated Talent Success Coach during the engagement.

(Note: There are many more opportunities apart from this on the portal. Depending on the assessments you clear, you can apply for them as well).

So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!


Employment Type: Full Time, Permanent

Uplers Benefits

Submitted by Company
Workations
Emergency Loans
Open Door Policy
Learning and Development
Company Laptop and Internet Reimbursement
Work from Anywhere
Submitted by Employees
Work From Home
Health Insurance
Job Training
Soft Skill Training
Team Outings
Education Assistance
