Engaged Employer

Diamondpick

Rating: 4.1, based on 299 reviews

Senior Site Reliability Engineer (SRE)

1-3 years

Chennai

1 vacancy

posted 7d ago

Job Role Insights

Flexible timing

Key skills for the job

Software Configuration Management, Python, Computer Networking, Automation Testing, Operations, Information Technology

Job Description

JOB TITLE:

Site Reliability Engineer (SRE)

Level E2

INTRODUCTION:

At NBCUniversal, we believe in the talent of our people. It's our passion and commitment to excellence that drives NBCU's vast portfolio of brands to succeed. From broadcast and cable networks, news and sports platforms, to film, world-renowned theme parks and a diverse suite of digital properties, we take pride in all that we do and all that we represent. It's what makes us uniquely NBCU. Here you can create the extraordinary. Join us.

ABOUT THE ROLE:

The Legal and Privacy Engineering organization is looking for a Site Reliability Engineer (SRE) who is a well-rounded IT professional with strong software troubleshooting skills, software development experience, and strong systems administration skills.

SRE team members are responsible for ensuring the stability, scalability, and performance of our legal and privacy systems by blending software engineering and operations expertise. They proactively build monitoring and alerting systems, watch those systems for gaps and close them before they impact users, respond to issues, and continually improve our systems. They automate away manual processes to increase reliability and reduce operational costs, and they track down defects and devise innovative solutions to improve reliability and availability.
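Proactive alerting of the kind described above is often built so that an alert fires only when a signal stays unhealthy for several consecutive checks, which filters out one-off spikes. The sketch below is illustrative only and is not drawn from the posting: it assumes a simple per-interval error-rate signal, and the `Window` model, the 5% threshold, and the three-window rule are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed during one monitoring interval (hypothetical data model)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat an idle window as healthy rather than dividing by zero.
        return self.errors / self.requests if self.requests else 0.0


def should_alert(windows: list[Window], threshold: float = 0.05, consecutive: int = 3) -> bool:
    """Return True only if each of the last `consecutive` windows exceeds `threshold`.

    Requiring several bad windows in a row suppresses alerts for brief spikes.
    """
    if len(windows) < consecutive:
        return False
    recent = windows[-consecutive:]
    return all(w.error_rate > threshold for w in recent)


if __name__ == "__main__":
    # Error rates: 1%, 8%, 9%, 7%. The last three all exceed 5%.
    history = [Window(1000, 10), Window(1000, 80), Window(1000, 90), Window(1000, 70)]
    print(should_alert(history))  # True
```

Raising `consecutive` trades faster detection for fewer false alarms; monitoring tools typically expose the same trade-off as a "pending" or "for" duration on an alert rule.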

In this role you will handle site reliability engineering responsibilities across all systems within the legal and privacy space, working as part of a larger SRE team to provide continuous coverage.

Responsibilities include the following:

Monitor system performance and reliability to proactively identify and address potential issues before they impact users.

Develop, maintain, and optimize alerting and monitoring systems to ensure high availability and system performance.

Communicate effectively with stakeholders about system status, downtime, and issues.

Measure and report on system availability and performance against defined SLAs.

Participate in on-call rotations, ensuring timely and effective incident response and resolution.

Conduct thorough root cause analysis of incidents and outages and implement preventive measures to avoid recurrence.

Automate routine tasks and processes to minimize manual intervention and optimize operational efficiency.

Collaborate closely with development teams to ensure new features are designed for reliability, scalability, and effective monitoring.

Plan, test, coordinate, and implement new systems, upgrades, and modifications.

Design, develop, and maintain scalable infrastructure systems to support high-traffic applications.

Collaborate with vendors and cross-functional teams to ensure seamless integration and alignment of efforts.

Create and maintain documentation for systems, processes, and procedures to ensure knowledge is shared and accessible.

Assist with documenting system designs, processes, and troubleshooting procedures to facilitate knowledge sharing within the team.

Ensure automated CI/CD deployments run successfully, providing troubleshooting and fallback support as needed to prevent service disruptions.

Manage system capacity planning and scaling strategies to effectively handle growth and traffic fluctuations.

Establish and enforce best practices for security, compliance, and configuration management.

Continuously enhance system reliability by evaluating and integrating new tools, technologies, and best practices.
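Measuring and reporting availability against an SLA usually comes down to comparing observed downtime with the downtime the target allows. As a worked example, a 30-day month has 43,200 minutes, so a 99.9% target permits 43,200 × 0.001 = 43.2 minutes of downtime. The sketch below assumes downtime has already been measured in minutes for the reporting period; `sla_report` and `SlaReport` are hypothetical names, not part of any tool named in the posting.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    availability: float      # observed fraction of the period the service was up
    allowed_downtime: float  # minutes of downtime the SLA target permits
    budget_remaining: float  # allowed minus actual downtime (negative means breached)
    met: bool                # whether observed availability meets the target


def sla_report(period_minutes: float, downtime_minutes: float, target: float) -> SlaReport:
    """Compare measured downtime for one reporting period against an availability target.

    `target` is a fraction, e.g. 0.999 for "three nines".
    """
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    availability = (period_minutes - downtime_minutes) / period_minutes
    allowed = period_minutes * (1 - target)
    return SlaReport(
        availability=availability,
        allowed_downtime=allowed,
        budget_remaining=allowed - downtime_minutes,
        met=availability >= target,
    )


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes
    report = sla_report(month, downtime_minutes=20, target=0.999)
    print(f"availability={report.availability:.5%} "
          f"allowed={report.allowed_downtime:.1f} min "
          f"remaining={report.budget_remaining:.1f} min met={report.met}")
```

Reporting the remaining budget alongside the pass/fail flag shows how close a service is to breaching its SLA, not just whether it already has.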

REQUIREMENTS:

1-3 years of experience as a site reliability engineer, DevOps engineer, or in a similar role, with a strong focus on systems administration and experience in software engineering.

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.

Advanced degrees or relevant certifications are a plus.

Proficiency in cloud platforms such as AWS, Azure, or Google Cloud, with experience in managing cloud-based infrastructure.

Strong scripting and automation skills using languages like Python, Bash, or PowerShell.

Experience with configuration management and infrastructure-as-code (IaC) tools (e.g., Ansible, Chef, Puppet, Terraform).

In-depth knowledge of CI/CD pipelines, including tools such as GitLab.

Proficiency in AWS CloudFormation is required; proficiency in AWS CDK is a plus.

Proficiency in monitoring and alerting tools and the ability to design and optimize alerting systems.

Solid understanding of networking concepts, security best practices, and compliance requirements.

Strong problem-solving skills, with a demonstrated ability to perform root cause analysis and implement effective solutions to prevent future incidents.

Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams, including development and product teams.

Familiarity with incident management frameworks and experience participating in on-call rotations.

Ability to manage multiple priorities in a fast-paced environment, with a strong focus on detail and quality.

Employment Type: Full Time, Permanent
