Senior Site Reliability Engineer

7-9 years

Bangalore / Bengaluru

1 vacancy

Nvidia

posted 7hr ago

Job Role Insights

Flexible timing

Job Description

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's motivated by outstanding technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. NVIDIA is at the forefront of generative AI models, from language to images. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.

NVIDIA is looking for a Senior Site Reliability Engineer (SRE) to join its cloud service team for supporting, triaging, and building generative AI-powered visual applications. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems. We live SRE practices that are key to product quality, such as limiting time spent on reactive operational work, blameless postmortems, proactive identification of potential outages, and iterative improvements, which all make for exciting and multifaceted day-to-day work. The person in this position will be responsible for Service Response and workflow and will drive tools/service development to maintain and improve service SLOs. We partner with Service Owners to drive the reliability of the service.

What you will be doing:

  • Support and work on groundbreaking Generative AI inferencing workloads running in a globally-distributed heterogeneous environment spanning 60+ edge locations plus all major cloud service providers. Ensure the best possible performance and availability on current and next-generation GPU architectures.

  • Collaborate closely with the service owner, architecture, research, and tools teams at NVIDIA to achieve ideal results for AI problems at hand.

  • Monitor and support critical high-performance, large-scale services running multi-cloud.

  • Participate in the triage & resolution of sophisticated infra-related issues.

  • Maintain services once live by measuring and monitoring availability, latency, and overall system health using metrics, logs, and traces.

  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

  • Practice balanced incident response and blameless postmortems.

  • Be part of an on-call rotation to support production systems and lead significant production improvement around tooling, automation, and process.

  • Architect, design, and code using your expertise to optimize, deploy and productize services.

What we need to see:

  • 8+ years of experience operating & owning end-to-end availability and performance of mission-critical services in a live-site production environment, either as an SRE or Service Owner.

  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.

  • Solid understanding of containerization and microservices architecture. Excellent understanding of the Kubernetes (K8s) ecosystem and its best practices.

  • Ability to dissect complex problems into simple sub-problems and use available solutions to resolve them.

  • Technical leadership beyond development that includes scoping, requirements capturing, leading and influencing multiple teams of engineers on broad development initiatives.

  • Lead significant production activities, including change management, post-mortem reviews, workflow processes, software design, and delivering software automation in various languages (Python or Go) and technologies (CI/CD auto-remediation, alert correlation).

  • Strong understanding of SLOs/SLIs, error budgeting, and KPIs, and of configuring them for highly sophisticated services.

  • Experience with the ELK and Prometheus stacks as a power user and administrator.

  • Excellent understanding of cloud environments and technologies, especially AWS, Azure, GCP, or OCI.

  • Proven strengths in identifying, mitigating, and root-causing issues while continuously seeking ways to drive optimization, efficiency, and the bottom line.

Ways to stand out from the crowd:

  • Exposure to containerization and cloud-based deployments for AI models.

  • Excellent coding skills in Python, Go, or a similar language.

  • Prior experience driving production issues and helping with on-call support, along with an understanding of Deep Learning / Machine Learning / AI.

  • Experience with CUDA, PyTorch, TensorRT, TensorFlow, and/or Triton, as well as experience with StackStorm and similar automation platforms, is a bonus.

  • Understanding of observability instrumentation techniques and best practices, including OpenTelemetry.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.


Employment Type: Full Time, Permanent

People are getting interviews at Nvidia through

(based on 55 Nvidia interviews)
Campus Placement: 36%
Job Portal: 25%
Company Website: 11%
Referral: 7%
Walk-in: 4%
Recruitment Consultant: 4%
13% of candidates got the interview through other sources.
High Confidence: the data is based on a large number of responses received from the candidates.

What people at Nvidia are saying

Senior Site Reliability Engineer salary at Nvidia

reported by 3 employees with 7-9 years exp.
₹28.8 L/yr - ₹97 L/yr
100% more than the average Senior Site Reliability Engineer Salary in India

What Nvidia employees are saying about work life

based on 514 employees
Flexible timing: 66%
Monday to Friday: 96%
No travel: 85%
Day Shift: 79%

Nvidia Benefits

Free Transport
Free Food
Cafeteria
Health Insurance
Work From Home
Job Training +6 more

Similar Jobs for you

Senior Site Reliability Engineer at NVIDIA

Hyderabad / Secunderabad, Pune + 2

7-9 Yrs

₹ 32.5-37.5 LPA

Senior Site Reliability Engineer at NVIDIA

Hyderabad / Secunderabad, Pune + 2

5-8 Yrs

₹ 32.5-37.5 LPA

Senior Site Reliability Engineer at FabHotel Aay Kay Model Town

New Delhi, Karnal

7-10 Yrs

₹ 25-30 LPA

AI Engineer at NVIDIA

Pune

5-7 Yrs

₹ 25-30 LPA

Software Developer at NVIDIA

Pune, Bangalore / Bengaluru

7-12 Yrs

₹ 25-30 LPA

Site Reliability Engineer at Akamai

Remote

5-10 Yrs

₹ 30-37.5 LPA

Site Reliability Engineer at NVIDIA

Bangalore / Bengaluru

7-9 Yrs

₹ 32.5-37.5 LPA

Senior Site Reliability Engineer at Cimpress India

Mumbai

3-10 Yrs

₹ 30-35 LPA

Senior Architect at NVIDIA

Bangalore / Bengaluru

7-11 Yrs

₹ 20-27.5 LPA

Nvidia Bangalore / Bengaluru Office Locations

Bengaluru Office
NVIDIA Graphics PVT LTD, C-1 "Jacaranda", Wing-A Manyata Embassy Business Park, Outer Ring Road Bengaluru
Karnataka 560045
Bengaluru Office
Nvidia Graphics Pvt Ltd, C1, Nagavara Bengaluru
Karnataka 560045

Senior Site Reliability Engineer

7-9 Yrs

Bangalore / Bengaluru

15hr ago·via naukri.com

Customer Program Manager - Auto

4-8 Yrs

Pune, Bangalore / Bengaluru

2d ago·via naukri.com

Senior Site Reliability Engineer

7-9 Yrs

Pune

2d ago·via naukri.com

Senior Site Reliability Engineer

7-9 Yrs

Hyderabad / Secunderabad, Pune, Gurgaon / Gurugram +1 more

2d ago·via naukri.com

Compute Performance Developer Technology Engineer

2-4 Yrs

Pune

2d ago·via naukri.com

CAD Engineer

2-4 Yrs

Hyderabad / Secunderabad, Bangalore / Bengaluru

2d ago·via naukri.com

Senior Automotive Hardware Application Engineer

12-13 Yrs

Pune

2d ago·via naukri.com

ASIC Verification Engineer

0-5 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Senior Verification Engineer, Memory Subsystem

4-10 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Software QA Test Developer - Automotive

1-4 Yrs

Hyderabad / Secunderabad, Pune

4d ago·via naukri.com