182 Nvidia Jobs

Service Reliability Operations Engineer

6-8 years

Bangalore / Bengaluru

1 vacancy

Nvidia

posted 6d ago

Job Role Insights

Flexible timing

Job Description

NVIDIA's NGC (NVIDIA GPU Cloud) team is looking for highly motivated System Administrator/DevOps engineers to design, develop, and implement a global, dynamic, state-of-the-art Service Reliability Operations Center (known as Mission Control) that provides extraordinary levels of support for our Cloud products and services. As a key member of the Mission Control team, you will partner with other parts of our organization, including Site Reliability Engineering, the Security Operations Center, DevOps teams, and other datacenter operations partners, to help our services reach near-100% availability. On the rare occasion that an incident occurs, you will be our front line, working to decrease the frequency and duration of any issue. Working in partnership with the development community, the Mission Control team will develop monitors, alarms, and alerts to make the service more reliable and improve our customer experience. Additionally, you will be closely involved in selecting the technologies that Mission Control uses to monitor, run, and measure the effectiveness of the environment.

What you will be doing:

  • The team will provide its services 24/7 in a follow-the-sun model that spans continents.

  • You will directly report to a manager in Bangalore.

  • Each team member will need to work either a Saturday or a Sunday each week. The hours worked may include an early or late start (a schedule of 10 hours per day, 4 days per week) to ensure that the US and India teams together provide 24/7 coverage.

  • The heart of Mission Control will be monitoring and triaging a growing on-prem and CSP (Cloud Service Provider) production compute and storage datacenter environment.

  • Every Mission Control team member will utilize alerts and alarms to help prevent issues and incidents when possible. You may also work with the developer community to develop and execute predictive support or diagnostic routines.

  • Perform Linux administration, network administration, and security incident monitoring tasks to drive your actions.

  • Mission Control team members will work with developers to learn how the service works, then translate that understanding into runbooks which the entire team will use. As new features and functionality are added, you will also update and evolve the runbooks as needed.

  • Strong communication and interpersonal skills will help keep the team engaged through incident resolution, including initiating the incident management procedure.

What we need to see:

  • BS/BE degree in Computer Science, Electronics or equivalent experience.

  • Minimum of 3 years of experience administering open system servers in demanding Internet, Cloud, or Telecommunications production environments, in a Linux Systems Administration, DevOps, SRE, or NOC role.

  • Strong problem-solving, analytical, and troubleshooting abilities on Linux Clusters on public or private clouds.

  • Strong Linux administration experience, including shell scripting, automation, DNS, DHCP, storage concepts, basic networking, iptables, etc. RHCE or equivalent level of knowledge.

  • Experience scripting in Python and writing Ansible playbooks is preferred, but not required.

  • Knowledge and understanding of application containers, container orchestration systems, and Git workflows.

  • Prior experience analyzing system and network performance using monitoring alerts, data, and graphs.

  • Demonstrated ability to master and maintain complicated environments.

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most forward-thinking and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.


Employment Type: Full Time, Permanent

People are getting interviews at Nvidia through

(based on 55 Nvidia interviews)
Campus Placement: 36%
Job Portal: 25%
Company Website: 11%
Referral: 7%
Walkin: 4%
Recruitment Consultant: 4%
13% candidates got the interview through other sources.
High Confidence: the data is based on a large number of responses received from the candidates.

What people at Nvidia are saying

What Nvidia employees are saying about work life

based on 513 employees
Flexible timing: 66%
Monday to Friday: 96%
No travel: 85%
Day Shift: 79%

Nvidia Benefits

Free Transport
Free Food
Cafeteria
Health Insurance
Work From Home
Job Training +6 more

Compare Nvidia with

Qualcomm: 3.8
Intel: 4.2
Advanced Micro Devices: 3.8
Micron Technology: 3.7
Texas Instruments: 4.1
Broadcom: 3.3
Applied Materials: 3.9
Analog Devices: 4.1
NXP Semiconductors: 3.8
Sterlite Technologies: 3.8
Indus Towers: 3.9
Nokia Networks: 4.3
Cisco: 4.2
Lumen Technologies: 4.0
Redington: 4.0
Colt Technology Services: 4.4
RadiSys: 4.1
Vindhya Telelinks: 4.1
Juniper Networks: 4.2
ITI: 3.7

Similar Jobs for you

Staff at Synopsys (India) Private Limited

Noida, Hyderabad / Secunderabad + 2

3-8 Yrs

₹ 16-21 LPA

Senior Site Reliability Engineer at Eli Lilly and Company

Bangalore / Bengaluru

8-10 Yrs

₹ 20-22 LPA

Site Reliability Engineer at Apple India Pvt Ltd

Hyderabad / Secunderabad

4-9 Yrs

₹ 20-22 LPA

Senior Site Reliability Engineer at Equifax Credit Information Services Private Limited

Pune, Thiruvananthapuram

6-14 Yrs

₹ 8-16 LPA

Site Reliability Engineer at UPS Pvt. Ltd.

Chennai

4-9 Yrs

₹ 17-22 LPA

Architect at Zensar Technologies

Pune

6-9 Yrs

₹ 14-18 LPA

Site Reliability Engineer at Zscaler, Inc.

Hyderabad / Secunderabad

3-7 Yrs

₹ 12-16 LPA

Site Reliability Engineer at NVIDIA

Bangalore / Bengaluru

4-7 Yrs

₹ 17-21 LPA

Services Consultant at Juniper Networks India Pvt Ltd

Gurgaon / Gurugram

8-13 Yrs

₹ 13-18 LPA

Site Reliability Engineer at ALIQAN Technologies

Bangalore / Bengaluru

5-10 Yrs

₹ 25-40 LPA

Nvidia Bangalore / Bengaluru Office Locations

Bengaluru Office
NVIDIA Graphics PVT LTD, C-1 "Jacaranda", Wing-A Manyata Embassy Business Park, Outer Ring Road Bengaluru
Karnataka 560045
Bengaluru Office
Nvidia Graphics Pvt Ltd, C1, Nagavara Bengaluru
Karnataka 560045

Service Reliability Operations Engineer

6-8 Yrs

Bangalore / Bengaluru

8d ago·via naukri.com

Senior Solutions Architect - Generative AI

4-12 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Senior System Software Engineer

4-14 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Senior System Software Engineer - Platform Team

4-14 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Senior System Software Engineer, Security

4-14 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Business Systems Analyst

10-15 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

Hypervisor and RTOS Engineer

1-9 Yrs

Bangalore / Bengaluru

2d ago·via naukri.com

System Software Engineer - Automation

1-5 Yrs

Bangalore / Bengaluru

6d ago·via naukri.com

Senior DevOps Engineer

5-11 Yrs

Pune

6d ago·via naukri.com

System Software Engineer

1-5 Yrs

Pune

6d ago·via naukri.com