ITP Global Consultant

Site Reliability Engineer - DevOps (5-8 yrs)

5-8 years

posted 7d ago

Job Role Insights

Key skills for the job

DevOps, Cloud Services, New Relic, Azure DevOps, Datadog, Incident Management

Job Description

Skill Set Required : DevOps/SRE concepts, Azure Cloud with 24x7 production support, observability with Datadog/New Relic, RCA, incident management.

Detailed Job Description :

- 3+ years of experience operating and troubleshooting Azure App Services, Azure Functions, Azure Logic Apps, Azure SQL, Azure Storage, Application Insights, Azure Redis, VNets and Azure App Gateway.

- 3+ years of full-stack engineering experience with an emphasis on backend (C# and .NET applications).

- 3+ years of experience with reliability concepts to ensure high performance and high service availability; able to define, implement and improve business performance SLOs.

- 3+ years of experience with observability across multiple domains (APM, infrastructure, synthetics, logs, etc.) within cloud and on-premise environments using Datadog, Azure Monitor and Application Insights. New Relic and Grafana are nice to have.

- 3+ years of experience with production operations, including 24x7 on-call support, escalation/paging with OpsGenie, incident management, RCA (Root Cause Analysis) and retrospective analysis.

- 3+ years in hands-on technical roles (such as site reliability engineer, software engineer, DevOps engineer, infrastructure engineer).

- Experience with infrastructure management across multiple cloud and on-premise environments using tools such as Terraform, Bicep, PowerShell, Ansible.

- Security is part of everything we do; the role requires knowledge of fundamental cloud security (e.g., identity and access management, firewalls).

- Strong collaboration and communication skills in a hybrid environment using Microsoft Teams, email and calendar.

- Bachelor's Degree in a relevant major or equivalent years of experience

Any of the following would be a plus :

- Dental industry knowledge

- Azure certifications

- Experience working in B2B SaaS companies

- Experience with cloud containers, specifically Kubernetes

Responsibilities & Duties :

Develop :

- Architecture, strategy and implementations to enable or enhance the Observability and Reliability of applications and services running on IaaS and PaaS in Microsoft Azure. AWS and GCP are nice to have.

- Service Level Objectives and indicators focused on improving business workflow performance and availability.

- Technical and business dashboards, metrics, and actionable alerting.

- Processes and automation for increasing uptime and availability, reducing toil and improving all phases of incident and problem management.

24x7 Support :

- Perform deep dives into systemic and latent reliability issues, incident management and problem management.

- Participate in all aspects of incident management, including awareness, communication, remediation and retrospective / root cause analysis.

- Identify and implement process improvements that reduce MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve).

- Support operations & engineering teams on Azure. AWS and GCP are nice to have.

- Support applications written in .NET, .NET Core, MVC and JavaScript.

- Train and mentor peers and less experienced engineers.

- Work in production environments with on-call rotations.

Advocacy :

- Train and mentor engineering teams on modern observability practices and techniques.

- Define and socialize SRE culture, best practices, architectural and security standards.

- Assess and raise risks across the organization.

Partnership with :

- Internal engineering, architecture and operations teams to ensure alignment.

- External teams to support their work and ensure compliance with our standards.

Optimize & manage :

- Multi-product observability platforms supporting cloud and on-premise infrastructure, services and applications, including observability cost optimization.

- Measuring and monitoring availability, latency, and overall system health across multiple product lines.

- Other duties as assigned

Functional Areas: Software/Testing/Networking