Site Reliability Engineer - DevOps (5-8 yrs)
ITP Global Consultant
posted 7d ago
Key skills for the job
Skill Set Required : DevOps/SRE concepts, Azure Cloud with 24x7 Production Support, Observability with Datadog/New Relic, RCA, Incident Management.
Detailed Job Description :
- 3+ years of experience operating and troubleshooting Azure App Services, Azure Functions, Azure Logic Apps, Azure SQL, Azure Storage, Application Insights, Azure Redis, VNets and Azure App Gateway.
- 3+ years of full-stack engineering experience with an emphasis on backend (C# and .NET applications).
- 3+ years of experience with Reliability concepts to ensure high performance and high service availability, with the ability to define, implement and improve business performance SLOs.
- 3+ years of experience with Observability across multiple domains (APM, Infrastructure, Synthetics, Logs, etc.) within cloud and on-premise environments using Datadog, Azure Monitor and Application Insights. New Relic and Grafana are nice to have.
- 3+ years of experience with Production operations including 24x7 on-call support, escalation/paging with OpsGenie, incident management, RCA (Root Cause Analysis) and retrospective analysis.
- 3+ years in hands-on technical roles (such as site reliability engineer, software engineer, DevOps engineer, infrastructure engineer).
- Experience with infrastructure management across multiple cloud and on-premise environments using tools such as Terraform, Bicep, PowerShell, Ansible.
- Security is part of everything we do, so the role requires knowledge of cloud security fundamentals (e.g., identity and access management, firewalls).
- Strong collaboration and communication skills in a hybrid environment using Microsoft Teams, email and calendar.
- Bachelor's Degree in a relevant major or equivalent years of experience
Any of the following would be a plus :
- Dental industry knowledge
- Azure certifications
- Experience working in B2B SaaS companies
- Experience with cloud containers, specifically Kubernetes
Responsibilities & Duties :
Develop :
- Architecture, strategy and implementations to enable or enhance the Observability and Reliability of applications and services running on IaaS and PaaS in Microsoft Azure. AWS and GCP are nice to have.
- Service Level Objectives and indicators focused on improving business workflow performance and availability.
- Technical and business dashboards, metrics, and actionable alerting.
- Processes and automation for increasing uptime and availability, reducing toil and improving all phases of incident and problem management.
24x7 Support :
- Perform deep dives into systemic and latent reliability issues, incident management, problem management.
- Participate in all aspects of incident management including awareness, communication, remediation, retrospective / root cause analysis.
- Identify and implement process improvements that reduce MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve).
- Support operations & engineering teams on Azure. AWS and GCP are nice to have.
- Support applications written in .NET, .NET Core, MVC and JavaScript.
- Train and mentor peers and less experienced engineers.
- Work in production environments with on-call rotations.
Advocacy :
- Train and mentor engineering teams on modern observability practices and techniques.
- Define and socialize SRE culture, best practices, architectural and security standards.
- Assess and raise risks across the organization.
Partnership with :
- Internal engineering, architecture and operations teams to ensure alignment.
- External teams to support their work and ensure compliance with our standards.
Optimize & manage :
- Multi-product observability platforms supporting cloud and on-prem infrastructure, services and applications, including observability cost optimization.
- Measurement and monitoring of availability, latency, and overall system health across multiple product lines.
- Other duties as assigned
Functional Areas: Software/Testing/Networking