Manage major incidents from detection to resolution, ensuring timely communication with stakeholders and minimal impact on business operations. Lead the Incident Management process, including the identification, containment, and eradication of incidents and recovery from them. Collaborate with cross-functional teams to resolve complex technical issues. Develop and maintain relationships with key stakeholders to ensure effective communication during critical situations. Ensure compliance with ITIL processes and industry standards for incident management.
Establish continuous process improvement cycles in which process performance, activities, roles and responsibilities, policies, procedures, and supporting technology are reviewed and enhanced where applicable.
Key Responsibilities:
Lead the incident management process and oversee team members involved in resolving incidents.
Respond to reported service incidents, identify the cause, and initiate the management process.
Establish and orchestrate bridge calls focused on restoring service to users as quickly as possible; facilitate troubleshooting toward resolution and manage incidents through to completion.
Ensure the appropriate recording, prioritization, data collection, validation, and ongoing management of problems through to root cause identification, known error management, and closure.
Ensure the quality of Known Error records.
Monitor and support Incident Management across production, development, and test environments.
Prioritize incidents according to their urgency and impact on the business.
Collaborate with the incident management team to ensure all protocols are diligently followed.
Communicate with upper management if significant issues are found.
Log all incidents and their resolution to identify and prevent recurring malfunctions.
Conduct diagnostics, data collection, and troubleshooting activities to identify and resolve technical issues.
Define and document metrics to measure the efficiency and effectiveness of the Incident Management process, for example Mean Time to Repair, Mean Time Between Failures, and repeat incidents.
Support the achievement of response and resolution targets for all tickets in line with the agreed SLAs for Service Desk Operations.
Output Measures:
Provide support for:
Daily/Weekly/Monthly Reporting
Achievement of SLA/KPIs
Trending of improvements for open Incident/Service Requests
Incident avoidance (via Problem Management)
Qualifications:
3 years of experience working as a Major Incident Manager.
3 years of exposure to the Service Management/ITIL framework and its concepts (incident, problem, and change management; RCA).
Excellent verbal and written communication skills, with experience working with both technical and functional resources.
Preferred Skills:
Foundational certification in AWS or Azure cloud services.
Certifications in ITIL or other service management frameworks.
Proficiency in incident response, incident reporting, ITIL, metrics, infrastructure, production environments, resolving technical issues, network operations (including NOC, Network Operations Center), client-facing work, Java, mainframe, and SharePoint.