Monitor and manage system alerts, incidents, and performance metrics 24/7, ensuring timely resolution and escalation as necessary.
Serve as the first point of contact for operational issues and alerts, ensuring effective communication with internal and external stakeholders.
Lead customer communication, ensuring timely status updates and case resolution.
Lead collaboration efforts between the company and third parties to troubleshoot and resolve escalated customer issues.
Report product defects and enhancement requests.
Review and collaborate on product documentation for accuracy before new releases.
Design and maintain troubleshooting runbooks.
Author and review knowledge base articles for internal and external use.
Provide formal and informal training to co-workers, customers, and partners.
Develop tools, scripts, and programs to improve the quality of our customer support.
Coordinate incident response efforts, conducting root cause analysis and documenting findings to improve future response strategies.
Collaborate with IT teams to ensure systems operate optimally and identify potential issues before they escalate.
Maintain detailed logs of incidents and responses, providing regular reports and updates to management.
Who We Want
Detail-oriented process improvers. Critical thinkers who naturally see opportunities to develop and optimize work processes, finding ways to simplify, standardize, and automate.
Self-directed initiators. People who take ownership of their work and need no prompting to drive productivity, change, and outcomes.
Analytical problem solvers. People who go beyond just fixing to identify root causes, evaluate optimal solutions, and recommend comprehensive upgrades to prevent future issues.
What You Will Need
Bachelor's degree in computer science or a related field of study, or 6+ years of equivalent experience
4+ years of experience in a command center or similar environment, preferably in Healthcare IT
IT Infrastructure & Cloud: Expertise in cloud platforms (AWS, Azure, VMware), system administration (Windows, RHEL), and network management.
Incident Management: Experienced in ITIL and command center operations, particularly in Healthcare IT environments.
Automation & Scripting: Proficient in Python and Bash for automating tasks.
Monitoring Tools: Skilled in Zabbix, Datadog, Prometheus, CloudWatch, Nagios, and SolarWinds for performance monitoring.
Ticketing Systems: Strong experience with ServiceNow and Salesforce for incident and request management.
Problem Solving & Communication: Performs well under pressure, with strong communication and team collaboration skills.
Flexible Work: Available to work shifts, including nights, weekends, and holidays.