Principal, Infra PM&A
Northern Trust Operating Services
posted 1y ago
Flexible timing
Major Duties:
- Collaborate with teams to craft, implement, and maintain observability solutions that provide deep insights into our applications, infrastructure, and operational processes.
- Develop working relationships with application and infrastructure teams to understand and flesh out applicable monitoring use cases, and document them for traceability and auditing.
- Scope and gather technical requirements for customer monitoring use cases and business KPIs; translate them into tool specifications covering Dynatrace, infrastructure, OS, synthetic monitoring, Real User Monitoring, and dashboards; and see implementation through to operational success.
- Implement automated scaling mechanisms, performance testing frameworks, and capacity planning strategies to ensure the platform can handle increasing demand while maintaining a high-quality user experience.
- Strategize and implement scalable, pipeline-ready solutions for continuous monitoring and availability using ServiceNow (SNOW) tools, CI/CD tools, and automation solutions such as Chef, Ansible, Puppet, and Terraform.
Leadership Responsibilities:
- Provide strategic leadership and a roadmap vision aligned with department and company goals and objectives.
- Develop and execute a comprehensive observability strategy, including the selection, implementation, and integration of appropriate monitoring, logging, and tracing tools.
- Define key performance indicators (KPIs) and establish monitoring frameworks to proactively identify and resolve issues, ensuring high availability and optimal performance.
- Communicate progress, risks, and outcomes to senior leadership and other stakeholders, providing insights and recommendations for informed decision-making.
- Collaborate with cross-functional teams to identify manual processes, bottlenecks, and pain points, and design and implement scalable automation solutions to increase operational efficiency and reduce human error.
- Mentor junior-level technical staff within the functional monitoring area of the IT organization.
- Periodically help lead incident investigations, coordinate with relevant teams, and drive root cause analysis to identify systemic issues and implement preventive measures.
- Champion a culture of continuous improvement and digital transformation by implementing feedback loops, analyzing system metrics, and driving iterative enhancements.
Requirements:
- 12+ years of experience as an Observability Engineer, Site Reliability Engineer, or similar role, with a focus on monitoring, logging, tracing, and alerting.
- Experience working in an Agile delivery environment
- Solid understanding of software development and application architecture principles
- Strong knowledge of observability tools and frameworks such as Dynatrace, Azure App Insights, Elastic, and Prometheus
- Experience with Azure managed services and serverless frameworks
- Prior experience with Java, JavaScript, Python, Terraform, Node.js, and Spring
- Dynatrace certification preferred
- ITIL Foundation certification preferred
- ServiceNow Certified Implementation Specialist (CIS) certification in Discovery, Service Mapping, Event Management, and Cloud Management
- Experience implementing ServiceNow and Dynatrace Discovery, Service Mapping, Event Management, and Orchestration use cases
- Strong knowledge of incident management processes, including incident response, escalation, and post-incident analysis, as well as root cause analysis, error budgets, and mean time to detect (MTTD) and mean time to restore (MTTR) metrics
- Strong understanding of cloud (Azure) services and standard processes
- Ability to design ServiceNow (SNOW) ITOM solutions using industry best practices
- Experience with CMDB design, architecture, and implementation, with a fair understanding of the ServiceNow CMDB model and its extensions, including integrations with observability tools and APIs
- Proven experience engineering and implementing end-to-end observability tooling in a large, matrixed organization with varied technical debt and legacy platforms and applications
Employment Type: Full Time, Permanent