As a Senior DevOps Engineer, you'll be critical in delivering shared solutions for infrastructure operations across our products. Working collaboratively, you'll lead your team to build and improve our current and future hosting capabilities. You will be consistently looking at ways to automate, optimize, increase availability, and migrate to cloud services.
Responsibilities
Contributing to common paved-road and self-service modules that allow teams to move at pace, operate, and maintain software systems across the full breadth of the SDLC.
Day-to-day leadership and mentorship of your team, facilitating an environment where innovation and learning will thrive, and developing an inclusive, engaged, agile culture that promotes collaboration and where our colleagues can grow their careers.
Designing and working with complex data models; successfully implementing development processes, coding best practices, and code reviews; designing and authoring modular infrastructure assets for framework improvements; and utilizing our Core Platform best practices.
Ensuring Elsevier's Platform operational frameworks, policies, and best practices are consistently applied, improving the reliability and performance of Elsevier's product portfolio, including systems design, incident management, disaster recovery, lifecycle management products, and L3 support response to tool outages and performance alerts.
Platform automation to increase deployment frequency, minimize change failures, maintain service levels, and ensure security through optimal construction and implementation of CI/CD pipelines, providing consistency and reliability throughout the lifecycle, using scripting languages (Python, Bash, PowerShell), infrastructure as code (Terraform, CloudFormation), and cloud-native tools such as AWS Lambda.
Key leader driving modernization of our entire technology stack and alignment with architects to implement multi-region AWS infrastructure, accelerating Platform innovations by ensuring we maintain and promote the use of secure, high-performing, and reliable frameworks and shared services.
Seeking out diverse ideas and perspectives from a variety of sources to create better solutions, products, and services and champion innovation for team squads and across the organization, eliminating toil, so that the team can focus more time on project work and lasting improvement, serving as the initial point of escalation for development issues within the Platform team.
Working closely with the Operational Command Center, monitoring the health of the tools, triaging issues, troubleshooting tooling and integration issues efficiently while effectively communicating escalations and outcomes, debugging production issues across all levels of the stack, and embracing a blameless culture to prevent incidents from ever happening.
Requirements
Advanced problem-solving experience leading teams in identifying, researching, and coordinating the resources necessary to effectively troubleshoot and diagnose complex project issues; prior success translating findings into alternatives and solutions; good fault-diagnostic skills with the ability to assess and prioritize faults and respond or escalate accordingly.
Demonstrated experience and best practices in AWS architecture, with an accreditation or proficiency in Amazon Web Services (AWS); ability to lead the development of technical standards and perform reviews to ensure enterprise and architectural standards and processes are followed.
Experience facilitating technical and planning meetings to identify the best outcomes, seeking diverse ideas and perspectives from a variety of sources to create better solutions, products, and services, and championing innovation within your squads and across the organization.
Experience in authoring CI/CD pipelines and automation elements related to infrastructure composition, deployment orchestration, and monitoring; ability to build and deploy code with Jenkins or similar tools that allow for rapid release of high-quality software.
Experience in containerization and orchestration (with Docker and Kubernetes respectively), deploying applications with container technology (OpenShift, Docker, Kubernetes, etc.), with expert-level experience managing and running Linux servers and administration skills at scale.
Experience in deploying and integrating monitoring technologies, backup/restore, and tools in the cloud; large-scale monitoring and reporting (New Relic); running and managing ELK; and demonstrable knowledge and expertise of 24x7 operational support of systems hosted on a major cloud provider (AWS, GCP, Azure).
Knowledge of writing and using modular Terraform at scale and Infrastructure as Code (IaC) as an AWS automation technology; implementing modern scripting and object-oriented programming, configuration management, and deployment via Ansible, Puppet, or Chef for multi-region cloud-based platforms (AWS, Azure, Google Cloud, or similar).
Possess a proven record of implementing DevOps and SRE methodologies, principles, and practices. Through data and metrics, you can demonstrate a holistic view of how these working practices help build and support products for our team.
Solid knowledge of the scripting technologies required to build tooling integrations (Ruby, Bash, Go, Python, or other scripting languages); experience in using modern scripting and object-oriented programming languages as a contributing member within an agile dev squad.
You are an innovative, passionate leader who can create high-performing global engineering teams that use software to solve complex problems. You have strong analytical skills and the ability to collate and interpret data from various sources, along with strong leadership and initiative and the ability to self-direct, plan, and prioritize tasks. You can successfully lead large onshore and offshore resources in solving complex business needs, coordinate the production of advanced, complex software, and serve as a senior source of expertise.