DevOps Engineer I - Docker/Kubernetes (2-4 yrs)
Hudson RPO
posted 8d ago
Role Purpose :
The principal responsibility of this role is to provide operational support within the SRE team and to work with the software engineering teams to release and maintain new and existing applications.
This encompasses :
- Working in partnership with the business and the technology teams, bringing awareness of, and insight into, the different operational constraints and opportunities for projects targeting cloud-based or on-premises deployment.
- Implementation and maintenance of approved best practices for cloud and on-prem resources and environments.
- Promotion of mutual feedback in cross-functional groups, following SRE best practices within a DevOps culture.
- Implementation of continuous integration/delivery toolsets, or of the processes and platforms that use those tools.
- Strong focus on service availability and proactive troubleshooting.
Responsibilities :
- Collaborative Pipeline Implementation : Work closely with cross-functional teams, including developers, QA, and product managers, to implement and maintain robust delivery pipelines. Facilitate seamless deployments in both cloud-based environments (primarily AWS, with some Azure) and on-premises systems.
- Operational Advocacy : Act as an advocate for operational excellence within the team by identifying and highlighting potential operational constraints and opportunities, such as auto-scaling, containerization, and system resiliency.
- Environment Monitoring : Implement and maintain comprehensive monitoring solutions to ensure the health, performance, and availability of live environments. Regularly analyze monitoring data to identify trends and areas for improvement.
- Tooling Consistency : Participate in team research and development initiatives to standardize tooling and processes. Strive to improve the accessibility and maintainability of tools used across the team.
- Continuous Improvement : Embrace a mindset of continuous improvement, regularly reviewing operational processes to identify inefficiencies. Assist with drafting and proposing actionable plans for process enhancements to increase team efficiency and system reliability.
- Incident Response : Actively engage in incident response activities, quickly diagnosing and resolving issues to minimize impact on system availability. Document incident responses to improve future handling and prevention.
- Documentation : Maintain accurate and comprehensive documentation of systems, processes, and configurations to ensure knowledge sharing and operational transparency across the team.
- On-Call Support : Participate in a 24/7 on-call support rotation, responding to system alerts and incidents to ensure continuous system availability and performance.
Qualifications :
We understand that every organization is different and that professionals have their own unique history and experience, so we don't expect a candidate's competencies to be a 100% match for the tech stack we use at Wood Mackenzie. We list our preferred technologies, but if you have transferable knowledge and are willing to learn what you do not know, we will consider your application.
Skill Requirements :
- Educational Background : BS degree in IT, IS, CS, or equivalent work experience.
- Experience : 2-4 years of experience in SRE/DevOps roles.
- Agile Environment : Experience working in an Agile environment, supporting multiple software engineering teams for product releases and providing monitoring and insights into issues.
- Cloud Deployment : Basic understanding of cloud architecture and deployment strategies, particularly within Amazon Web Services (AWS). Familiarity with Azure is a plus.
- DevOps Mindset : Basic understanding of DevOps principles, including continuous integration, continuous delivery, and infrastructure as code. Experience with agile methodologies such as Kanban or Scrum and familiarity with JIRA for issue tracking.
- Team Collaboration : Ability to work as part of a cross-functional, multi-locational team. Basic communication skills to collaborate with team members and stakeholders.
- Resource Maintenance : Ability to assist with routine system maintenance tasks, including patch management, system backups, and performance tuning.
- Technical Skills : Basic proficiency in Linux administration (RHEL/Ubuntu) and command line tools for quick triage and resolution of production issues.
- Configuration Management : Familiarity with configuration management tools (e.g., Ansible, SaltStack) for automating system configurations and deployments.
Additional Preferred Skills :
- Advanced Linux Proficiency : Greater proficiency in Linux system administration, including shell scripting and automation.
- Monitoring and Alerting : Experience with monitoring and alerting tools such as Prometheus, Grafana, or CloudWatch.
- Containerization : Basic understanding of containerization technologies (e.g., Docker, Kubernetes) and their role in modern deployment strategies.
- Scripting and Automation : Proficiency in scripting languages such as Python, PowerShell, or Bash to automate repetitive tasks and improve operational efficiency.
Functional Areas: Software/Testing/Networking