We are seeking a skilled Site Reliability Engineer (SRE) specialising in Cloud Operations to join our dynamic team. The ideal candidate will leverage their expertise in cloud infrastructure, automation, and monitoring to ensure the reliability and performance of our services. You will play a crucial role in designing systems that are resilient, scalable, and fault-tolerant while collaborating closely with development and operations teams to enhance our cloud infrastructure.
Responsibilities:
Design and Implement Infrastructure:
Develop and maintain robust cloud architectures across various platforms (AWS, Azure, GCP) to support scalable services and applications.
Monitoring and Incident Management:
Set up monitoring tools to track system performance and promptly respond to incidents to minimize downtime and ensure service reliability.
Automation and Tooling:
Automate deployment, scaling, and management of applications using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Continuous Improvement:
Collaborate with software engineering teams to promote best practices in system design, development, and operational procedures to maximize uptime and performance.
Capacity Planning:
Conduct capacity and performance analysis to anticipate system demands, ensure resources are used effectively, and implement strategies for scaling.
Disaster Recovery and Security:
Design and test disaster recovery plans and ensure compliance with security protocols to protect customer data and maintain system integrity.
Documentation:
Create and maintain detailed documentation for system architecture, processes, and operational procedures to assist in training and knowledge transfer.
Minimum Qualifications:
Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent.
At least 5 years of experience in a cloud-based software product development environment.
Excellent English proficiency and fluency to communicate with people at all levels.
Preferred Qualifications:
Experience with large, scalable distributed systems and/or SaaS solutions.
Experience with Terraform, Docker, Helm, Kubernetes.
Scripting experience with languages such as PowerShell or shell scripting.
Working experience with Azure cloud services, including Azure Resource Manager and the Terraform provider for Azure Resource Manager.
Software development experience on cloud platforms is a plus.
Fluent in English - level required: C1
Direct experience with the industrial automation industry is a plus.
Benefits:
The ability to collaborate with and learn from colleagues in a complex, global organization.
A creative working environment, paired with a great compensation package, great benefits, and a supportive atmosphere where you can sharpen your skills through new challenges and development opportunities.
Hybrid work, split between home and a designated Rockwell Automation facility.
Corporate Social Responsibility opportunities.
Support from our 24/7 employee assistance program.