We are looking for Site Reliability Engineers who can manage, maintain and troubleshoot Alkira's world-class cloud networking solution around the clock. In this role, you will work in a product company where you can sharpen your existing skills and gain exposure to a wide range of technologies and practices, including microservices, DevOps methodologies, Kubernetes, Terraform, data networking and security.
Responsibilities:
You will be responsible for the availability and integrity of the infrastructure that underpins Alkira's Cloud Networking platform
Keep production systems running and troubleshoot issues that arise in production deployments
Provide 24x7 coverage as part of a scheduled shift and on-call rotation
Use tools such as Prometheus, Grafana and Jira to monitor, manage, triage and document infrastructure issues in real time
Automate infrastructure deployment using CI/CD
Build necessary tools to evolve how we maintain and monitor our solution
Develop and execute system and integration test plans
Requirements:
At least 2 years of experience managing production systems
Self-starter with a solution-oriented mindset. You see potential challenges as opportunities to learn and grow
Experience with cloud providers such as AWS, Azure or GCP
Experience with computer networking and network technologies
Experience with CI/CD tools such as Concourse CI or Jenkins
Experience with Kubernetes
Excellent problem-solving skills and ability to quickly grasp new concepts
Highly desirable: HashiCorp Certified: Terraform Associate certification