Shadow Placements
Site Reliability Engineer - Cloud Platform (6-10 yrs)
posted 14hr ago
Job Title : Site Reliability Engineer
Location : Bangalore (Hybrid).
Duration : 6-month Contract.
Job Description :
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the availability, scalability, and reliability of cloud-based systems. You will design, develop, and deploy scalable solutions using cloud platforms (AWS, Azure, GCP), containerization (Docker, Kubernetes), and CI/CD pipelines.
You will also focus on monitoring, automation, and incident response, ensuring seamless operation of critical business applications.
Key Responsibilities :
- Design and implement scalable, fault-tolerant, and high-availability infrastructure using AWS, Azure, or GCP.
- Develop automated solutions to improve system reliability, monitoring, and alerting.
- Optimize cloud-native applications for performance and cost efficiency.
- Deploy and manage containerized applications using Docker and Kubernetes.
- Implement Kubernetes-based orchestration and workload management.
- Configure and maintain Helm charts for application deployments.
- Build and maintain CI/CD pipelines using GitHub Actions, ArgoCD, Harness.io, or GitLab CI.
- Automate software deployments and infrastructure provisioning with Infrastructure as Code (IaC) tools.
- Collaborate with DevOps teams to improve deployment strategies and rollback mechanisms.
- Set up monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog, New Relic).
- Define SLIs, SLOs, and error budgets to measure system reliability.
- Respond to incidents and perform root cause analysis (RCA) for system failures.
- Ensure security best practices in cloud environments, including IAM policies, role-based access controls (RBAC), and network security.
- Implement logging, auditing, and compliance requirements for cloud platforms.
- Maintain disaster recovery and backup strategies.
Required Skills & Experience :
- 6+ years of experience in Site Reliability Engineering (SRE).
- Proficiency in Go or Python for automation and scripting.
- Strong expertise in cloud platforms (AWS, Azure, GCP).
- Deep knowledge of containerization (Docker, Kubernetes) and orchestration tools.
- Experience with CI/CD tools: GitHub Actions, ArgoCD, Harness.io, or GitLab CI.
- Knowledge of monitoring tools: Prometheus, Grafana, ELK Stack, Datadog, or similar.
- Hands-on experience with Infrastructure as Code (IaC): Terraform, CloudFormation.
- Certified Kubernetes Application Developer (CKAD) certification.
- Certified Kubernetes Security Specialist (CKS) certification.
- Experience with service meshes (Istio, Linkerd).
- Knowledge of serverless architectures (AWS Lambda, Azure Functions).
- Experience in incident response and chaos engineering.
What We Offer :
- Work on cutting-edge cloud and containerization technologies.
- Collaborate with a highly skilled team in an Agile environment.
- Opportunity to work on a high-impact, global-scale project.
- Competitive contract-based compensation.
Functional Areas: Software/Testing/Networking