Okta - Staff Site Reliability Engineer - AWS Infrastructure (8-10 yrs)
posted 1d ago
Fixed timing
Position Overview :
The Staff Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud-native applications and services.
This position focuses on architecting and managing reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and automation.
The ideal candidate will have hands-on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and Istio service mesh.
Key Responsibilities :
Kubernetes Platform Creation :
- Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms.
- Ensure clusters are optimized for production workloads, providing high resilience and operational efficiency.
AWS Infrastructure Management :
- Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more.
- Implement best practices for cost management, scaling, and security within AWS.
Helm Management :
- Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters.
- Create, maintain, and manage Helm charts for production-ready deployments.
Karpenter Implementation :
- Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands.
Istio Service Mesh Management :
- Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters.
- Enable fine-grained traffic management, service discovery, and policy enforcement.
Platform Automation & Scaling :
- Automate the deployment, scaling, and management of infrastructure and applications.
- Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime.
Incident Management & Troubleshooting :
- Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner.
Security & Compliance :
- Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks.
Required Qualifications :
- 5+ years of experience with Kubernetes (K8s), Helm, Karpenter, and Istio.
- 8+ years of experience with infrastructure-as-code tools such as Terraform, Chef, or Ansible.
- 8+ years of experience with serverless computing (AWS Lambda, API Gateway) and microservices architecture.
- Proven experience with AWS (EKS, ECS, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures.
- Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage).
- Hands-on experience with Helm for Kubernetes application deployment and management.
- Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage.
- Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features.
- Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Spinnaker, Ansible).
- Strong scripting and automation skills in Python or Go for infrastructure management and platform automation.
- Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack.
Preferred Qualifications :
- Experience with multi-region cloud environments.
- Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks).
- Familiarity with Docker and containerization principles.
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent professional experience).
Certifications (Preferred) : CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer certification is highly desirable.
Functional Areas: Software/Testing/Networking
Bangalore / Bengaluru