Okta - Senior Site Reliability Engineer (4-7 yrs)
posted 5d ago
Fixed timing
Key Responsibilities :
Production Infrastructure Management :
- Build, operate, and monitor Okta's production infrastructure, ensuring its security, scalability, and reliability.
- Respond to production incidents, conduct root cause analysis, and implement measures to prevent recurrence.
- Troubleshoot complex production issues to ensure high availability, reliability, and performance of systems.
Security Advocacy & Best Practices :
- Act as an evangelist for security best practices across the engineering organization.
- Lead initiatives and projects aimed at strengthening the company's security posture for critical infrastructure, ensuring compliance with industry standards.
- Promote and apply best practices for building scalable, secure, and reliable services across engineering teams.
Incident Response & Automation :
- Quickly respond to incidents, ensuring that all critical issues are resolved promptly.
- Continuously evolve and automate manual processes to improve operational efficiency and security.
- Develop and maintain technical documentation, runbooks, and procedures to ensure seamless incident response and system management.
Monitoring & Tooling :
- Develop and improve monitoring tools and platforms to track the health and performance of production systems.
- Identify potential security risks and apply the appropriate mitigations proactively.
- Collaborate with other teams to ensure optimal configuration and deployment of operational security tools.
On-Call Support & 24/7 Operations :
- Support a 24x7 online environment by participating in an on-call rotation to respond to critical incidents.
- Ensure systems are configured for high availability and resilience in production environments.
Ideal Candidate Profile :
You are the ideal candidate if you :
- Are always willing to go the extra mile to identify and fix problems, especially those that affect production security.
- Have extensive experience automating, securing, and running large-scale production IAM and containerized services across cloud platforms like AWS (EC2, ECS, KMS, Kinesis, RDS), GCP (GKE, GCE), or others.
- Possess strong knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and IP protocols.
- Are proficient in configuration management and infrastructure-as-code tools such as Chef and Terraform.
- Have hands-on experience with scripting and programming languages used in operations, such as Ruby, Python, Go, and shell, as well as source control systems.
- Have experience with industry-standard security tools like Nessus, Qualys, OSQuery, Splunk, etc.
- Are familiar with Public Key Infrastructure (PKI) and secrets management.
Bonus Points :
- Experience conducting threat assessments and evaluating vulnerabilities in high-availability settings.
- Familiarity with MySQL (replication and clustering strategies) and data stores like DynamoDB, Redis, and Elasticsearch.
Required Knowledge, Skills, and Abilities :
- 3+ years of experience architecting and managing complex cloud infrastructure (e.g., AWS, GCP).
- 3+ years of experience working with Chef and Terraform for configuration management and infrastructure provisioning.
- Strong troubleshooting skills with a deep understanding of Linux systems.
- A solid background in security, particularly infrastructure security.
- Bachelor's degree in Computer Science or equivalent experience.
What We Offer :
- An opportunity to work at the cutting edge of cloud computing security within a rapidly growing company.
- A collaborative and innovative environment where your contributions will have a direct impact on the company's security and infrastructure.
- Competitive compensation package with opportunities for career growth and advancement.
- A fast-paced, dynamic work environment where you can grow your skills and make a difference.
Functional Areas: Software/Testing/Networking