Senior Site Reliability Lead - SaaS & Cloud Operations (8-10 yrs)
Spruce Infotech
posted 4d ago
Flexible timing
Responsibilities:
- Lead and manage a team of SRE/Operations engineers, fostering a collaborative and high-performing work environment.
- Oversee the support of our SaaS products and services on AWS, ensuring optimal uptime and performance.
- Implement and manage Infrastructure as Code (IaC) tools like Terraform and Helm to automate infrastructure provisioning and configuration (Ansible experience a plus).
- Manage and optimize our Kubernetes clusters (EKS) and CI/CD pipelines using ArgoCD or similar tools.
- Proactively monitor system health, identify and troubleshoot issues, and implement solutions to minimize downtime.
- Respond effectively to incidents, diagnose root causes, and lead the team in implementing swift resolutions.
- Develop and implement processes and procedures for efficient SRE operations.
- Stay up to date on trends and advancements in cloud technologies, SRE best practices, and automation tools.
- Create a culture of knowledge sharing and provide mentorship to junior engineers to help them grow their skills.
Qualifications:
- 8+ years of experience in system administration, cloud operations, or a related field.
- Proven experience leading and managing a team of SRE/Operations engineers.
- Solid understanding of SaaS delivery models and support methodologies.
- In-depth knowledge of AWS cloud services and best practices.
- Expertise in Infrastructure as Code (IaC) tools like Terraform and Helm (Ansible experience a plus).
- Experience with Kubernetes (EKS) and CI/CD pipelines (ArgoCD or similar) is a strong plus.
- Excellent problem-solving, analytical, and troubleshooting skills.
- Strong leadership, communication, collaboration, and interpersonal skills.
- Passion for building and mentoring high-performing teams.
Functional Areas: Other