Coschool
Coschool - Senior Site Reliability Engineer - Production Systems (5-7 yrs)
posted 7d ago
Flexible timing
Job Description: Senior Site Reliability Engineer
Responsibilities:
- Monitor and troubleshoot issues related to system performance, reliability, and security.
- Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to measure and improve service reliability.
- Analyze and report on metrics and trace data using Grafana and Prometheus.
- Participate in an on-call rotation to provide 24/7 support for critical production systems.
- Evaluate and automate manual and repetitive tasks to reduce toil and improve system efficiency.
- Design and manage infrastructure using tools such as Terraform, Pulumi, and Ansible.
- Implement and manage security measures to protect infrastructure and data.
- Coordinate between developers and operations to ensure smooth software releases and timely resolution of production issues.
- Conduct thorough root cause analysis (RCA) of production incidents and implement preventive measures.
- Review and optimize system performance, identify bottlenecks, and implement capacity planning and recovery strategies.
- Maintain comprehensive documentation of systems, processes, and incident responses.
- Continuously seek and implement improvements to infrastructure, processes, and tools to enhance system reliability and performance.
- Monitor and manage production workloads to ensure optimal resource utilization and scalability.
- Establish incident management processes, including escalation protocols, to minimize downtime and ensure quick recovery.
- Collaborate with stakeholders to prioritize and schedule production deployments while minimizing user impact.
- Implement robust change management processes to ensure safe and controlled updates to production systems.
- Oversee and enforce compliance with production readiness standards for all software and infrastructure changes.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
- Strong experience with CI/CD tools such as Jenkins or GitHub Actions.
- Hands-on experience with infrastructure-as-code tools such as Terraform, Ansible, or CloudFormation, and with containerization and orchestration platforms such as Docker and Kubernetes.
- Experience with at least one cloud platform such as AWS, Azure, or Google Cloud Platform, and proficiency in its services and resources.
- Strong understanding of database management systems (DBMS) such as PostgreSQL and MongoDB, with the ability to manage database clusters efficiently.
- Proficiency in database performance tuning and optimization to enhance query performance and overall system efficiency.
- Familiarity with observability best practices and tools, such as Prometheus, Grafana, or ELK Stack.
- Proficiency in scripting languages, particularly Python.
- Knowledge of security best practices for infrastructure and application deployments.
- Experience with high availability (HA) and disaster recovery (DR) solutions for database clusters to ensure continuous operation and data protection.
- Strong problem-solving skills and attention to detail, with the ability to analyze complex systems and identify areas for improvement.
- Excellent collaboration skills, with the ability to work effectively in cross-functional teams and bridge development and operations teams.
Nice to Have:
- Experience in managing analytics applications using Hadoop, Spark, and Flink.
- Exposure to MLOps or LLMOps.
- Exposure to SOC2, PCI-DSS, or similar compliance frameworks.
About Coschool:
- Coschool is a pioneering generative-AI-based edtech startup focused on providing innovative educational solutions to schools across India.
- With a suite of solutions designed for teachers, students, and schools, Coschool is dedicated to transforming the educational landscape by enhancing learning experiences.
- Coschool was also recognized as a LinkedIn Top Startup of 2024.
Functional Areas: Software/Testing/Networking
Experience: 5-7 years