Experience: 18-24 years
Location: Noida, Pune, Bangalore / Bengaluru
Vacancies: 1
Director (SRE, DevOps)
Hornbill Studios
posted 12d ago
Flexible timing
Job Title: Director (SRE, DevOps, Monitoring, and Database Operations)
Key Responsibilities:
Leadership & Strategy:
• Provide technical and people leadership to SRE, DevOps, Monitoring, and Database Operations teams.
• Collaborate with leadership on budgeting, planning, hiring, and managing third-party contracts.
• Oversee project status, assemble project teams, and define assignments with schedules and milestones.
Platform Reliability & Performance:
• Drive continuous improvement of reliability, stability, and performance of digital platforms.
• Oversee implementation of automated telemetry, observability, and applied intelligence systems.
• Lead efforts to develop automated alerting, self-healing mechanisms, and intelligent response systems.
Incident & Escalation Management:
• Ensure 24/7 uptime of sites and services, with minimal unplanned downtime.
• Serve as Escalation Manager/Critical Incident Manager during major incidents, leading teams in rapid service restoration.
• Provide on-call escalation support on a 24/7/365 schedule.
• Communicate timely updates and incident reports to senior leadership.
Collaboration & Integration:
• Partner with administrators, platform engineers, and other stakeholders to achieve highly reliable infrastructure, systems, and integrations.
• Collaborate with product, application development, QA, and technology teams to enhance service reliability and performance.
Incident Management & Automation:
• Provide advanced Incident and Problem Management support to effectively diagnose, remediate, and resolve platform issues.
• Automate critical workflows across the platform to minimize manual errors and reduce human intervention.
• Implement ITIL processes such as Incident, Problem, and Change Management.
Monitoring & Scalability:
• Design and implement effective monitoring systems with proper alerting and escalation mechanisms for critical events.
• Ensure timely capacity planning and infrastructure upgrades for optimal reliability.
• Develop and refine processes to minimize Mean Time to Recover (MTTR) and extend Mean Time to Failure (MTTF).
Documentation & Compliance:
• Create and maintain detailed documentation, including runbooks, incident response guides, post-mortem reports, RCAs, and mitigation plans.
• Ensure all changes adhere to established procedures and documentation standards.
Business Alignment:
• Understand business workflows and map technology solutions to address problems effectively.
• Lead conversations and provide technical support to both internal and external customers.
Employment Type: Full Time, Permanent