SRE Engineer
Infosys
posted 6d ago
Flexible timing
Key Responsibilities

Command Center Design & Implementation:
Architect and implement a centralized command center that provides comprehensive visibility into both infrastructure and application layers
Establish standardized operational procedures, runbooks, and escalation protocols for incident management
Design and implement monitoring solutions that provide real-time insights into system health, performance metrics, and business KPIs

Operations Management:
Lead the development of automated remediation solutions for common operational issues
Implement and maintain SLOs/SLIs across critical services and applications
Drive continuous improvement in incident response times and system reliability metrics
Collaborate with development teams to ensure applications are designed with operational excellence in mind

Tool Development & Integration:
Develop and maintain monitoring dashboards that provide actionable insights for both technical and non-technical stakeholders
Implement and customize monitoring tools for infrastructure and application performance monitoring
Create automation scripts and tools to streamline operational processes
Integrate various monitoring and alerting systems to provide a unified view of system health

Leadership & Collaboration:
Mentor junior engineers in SRE practices and command center operations
Collaborate with security, development, and infrastructure teams to ensure comprehensive monitoring coverage
Partner with business stakeholders to align monitoring strategies with business objectives
Lead post-incident reviews and drive implementation of learned improvements

Preferred Qualifications:
Experience in designing and implementing enterprise-scale command centers
Knowledge of AIOps and machine learning for IT operations
Certification in relevant cloud platforms or technologies is good to have
Experience with chaos engineering and resilience testing
Background in implementing ITIL practices across any of the IT services

Qualifications:
Bachelor's degree in Computer Science, Engineering, or related field
5+ years of experience in Site Reliability Engineering or similar roles
Strong experience with cloud platforms (AWS/Azure/GCP) and infrastructure-as-code
Extensive knowledge of monitoring tools (e.g., Prometheus, Grafana, ELK Stack)
Proficiency in at least one programming language (Python, Go, or Java preferred)
Experience with containerization and orchestration (Docker, Kubernetes)
Strong understanding of networking, system design, and distributed systems
Excellent problem-solving and analytical abilities
Strong communication skills and ability to work with cross-functional teams
Experience in incident management and on-call rotations
Proven track record of improving system reliability and performance
Ability to handle high-pressure situations and make quick decisions
Strong documentation and technical writing skills
Employment Type: Full Time, Permanent
3-8 Yrs
Hyderabad / Secunderabad, Chennai, Bangalore / Bengaluru