ITC Infotech
Site Reliability Engineer - Incident Management (3-5 yrs)
posted 22d ago
Flexible timing
Job Description:
- Partner with application developers and solution architects to ensure services are built for scale and performance.
- Lead the definition of service-level objectives, agreements and indicators (SLOs, SLAs and SLIs) for the underlying service in collaboration with Application Development, Product and Business Owners.
- Design and develop scripts, software and tools that improve the reliability of production systems, including fixing issues, responding to incidents and taking on-call responsibilities.
- Improve the overall resilience of a system and provide visibility to the health and performance of services across all applications and infrastructure.
- Improve service performance metrics like latency, page load speed and ETL and help proactively identify performance issues across the system.
- Implement monitoring solutions and create dashboards and alerts based on the four golden signals of SRE (latency, traffic, errors and saturation), providing a single source for determining the overall performance and availability of the supported services.
- Write, update and use documentation, including runbooks and playbooks.
- Automate work, including infrastructure needs, testing, failover solutions and failure mitigation.
- Use chaos engineering to test what you build under real-world conditions.
- Share information across DevOps and business teams, encouraging a blameless culture focused on workflow visibility and collaboration.
- Perform root-cause analysis of complex scaling and performance problems involving multiple parties, networks, hardware and software.
- Serve as technical owner to ensure delivery of SRE initiatives.
- Perform deliverable reviews and coach the team in SRE areas of expertise.
- Provide continuous competitive and best-practices research, leverage industry resources and market trends, and liaise with internal stakeholders.
Functional Areas: Software/Testing/Networking