3-5 years
Mumbai
Site Reliability Engineer - Incident Management (3-5 yrs)
Fatakpay Digital
Posted 1 month ago
Fixed timing
Job Summary :
We are looking for a Site Reliability Engineer to help ensure the reliability, scalability, and performance of our systems. You will focus on monitoring, incident management, and continuous improvement of our infrastructure.
Responsibilities :
- Monitor system health and uptime using industry-standard tools.
- Design and implement incident management processes.
- Optimize system performance and ensure uptime.
- Collaborate with developers to improve system design for reliability.
- Automate repetitive tasks and processes for greater efficiency.
Skills & Requirements :
- 3+ years of experience as an SRE or in a similar role.
- Strong understanding of monitoring and logging tools (Prometheus, ELK, etc.).
- Experience in incident management and root cause analysis.
- Proficiency with scripting and automation (Python, Shell, etc.).
- Good understanding of cloud platforms (GCP preferred).
- Strong problem-solving skills and a passion for improving systems.
Functional Areas: Software/Testing/Networking