Arting Digital
Site Reliability Engineer (7-12 yrs)
Posted 1 month ago
Flexible timing
Posting title : Site Reliability Engineer
Experience : 7+ Years
Location : Bangalore
Work mode : Work from office (WFO)
Primary skills : Cloud Monitoring & Operations (GCP & Azure), Python, ServiceNow
Qualification : Any Engineering/Computers degree
Roles & Responsibilities :
Daily Operations & Monitoring :
- Actively monitor systems, applications, and infrastructure across cloud environments (GCP & Azure).
- Ensure that service levels, such as uptime and performance, meet the expected standards.
Support Tickets & Issue Resolution :
- Work on support tickets raised by platform users, addressing technical problems and providing timely solutions to ensure smooth operations.
Incident Management :
- Lead the management and resolution of incidents, minimizing downtime and ensuring quick recovery.
- Manage the incident lifecycle from detection to resolution, coordinating across teams as necessary.
Root Cause Analysis & Problem Management :
- Perform root cause analysis for incidents and recurring problems to prevent future occurrences. Document findings and implement preventive measures to maintain service reliability.
Automation & Optimization :
- Write scripts and automation tools (primarily using Python) to reduce manual intervention and optimize operational tasks, driving efficiency and consistency.
Cloud Monitoring & Operations (GCP & Azure) :
- Leverage your expertise in cloud technologies to monitor and manage resources in GCP and Azure environments.
- Ensure seamless integration, configuration, and scaling of cloud services.
ServiceNow Integration :
- Use ServiceNow for managing and tracking incidents, requests, and changes. Ensure proper documentation and ticket management following ITIL best practices.
Collaboration with Cross-functional Teams :
- Work closely with development, operations, and other engineering teams to maintain a unified approach to platform reliability and performance. Provide inputs for continuous improvement of the platform and processes.
Required Skills & Qualifications :
Cloud Monitoring & Operations :
- Proven experience in managing operations across Google Cloud Platform (GCP) and Microsoft Azure.
- Hands-on experience with cloud monitoring tools and techniques.
Incident Management :
- Experience in leading incident response efforts, coordinating across teams, and minimizing the impact of outages.
Scripting & Automation :
- Proficiency in Python for automation tasks. Knowledge of other scripting languages is a plus.
ServiceNow :
- Familiarity with ServiceNow for incident tracking and service management.
- Ability to integrate ServiceNow into existing operational workflows.
Problem-solving & Analytical Thinking :
- Strong skills in root cause analysis, problem management, and preventive maintenance.
Communication & Collaboration :
- Excellent communication skills with a focus on cross-team collaboration, customer service, and continuous improvement.
Functional Areas: Software/Testing/Networking