Company rating: 2.7 (based on 4 reviews)

W Energy Software - Site Reliability Engineer - Grafana/Prometheus (4-6 yrs)

W Energy Software

posted 12d ago

Job Description

Site Reliability Engineer (SRE)

Description :

We seek an experienced Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems.

This role will use monitoring and metrics tools such as Azure Monitor, Grafana, and Prometheus to identify and resolve performance issues proactively.

You will work closely with engineering teams to maintain high system availability, manage incidents, and optimize production environments for seamless operations.

As an SRE, you'll have the autonomy to make critical decisions while receiving support and guidance as needed.

We value intelligence, creativity, and curiosity, and we're committed to providing opportunities for growth and learning.

Our team fosters a flexible, respectful work environment where collaboration and communication are encouraged at all levels, including with the executive team.

As part of our Infrastructure team, your contributions will be critical to our ongoing success and deeply appreciated.

Key Responsibilities :

Production Reliability :

- Maintain and enhance the reliability and availability of production systems, ensuring fault tolerance and minimal downtime.

- Design and support high-availability configurations, including database clustering and read replication.

Incident Response :

- Respond to and resolve production incidents in real time, leveraging monitoring tools to diagnose and address issues effectively.

Performance Management :

- Use metrics from Azure Monitor, Grafana, and Prometheus to identify and resolve performance bottlenecks across applications, infrastructure, and databases.

- Implement optimizations to improve overall system efficiency based on performance data.

Change Management :

- Plan and execute system changes with a focus on minimizing risk and maintaining operational stability.

Monitoring & Metrics Collection :

- Develop and maintain monitoring systems to collect real-time infrastructure and application metrics.

- Create and refine Grafana dashboards to visualize system health and performance effectively.

Troubleshooting & Root Cause Analysis (RCA) :

- Conduct thorough investigations into production issues, analyzing system metrics and logs to identify root causes.

- Document and implement permanent fixes to prevent issue recurrence.

Collaboration :

- Collaborate with engineering teams to address real-time alerts, performance anomalies, and application behavior issues.

Requirements :

- Strong expertise with monitoring and metrics tools, including Azure Monitor, Grafana, and Prometheus.

- Proficiency in SQL Server administration, including performance tuning, clustering, and read replication.

- Solid experience in real-time monitoring and performance troubleshooting within cloud environments.

- Proficiency in Linux/Unix and Windows system administration.

- Experience with cloud platforms (e.g., Azure, AWS, or GCP) for deploying and scaling production systems.

- Strong scripting/programming skills (e.g., Python, PowerShell, or Bash).

- Knowledge of infrastructure-as-code tools (e.g., Terraform, Ansible).

- Experience administering and managing user accounts with Active Directory (AD).

- Experience integrating authentication systems with Single Sign-On (SSO) solutions (e.g., SAML, OAuth, OpenID Connect).

Experience :

- Bachelor's degree in engineering, computer science, accounting, finance, MIS, or a related field.

- 4-6 years of experience working with cloud systems (e.g., AWS/Azure) and SaaS environments.

- 3+ years of experience in Site Reliability Engineering, DevOps, or similar roles.

- Proven track record of diagnosing and resolving performance issues using monitoring tools such as Grafana, Prometheus, or Datadog.

- Strong analytical and problem-solving skills.

- Excellent written and verbal communication skills, with the ability to collaborate effectively across teams.

- Detail-oriented with a proactive approach to maintaining production reliability.

- Ability to manage multiple priorities and tasks effectively.

Preferred Qualifications :

- AWS certification (Solutions Architect or Cloud Practitioner).

- Experience with automation tools such as Jenkins, Ansible, and Terraform.

- Database administration experience.

- Networking experience, including VPNs and routing.

- Security knowledge related to AWS/Azure.

Working Hours :

- This role involves a rotational shift schedule, including night, morning, and regular day shifts.

Functional Areas: Software/Testing/Networking

People are getting interviews at W Energy Software through (based on 7 W Energy Software interviews):

- Company Website: 57%
- Job Portal: 29%
- Referral: 14%

Moderate Confidence: the data is based on a sufficient number of responses received from candidates.

What W Energy Software employees are saying about work life (based on 4 employees):

- Flexible timing: 100%
- Monday to Friday: 100%
- No travel: 100%

W Energy Software Benefits

Free Transport
Child care
Gymnasium
Cafeteria
Work From Home
Free Food +6 more