Site Reliability Engineer - AWS Cloud Services (6-8 yrs)
Travash Software Solutions
posted 20d ago
Flexible timing
Role : Site Reliability Engineer
Location : Hyderabad
Job Type : Full-Time
Experience Level : Senior (5-8 years)
About the Role :
We are looking for a seasoned Senior Site Reliability Engineer (SRE) with 5-8 years of experience in cloud infrastructure and reliability engineering. In this role, you will contribute to building and managing highly reliable, scalable, and secure systems while collaborating across teams to embed reliability practices into the development lifecycle.
Key Responsibilities :
1. Infrastructure Design and Deployment :
- Design and implement scalable, reliable, and fault-tolerant cloud architectures using AWS services (e.g., EC2, S3, Lambda).
- Support automation of infrastructure provisioning and management for streamlined deployments.
2. Monitoring and Observability :
- Implement real-time monitoring to track application and infrastructure health.
- Ensure observability practices are in place to identify and resolve issues efficiently.
3. Incident Management :
- Actively participate in incident response, ensuring minimal service disruption and fast recovery.
- Perform post-incident analysis to identify root causes and recommend preventive measures.
4. Security and Compliance :
- Apply security best practices to cloud infrastructure to protect data and applications.
- Ensure compliance with relevant standards and frameworks (e.g., SOC 2, ISO 27001).
5. Collaboration and Training :
- Collaborate with development, operations, and other teams to ensure reliability is prioritized.
- Share knowledge and mentor peers on best practices in reliability engineering.
6. Performance Optimization :
- Analyze and optimize system performance to improve efficiency and reduce latency.
- Conduct capacity planning to prepare infrastructure for future growth and demand.
7. Disaster Recovery and Backup :
- Contribute to the development and maintenance of disaster recovery plans.
- Implement backup solutions to safeguard critical data and maintain business continuity.
Qualifications :
- Experience : 5-8 years in Site Reliability Engineering, Cloud Engineering, or a similar role.
Technical Skills :
- Proficient with AWS services (e.g., EC2, S3, Lambda) and cloud architecture.
- Hands-on experience with monitoring tools like CloudWatch, Grafana, or Datadog.
- Strong skills in scripting and automation (e.g., Python, Bash, Terraform).
- Problem-Solving : Strong troubleshooting and root cause analysis abilities.
- Collaboration : Ability to work effectively in cross-functional teams.
- Security Awareness : Knowledge of security best practices and compliance standards.
What We Offer :
- Opportunities for career growth and ongoing learning.
- A collaborative and innovative work environment.
Functional Areas: Software/Testing/Networking