Cloudologic
Senior Site Reliability Engineer - DevOps (5-7 yrs)
posted 11d ago
Flexible timing
Company Description :
Cloudologic is a prominent cloud consulting and IT service provider based in Singapore and rooted in India, focusing on cloud operations, cyber security, and managed services. With a decade of expertise and a dedication to delivering high-quality services, we have earned the trust of clients worldwide and become a valued partner in the tech industry.
Role Description :
This is a full-time role for a Senior Site Reliability Engineer at Cloudologic. The Senior SRE will be responsible for troubleshooting, software development, system administration, and infrastructure maintenance. The role is based onsite in Gurgaon, though remote work is acceptable.
System Reliability & Performance :
- Ensure high availability, reliability, and scalability of services.
- Implement SLOs (Service Level Objectives) and SLIs (Service Level Indicators).
- Monitor system performance and proactively address bottlenecks.
Incident Management & Troubleshooting :
- Respond to incidents, conduct root cause analysis (RCA), and implement fixes.
- Develop and improve monitoring, alerting, and diagnostic tools.
- Conduct blameless postmortems to improve system resilience.
Automation & Infrastructure as Code (IaC) :
- Automate deployments, scaling, and recovery processes.
- Manage infrastructure using tools like Terraform, Ansible, or Kubernetes.
- Implement CI/CD pipelines for seamless software releases.
Observability & Monitoring :
- Use monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK) to track system health.
- Define and maintain dashboards and alerts for proactive system monitoring.
Security & Compliance :
- Implement security best practices, vulnerability scanning, and patch management.
- Ensure compliance with regulatory requirements (GDPR, ISO 27001, etc.).
- Conduct security audits and risk assessments.
Capacity Planning & Cost Optimization :
- Forecast system demands and scale infrastructure accordingly.
- Optimize cloud costs by managing resource utilization efficiently.
- Work with development teams to build cost-effective solutions.
Collaboration & Documentation :
- Work closely with developers, DevOps, and IT teams to improve system reliability.
- Document processes, best practices, and incident response playbooks.
- Participate in on-call rotations and knowledge-sharing sessions.
Qualifications :
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in a Site Reliability Engineering, DevOps, or similar role.
- Strong understanding of system reliability, performance, and scalability principles.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools.
- Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible, Kubernetes).
- Expertise in monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK).
- Solid understanding of cloud platforms (AWS, Azure, GCP).
- Experience with CI/CD pipelines and software release management.
- Strong problem-solving and troubleshooting skills.
- Excellent communication and collaboration skills.
- Knowledge of security best practices and compliance requirements.
Preferred Qualifications :
- Experience with containerization and orchestration technologies (Docker, Kubernetes).
- Experience with database administration and optimization.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Certified Professional Cloud DevOps Engineer).
Functional Areas: Software/Testing/Networking