Commdel - Site Reliability Engineer - Terraform/Ansible (7-10 yrs)
Commdel Consulting Services
posted 8d ago
Flexible timing
We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our team.
The ideal candidate will have hands-on experience with Kafka, Kubernetes, and MongoDB, and a proven ability to manage, monitor, and optimize these solutions.
You will play a key role in ensuring the reliability, scalability, and performance of our infrastructure and applications while contributing to failure isolation, resolution, and continuous improvement initiatives.
Responsibilities:
- Design, implement, and maintain highly available and resilient infrastructure for Kafka, Kubernetes, and MongoDB.
- Monitor end-to-end system performance, availability, and reliability, using modern observability tools and techniques.
- Develop and enhance monitoring, alerting, and logging frameworks to ensure effective failure detection and response.
- Perform root cause analysis and resolve incidents, ensuring minimal downtime and impact on end users.
- Automate repetitive tasks and processes, improving system efficiency and reducing manual intervention.
- Collaborate with software engineering teams to optimize application performance and deployment strategies.
- Define and implement best practices for capacity planning, scaling, and disaster recovery.
- Continuously improve system architecture, infrastructure as code (IaC), and deployment pipelines.
Requirements:
- Engineering degree or a degree in a related technical discipline, or equivalent work experience.
- Demonstrated work experience on cloud platforms (e.g., GCP, AWS).
- Proficiency in infrastructure and configuration technologies (e.g., Terraform, Ansible).
- Mandatory: minimum of 3 years of experience in Site Reliability Engineering (SRE).
- Knowledge of Cloud-based applications and Containerization Technologies.
- Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java).
- Demonstrated understanding of observability and its configuration.
- Demonstrable fundamentals in two of the following: computer science, cloud architecture, security, and network design.
- Demonstrated proficiency in Linux operating systems (experience, education, certification, license, and training).
- Must have at least 6 years of hands-on experience working in data center or cloud systems infrastructure for large-scale, customer-facing companies.
Functional Areas: Software/Testing/Networking