Natobotics: rated 3.8, based on 5 reviews

Site Reliability Engineer - Terraform/Ansible (4-8 yrs)

Natobotics Technologies Pvt Limited

Experience: 4-8 years

Posted 10 days ago

Key skills for the job

Software Configuration Management, DevOps, Python, Linux Administration, Linux System Administration, Site Reliability Engineering

Job Description

Responsibilities:

- Develop and maintain infrastructure as code (IaC) using Terraform for provisioning and managing cloud resources.

- Automate configuration management and deployment processes using Ansible playbooks and roles.

- Design and implement reusable Terraform modules and Ansible roles to improve efficiency and consistency.

- Manage and maintain version control of infrastructure code using Git.

- Design and implement comprehensive monitoring and alerting solutions to ensure system health and performance.

- Utilize observability tools (Prometheus, Grafana, ELK stack) to gather and analyze metrics, logs, and traces.

- Define and implement service level objectives (SLOs) and service level indicators (SLIs).

- Develop and maintain dashboards and alerts to proactively identify and resolve issues.

- Design and implement high availability and fault-tolerant infrastructure solutions.

- Implement disaster recovery and business continuity plans.

- Identify and eliminate single points of failure.

- Perform capacity planning and performance tuning.

- Identify and automate repetitive tasks and manual processes to reduce toil.

- Develop and maintain automation scripts and tools to improve operational efficiency.

- Continuously improve infrastructure and operational processes.

- Participate in on-call rotations and respond to incidents and alerts.

- Troubleshoot and resolve complex infrastructure and application issues.

- Conduct post-incident reviews and implement corrective actions.

- Collaborate with development, operations, and other teams to ensure smooth deployments and operations.

- Communicate technical concepts clearly and concisely.

- Document infrastructure designs, configurations, and procedures.

- Implement and maintain security best practices for infrastructure and applications.

- Ensure compliance with relevant industry standards and regulations.
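
The SLO/SLI responsibility above can be made concrete with a small worked example. This is a minimal Python sketch, assuming a request-based availability SLI (good requests divided by total requests) measured over a single window; the function name `evaluate_slo`, the `SloReport` type and the request counts are hypothetical and not part of the role description.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of good requests in the window
    error_budget: float      # number of bad requests the SLO tolerates
    budget_remaining: float  # fraction of the error budget left (negative if overspent)


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compute a request-based availability SLI and the remaining error budget.

    Hypothetical helper: slo_target is the objective, e.g. 0.999 for
    "99.9% of requests succeed over the window".
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    sli = good_requests / total_requests
    # The error budget is the share of requests allowed to fail under the SLO.
    error_budget = (1.0 - slo_target) * total_requests
    bad_requests = total_requests - good_requests
    budget_remaining = 1.0 - bad_requests / error_budget
    return SloReport(sli=sli, error_budget=error_budget, budget_remaining=budget_remaining)


# Illustrative numbers only: 500 failed requests out of one million against a 99.9% SLO.
report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%} budget={report.error_budget:.0f} remaining={report.budget_remaining:.0%}")
```

Framing the objective as a budget of tolerated failures is what lets dashboards and alerts track how fast that budget is being consumed, rather than reacting to every individual error.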

Required Skills & Qualifications:

- Experience: 4+ years of experience in Site Reliability Engineering or a related role.

- Infrastructure as Code (IaC): Strong experience with Terraform for infrastructure provisioning and management.

- Configuration Management: Proficiency in Ansible for configuration management and automation.

- Observability: Experience with observability tools and techniques (Prometheus, Grafana, ELK stack).

- Monitoring and Alerting: Experience in designing and implementing monitoring and alerting systems.

- High Availability: Understanding of high availability and fault-tolerant architectures.

- Scripting: Proficiency in scripting languages (Python, Bash).

- Version Control: Experience with Git.

- Cloud Platforms: Experience with cloud platforms (AWS, Azure, GCP).

- Linux Systems Administration: Strong understanding of Linux systems administration.

- Networking: Basic understanding of networking concepts.

- Problem-Solving: Excellent problem-solving and troubleshooting skills.

- Communication: Strong communication and collaboration skills.
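
The high-availability and single-point-of-failure items rest on a simple piece of arithmetic. The Python sketch below assumes components fail independently and that a redundant tier needs only one healthy replica; the helper names and the 99.9% figures are illustrative assumptions, not requirements of the role.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain where every component must be up.

    Assumes independent failures; each component is a single point of failure.
    """
    return prod(availabilities)


def redundant_availability(availability: float, replicas: int) -> float:
    """Availability of identical, independent replicas where any one suffices.

    The tier is down only if every replica is down at once: 1 - (1 - a)^n.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - availability) ** replicas


# Hypothetical stack: load balancer -> app server -> database, each 99.9% available.
single = serial_availability([0.999, 0.999, 0.999])
# Same stack with each tier run as two independent replicas.
paired = serial_availability([redundant_availability(0.999, 2)] * 3)
print(f"single instances: {single:.4%}, paired replicas: {paired:.6%}")
```

Independent failures are an idealization: replicas that share a network, power supply, or region can fail together, so real-world gains are smaller than this model suggests.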

Preferred Qualifications:

- Experience with Kubernetes and container orchestration.

- Experience with CI/CD pipelines.

- Experience with database administration.

- Relevant certifications (AWS Certified DevOps Engineer, Certified Kubernetes Administrator).

- Experience with security tools and practices.