Peoplefy Infosolutions
Site Reliability Engineer - Terraform/Ansible (3-5 yrs)
posted 5d ago
Fixed timing
Responsibilities:
- Design, build, and maintain highly available and scalable infrastructure on cloud platforms (AWS, Azure, GCP).
- Implement and manage CI/CD pipelines using Jenkins, GitLab CI/CD, or other relevant tools.
- Automate infrastructure provisioning and management using Terraform, Ansible, or Puppet.
- Monitor system performance and proactively identify and resolve issues using tools like Prometheus, Grafana, and the ELK stack.
- Troubleshoot and resolve production issues quickly and effectively.
- Participate in on-call rotations and provide 24/7 support for critical systems.
- Collaborate with software engineers to improve the reliability and performance of applications.
- Implement and maintain monitoring and alerting systems.
- Implement and maintain security best practices and controls.
- Ensure compliance with security standards and regulatory requirements.
- Conduct security audits and penetration testing.
- Contribute to the development and improvement of SRE best practices and processes.
- Automate routine tasks and improve operational efficiency.
- Participate in incident response and post-mortem analysis.
- Stay up to date with the latest technologies and trends in Site Reliability Engineering.
- Research and implement new technologies and tools to improve system reliability and performance.
Required Skills:
- Strong experience with at least one major cloud provider (AWS, Azure, GCP).
- Proficiency in infrastructure-as-code tools like Terraform, Ansible, or Puppet.
- Experience with CI/CD pipelines and tools (Jenkins, GitLab CI/CD, etc.).
- Experience with monitoring and alerting systems (Prometheus, Grafana, ELK stack, etc.).
- Proficiency in scripting languages like Python, Bash, or Ruby.
- Strong understanding of Linux/Unix systems administration.
- Solid understanding of networking concepts (TCP/IP, DNS, routing).
- Understanding of security best practices and common security threats.
- Excellent analytical and problem-solving skills.
- Strong communication and collaboration skills.
Nice to Have:
- Experience with containerization technologies (Docker, Kubernetes).
- Experience with serverless computing (AWS Lambda, Azure Functions).
- Experience with service mesh technologies (Istio, Linkerd).
- Experience with chaos engineering.
- Experience with SRE principles and practices (as described in Google's SRE book).
Functional Areas: Software/Testing/Networking