Detailed JD (Roles and Responsibilities)

Responsibilities:

Monitoring and Automation:
Continuously monitor system performance and reliability.
Automate repetitive tasks to improve efficiency and reduce manual intervention.

Incident Management:
Respond to on-call incidents and troubleshoot issues to minimize downtime.
Develop and implement disaster recovery plans.

Infrastructure Management:
Manage and maintain infrastructure using tools such as Chef, Ansible, Terraform, and Kubernetes.
Ensure systems are scalable and can handle increased load.

Performance Optimization:
Identify and resolve performance bottlenecks.
Implement solutions to improve system performance and reliability.

Collaboration:
Work closely with development and IT operations teams to ensure seamless integration and deployment of new features.
Document processes and share knowledge to improve team efficiency.

Mandatory Skills and Qualifications:
Proven experience as a Site Reliability Engineer or in a similar role.
Strong knowledge of automation tools and scripting languages.
Experience with cloud platforms and containerization technologies.
Excellent problem-solving and troubleshooting skills.
Ability to work in a fast-paced, dynamic environment.