Fixity Technologies
Site Reliability Engineer - Cloud Infrastructure (6-9 yrs)
posted 4d ago
Flexible timing
Job Overview :
We are seeking a talented and motivated Site Reliability Engineer (SRE) with expertise in Azure, Kubernetes, DevOps, and Test Automation. As an SRE, you will play a critical role in maintaining and improving the availability, reliability, and performance of our cloud infrastructure and services. You will work closely with development and operations teams to ensure a smooth and efficient software delivery lifecycle.
Responsibilities :
- Ensure the availability, performance, and scalability of production systems by monitoring, troubleshooting, and resolving complex issues across multiple environments.
- Build, maintain, and improve automated pipelines using Azure DevOps for continuous integration and continuous delivery (CI/CD).
- Manage and optimize Kubernetes clusters, ensuring high availability, efficient resource allocation, and secure operations.
- Implement test automation frameworks and scripts to improve software quality, reliability, and performance.
- Develop and manage Docker containers and images to support microservices and containerized applications.
- Collaborate with development teams to design, implement, and maintain automation frameworks that improve operational efficiencies.
- Establish and enforce best practices for system reliability, including disaster recovery, backup, and incident management.
- Create and maintain dashboards and monitoring systems for real-time system performance and health metrics.
- Participate in on-call rotations to ensure systems remain operational 24/7.
- Conduct root cause analysis for incidents and implement measures to prevent recurrence.
- Drive continuous improvement initiatives for system reliability and efficiency.
Required Skills and Experience :
- Strong experience with Azure Cloud services (Azure Resource Manager, Azure Kubernetes Service, Azure DevOps, etc.).
- Expertise in Kubernetes for container orchestration, deployment, and scaling of applications.
- Proficiency in DevOps principles and tools (CI/CD pipelines, version control systems, automation, etc.).
- Experience with Docker for containerization and management of microservices.
- Hands-on experience with Test Automation, including writing and maintaining test scripts for different types of tests (unit, integration, end-to-end).
- Strong understanding of infrastructure-as-code (IaC) tools (e.g., Terraform, Ansible, Helm).
- Experience with monitoring tools (e.g., Prometheus, Grafana, New Relic, Datadog) for system health and performance metrics.
- Solid scripting and automation skills in languages such as Python, Bash, or PowerShell.
- Excellent problem-solving skills and the ability to troubleshoot complex systems.
- Strong communication and collaboration skills, with the ability to work cross-functionally across teams.
Preferred Qualifications :
- Certifications in Azure (e.g., Azure Solutions Architect, Azure DevOps Engineer) or Kubernetes (e.g., CKA, CKAD).
- Experience with cloud-native applications and serverless architecture.
- Familiarity with microservices architectures and patterns.
- Experience with agile development practices.
Functional Areas: Software/Testing/Networking