Site Reliability Engineer
Xoriant
posted 5d ago
Flexible timing
Pune, Mumbai, Bangalore, Gurgaon, Chennai
Full Time
Hybrid (3 days a week)
As a Site Reliability Engineer (SRE), you will play a crucial role in maintaining and improving the reliability and performance of our systems and applications. You will leverage Datadog's monitoring and observability platform to enhance our infrastructure, ensure uptime, and deliver a seamless user experience.
Key Responsibilities:
Monitoring and Observability:
Configure and manage Datadog for comprehensive monitoring of systems, applications, and services.
Develop and maintain dashboards, alerts, and anomaly detection to ensure visibility into system performance and health.
Incident Response:
Act as the first responder to system incidents, performing root cause analysis and driving rapid resolution.
Implement and refine incident management processes to minimize downtime and impact on users.
Performance Optimization:
Analyze system performance and recommend optimizations to improve efficiency and scalability.
Collaborate with development teams to implement best practices for application performance and reliability.
Automation and Tooling:
Develop and maintain automation scripts and tools to streamline operational tasks and reduce manual intervention.
Integrate Datadog with other DevOps tools and platforms to enhance automation and workflow efficiency.
Capacity Planning:
Conduct capacity planning and scaling exercises to ensure the infrastructure can handle growth and increased demand.
Provide recommendations for hardware and software upgrades to support scaling efforts.
Documentation and Training:
Create and maintain comprehensive documentation of monitoring configurations, incident response procedures, and performance tuning guidelines.
Provide training and support to development and operations teams on the effective use of Datadog and related tools.
Qualifications:
Education:
Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Experience:
Proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Hands-on experience with Datadog, including setting up monitoring, creating dashboards, and configuring alerts.
Strong understanding of cloud platforms (Azure) and containerization technologies (e.g., Docker, Kubernetes).
Skills:
Proficiency in Terraform scripting and automation.
Solid knowledge of infrastructure-as-code (IaC) principles and tools.
Excellent problem-solving skills and the ability to perform well in a fast-paced environment.
Strong communication and collaboration skills to work effectively with cross-functional teams.
Employment Type: Full Time, Permanent