The Cortex team builds and delivers the industry's most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, your role involves operating and maintaining a large-scale GCP environment.
We are looking for a motivated leader to join our Cortex XDR DevOps group in our India center as a Sr. SRE Manager. We are responsible for reliability, scalability, and operational excellence while keeping an eye on the latency, performance, and capacity of GCP-hosted data stores, primarily BigQuery. As part of this role, you will collaborate closely with our engineering teams to develop innovative solutions that provide clear and actionable insights into our systems' performance and health. More information about the Cortex product can be found
Contribute to the success of the Data platform team.
Develop expertise in new technologies
Work with developers, researchers, data scientists, and security experts
Ensure that applications are scalable and reliable
Develop tools and automation frameworks
Automate the deployment of robust services
Orchestrate end-to-end monitoring and alerting
Participate in design reviews
Your Impact
Build, grow and nurture a high-performing SRE/DevOps team, including hiring, mentoring, and professional development
Contribute to seamless 24/7 site reliability engineering workflows in conjunction with the regional teams in the organization
Influence and align the organization's vision by collaborating with customers, partners, Product Management, and Engineering teams
Deliver high-quality results with full ownership and take the product/service to the next level
Implement and promote Agile methodologies and SRE/DevOps practices to streamline development processes
Own career development of the team through active coaching
Foster a strong team culture of engineering excellence, customer passion, collaboration, diversity and inclusion
Communicate effectively with stakeholders, including upper management, to provide updates on project progress, risks, and roadblocks
Facilitate clear and open communication within the SRE/DevOps team
Your Experience
15+ years of experience in SRE/DevOps roles, with a focus on cloud-based infrastructure
5+ years of experience managing a team of SRE/DevOps engineers primarily working in Google Cloud (GCP)
Deep knowledge of modern infrastructure management and site reliability engineering practices, including cloud platforms and Infrastructure-as-Code tools (e.g., GCP, AWS, Terraform, K8s, Helm, etc.)
Solid understanding of public cloud technologies (GCP, AWS, Azure) with hands-on technical knowledge of Google Cloud Platform (GCP)
Deep knowledge of modern software development tools and ways of working (e.g., git/GitHub, DevOps tools, metrics/monitoring) and development environments to provide DevOps services and solutions
Experience building tools that improve system reliability, automate remediation of issues, or improve scalability, with hands-on experience collecting and analyzing performance data, troubleshooting, and tuning
Experience leveraging AI-driven operations to enhance incident response, automation, and proactive monitoring, and to improve overall system efficiency
Knowledge of defining and monitoring system quality measures, including SLIs (service-level indicators), SLOs (service-level objectives), and SLAs (service-level agreements)
Work closely with customer support teams to provide timely and accurate updates to customers during incidents, setting realistic expectations for resolution times (MTTR)
Lead the development and implementation of robust incident management processes, ensuring timely detection, response, and resolution of incidents affecting system reliability
Exceptional leadership skills with the ability to inspire, mentor, and motivate a team, including performance evaluation and career development
Excellent communication skills with the ability to work collaboratively and effectively with cross-functional teams and to share knowledge
Ability to interact effectively at all levels with sensitivity to cultural diversity
Ability to convey complex technical concepts to non-technical stakeholders
Active listening, conflict resolution skills, and strong problem-solving abilities
Capacity to handle high-pressure situations and make critical decisions quickly
Understanding of the company's business goals and the ability to align team efforts accordingly