Share a collaboration space with a high-performing SRE team, with work split 50% automation and 50% operations. Adapt quickly to the team culture from the outset, as it is a long and continuous learning curve.
Focus on deliverables driven by demand and requirements, following agile methodologies and best practices. We follow an industry-standard agile mindset (Scrumban).
Operate Infrastructure as Code with Terraform and ensure coding best practices are followed.
Observability (SLI/SLO/alert dashboarding) - Google Cloud Operations Suite / NewRelic
Drive stability in cloud operations using SRE principles.
Build relationships through fast collaboration and work hand in hand with the Cloud Engineering, Platform SRE, and DevOps teams and other stakeholders to deliver the best outcomes.
Identify automation opportunities that add value toward key SRE principles such as toil reduction, increased developer productivity, and zero maintenance overhead for developers.
Proficiency in requirements analysis, risk analysis and mitigations, incident management, problem management, and change management.
Identify problems, use procedures and documentation to determine the best course of action, and participate in mitigation and resolution.
Drive tool optimization to proactively detect anomalies before they affect service.
Documenting and maintaining the knowledge base is key to success. Ensure this happens consistently for educational materials such as cloud how-tos and best practices. Maintain and improve procedures, processes, and documentation relevant to cloud-native services, networks, and security best practices.
Your skills and experience
Proficient in Google Cloud Platform
Experience with coding in Terraform on GCP.
Experience supporting Terraform Enterprise, shared infrastructure, and networking in the cloud
Proficiency in more than one programming language (Python, Go, Bash)
Solid experience automating operational tasks
Experience with IAM and ensuring least-privilege access principles are followed
Strong insight into SRE principles
Experience in observability, preferably GCO or NewRelic
Familiarity with SDLC processes
Good understanding of cloud network services, including VPCs, subnets, and firewalls
Certification on a cloud platform, preferably GCP
Excellent communication and influencing skills
Ability to work under pressure and as part of a multicultural team
Ability to provide technical leadership in major incident investigation, problem and change management activities
Educated to bachelor's degree level with 10-15 years of experience.