-Kubernetes Administration: Manage and optimize Kubernetes clusters to ensure high availability and performance of TGCloud Savanna.
-Automation & Infrastructure as Code (IaC): Develop automation scripts and Helm charts to streamline deployments and infrastructure management.
-Observability & Monitoring: Implement and maintain monitoring solutions (Prometheus, Grafana, ELK, etc.) to proactively identify and resolve issues.
-Incident Management & Troubleshooting: Respond to incidents, perform root cause analysis, and implement long-term solutions to prevent recurrence.
-Cloud Operations & Scalability: Optimize cloud infrastructure (AWS/GCP/Azure) to ensure seamless scaling and resource efficiency.
-Database Performance & Reliability: Monitor and optimize graph database performance, ensuring optimal query execution and data consistency.
-Collaboration with Development Teams: Work closely with engineering teams to improve system reliability and support new feature deployments.
-Collaboration with TSE team: Work closely with the TSE team on issues that span both the TG CoreDB and the TGCloud infrastructure.
-Customer Support & Communication: Provide technical support for TGCloud customers, ensuring high satisfaction and resolution of technical challenges.
-Knowledge Documentation: Maintain detailed documentation of system architecture, processes, and troubleshooting guides (aka Playbooks, Runbooks).
-Training & Mentoring: Share expertise and mentor junior engineers to strengthen the SRE team's capabilities.
Requirements
-2-3 years of hands-on experience with Kubernetes in production environments.
-Strong expertise in Kubernetes Operators and Helm chart creation. Once onboard, we will provide training on the TG k8s operator.
-CKA certification (Certified Kubernetes Administrator) is a plus.
-Proficiency in cloud platforms (AWS, GCP, Azure).
-Strong background in Linux, networking, and scripting (Bash/Python).
-Experience with observability tools (Prometheus, Grafana, ELK, Datadog, etc.).
-Familiarity with CI/CD pipelines and DevOps best practices.
-Strong problem-solving skills with the ability to troubleshoot complex issues.
-Excellent communication skills for effective customer interaction and cross-functional collaboration.