yellow.ai
Site Reliability Engineer - AWS/Azure (1-4 yrs)
posted 14hr ago
Flexible timing
Key skills for the job
About yellow.ai (Formerly Yellow Messenger) :
yellow.ai brings the best of AI+human-led conversational automation for enterprises of great repute like Schlumberger, Domino's, Dr. Reddy's Lab, PepsiCo, Bajaj Group, Indigo, Cipla, Siemens, MG Motors, and more.
We have offices in 6 countries and clients across 27 countries. We're a team of 500+ makers who've shipped 650+ Intelligent Virtual Assistants.
Our Virtual Assistants converse in over 120 languages, and our platform handles more than a billion conversations every month across 50+ channels in text and voice! We've also been named a Leading Conversational AI Platform, Distinguished CX Vendor, and Advanced Virtual Assistant Provider by Gartner, whose analysts are highly critical and evaluate 1000+ conversational AI platforms. We're thrilled to be recognized by them!
We're one of the fastest-growing SaaS leaders emerging from Asia, backed on this journey by more than $100M in funding so far from partners like Lightspeed, Sapphire Ventures, WestBridge Capital and Salesforce Ventures.
We were also recently honored as one of the top 10 companies to work at by #LinkedinTopStartups and earned a 'Great Place to Work' certification.
Job Description :
Function : IT Operations and Support - DevOps / Cloud
- Linux
- AWS
- Azure
- Google Cloud
- Kubernetes
The Application SRE team is a critical part of the Engineering organization, responsible for triaging and resolving engineering support requests and bug reports raised by customer-facing teams.
We focus on ensuring platform stability, providing timely resolutions, and enhancing the overall customer experience by working closely with our L3 Engineers and Product teams.
Responsibilities :
- Debug and analyze engineering support requests, identifying root causes and potential solutions.
- Implement configuration changes and provide workaround solutions where applicable.
- Develop and enhance internal automation tools to streamline support operations.
- Monitor and analyze logs using OpenSearch (or similar log management tools) to diagnose and resolve issues efficiently.
- Work closely with L3 Engineers, Product, and Engineering teams to improve system stability and enhance the overall support experience.
- Contribute to documentation, FAQs, and internal knowledge bases to enable customer-facing teams with self-service solutions.
- Conduct training sessions and knowledge-sharing initiatives to improve platform understanding among customer-facing teams.
Requirements :
- 1-3 years of experience.
- Strong debugging and analytical skills with a problem-solving mindset.
- Hands-on experience with Node.js or other backend programming languages.
- Proficiency in working with OpenSearch (or similar log aggregation tools) for log analysis and issue investigation.
- Good understanding of Linux commands and basic Kubernetes (K8s) concepts (a plus).
- Strong communication skills with the ability to explain technical concepts to non-technical stakeholders.
- Ability to collaborate with cross-functional teams and work in a fast-paced environment.
Preferred Qualifications :
- Prior experience in a Support role.
- Familiarity with automation frameworks and AI-driven support tools.
- Experience with incident management and debugging large-scale distributed systems.
- Exposure to cloud environments such as AWS, GCP, or Azure.
Functional Areas: Other