StarTree - Site Reliability Engineer - Distributed Systems (5-9 yrs)
posted 7d ago
Job Description :
StarTree is seeking exceptional Site Reliability Engineers for Pinot (SRE - Pinot) to manage, tune, and debug large-scale, highly available distributed systems. You will work with a team of passionate and talented engineers on the automation, tuning, and troubleshooting of Apache Pinot. We are looking for motivated, hardworking, and focused individuals who have a real passion for operational excellence, data systems, and automation.
Responsibilities :
- Leverage various monitoring and alerting services to solve intricate programming problems at scale.
- Manage and tune multiple critical customer-facing Apache Pinot clusters.
- Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues.
- Build rapport and work closely with customers to mitigate and resolve incidents.
- Execute disaster recovery strategies with minimal downtime.
- Collaborate with other engineers to understand and troubleshoot systems and use the experience gained to influence the roadmap of other teams.
- Debug Pinot queries and ingestion when incidents occur.
Requirements :
- 5+ years of experience as an engineer (SRE, SDET, or development).
- Experience managing highly available, production-facing distributed systems and in-depth knowledge of Java are a plus.
- Exposure to cloud platforms such as AWS, GCP, or Azure is a plus.
- Experience with Kubernetes and container orchestration is a plus.
- Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar.
- Strong troubleshooting and critical thinking skills.
- Experience working with or managing Apache Pinot is preferred.
- Experience building Java apps is required.
- Experience with Apache ZooKeeper is a huge plus.
Functional Areas: Software/Testing/Networking