Candidates for this position are preferred to be based in Bangalore, India, and will be expected to comply with their team's hybrid work schedule requirements.
Who We Are:
Wayfair is a leader in the e-commerce space for all things home. We live and breathe modern technologies. We're looking for smart, logical thinkers who can advocate for systems and designs that work at scale, but still retain the startup feel. We pride ourselves on scrappy innovation - getting real, working solutions deployed quickly and iterating based on data.
Our Site Reliability Engineering (SRE) team is looking for experienced engineers who have a mind for cloud-native design, stability, and an understanding of SRE best practices! The engineer role within SRE is at the heart of fulfilling SRE's mission: build a highly reliable, scalable, and measurable customer experience for the continued growth of Wayfair's infrastructure.
What You'll Do:
As a Senior SRE, you will join our team to help grow our systems into best-in-class for efficiency, stability, observability, velocity, and scale in the e-commerce space. You will engage with product and engineering teams from Day 1 to proactively design, build, and maintain systems and software.
Influence the design and architecture of Wayfair's systems as part of the Cloud Enablement journey; collaborate with development teams to design scalable and reliable systems, considering aspects such as fault tolerance, availability, and performance.
Work with both software engineers and fellow SREs to develop and optimize repeatable systems that both sides can leverage. There's a wide range of opportunities to both guide the broad conversation and dive into the nuance of our design and architecture.
Help service owners determine SLIs, build realistic SLOs, set SLAs and error budgets, and ensure production services have reliability built into their cloud-native design
Even with the self-healing and automation you have built, extremely complex issues can still arise; when they do, get involved in troubleshooting and root-cause analysis across the stack: hardware, software, database, network, and so on.
Participate in a shared on-call schedule [follow-the-sun model] managed across SRE & Engineering.
Develop and maintain tools and frameworks for automating the deployment, CI/CD pipeline, configuration, and management of software systems. Automate repetitive tasks to increase efficiency and reduce human error.
Level up new hires and other engineers through example, collaboration, tech talks, and other avenues to increase technical efficiency across the organization
What You'll Need:
5+ years of experience working as an engineer in an SRE role or in software development, with an understanding of cloud infrastructure.
Experience with cloud platforms (GCP, AWS, or Azure) and containerization technologies (e.g., Docker, Kubernetes).
Proficiency with any of the following: Python, Java, Go.
Understanding of monitoring and alerting, with a focus on performance monitoring, tracing instrumentation, and SLI/SLO/SLA implementation
Understanding of distributed systems, microservices architecture, and related technologies.
Knowledge of CI/CD pipelines and version control systems (e.g., Git).
Excellent communication skills across engineers, product managers, and business stakeholders alike.
Nice To Have:
Experience with JavaScript, React, etc.
Experience in designing and iterating on systems at scale
Experience in participating in cross-functional team initiatives and driving projects to completion
Experience gathering and balancing requirements from technical and business stakeholders, and reaching consensus on prioritization
Experience mentoring engineers and leading code reviews
Familiarity with infrastructure-as-code tools (e.g., Terraform, Ansible, Puppet)