Work directly with Product Development teams on features, operations, and reliability engineering to deliver the outcomes that our merchants deserve.
Learn quickly and be able to triage and remediate production system and application incidents while practicing balanced incident response.
Your Day to Day
Work on a shift-based team that protects our merchants by serving as a first responder for our systems and applications.
Facilitate constructive retrospective sessions to help us enhance the whole lifecycle of operational response, from inception and design through deployment, operation, and refinement.
Ensure that incidents affecting our merchants are properly documented, with the appropriate data collected for use by our various reliability platforms.
Develop and improve production monitoring and management capabilities using existing platforms and tools.
Work with the Production Operations Manager to take on problems, escalate, and resolve critical site incidents.
Be the primary enabler in the reduction and elimination of issues affecting merchant experience.
What you need to bring
Desire to be part of a shift-based team with hands-on responsibility to protect the experiences of our merchants.
Skills in Python or other industry-standard software development languages (e.g., Java, Node).
Strong experience with monitoring applications (Splunk, Datadog).
Experience with GCP and related services is a strong plus.
Good understanding and working knowledge of networking principles, internet fundamentals, operating systems, and application stacks, and comfort using the Linux command line.
Experience with GitOps/Operations as Code preferred, along with knowledge of Terraform and Artifactory.