Manager, Incident Response and Management

Job Description

  • The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s reliability and this team is at the forefront of ensuring we keep it that way – working hand-in-hand with Reliability Eng and across the Tech Org.
  • This team of incident response managers (IRM) is defined by our sense of ownership and how we drive incidents to resolution – marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks and anything that significantly impacts the users of our products.
  • The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe.
  • The team combines skills in program management, communications and incident handling with technical adeptness, as incidents can arise from anywhere and cut across products and orgs in Stripe.

What you’ll do

  • As the Manager of Incident Response Managers, you’ll build a world-class incident response team in EMEA to maintain the high bar of reliability expected of Stripe.
  • You’ll work hand-in-hand with IRM teams in APAC and AMER to ensure solid 24/7 coverage of how we detect and respond to incidents, communicate with users, improve related tooling and measure impact. You will lead and nurture a high-performing 24/7 EMEA IRM team that has a strong sense of urgency, skilled program ownership of incidents and comms, the drive to rally engineers to their cause and the technical expertise to understand impact. As a result, you’ll be seen as the protector of our users, minimizing the impact of incidents on their business and ensuring that Stripe is always thinking of them.

Responsibilities

  1. Manage a team of frontline on-call active responders while also participating in on-call responses.
  2. Coordinate and manage incident resolution with speed, accuracy and cross-functional collaboration, working with a broad, global set of stakeholders.
  3. Contribute to incident root cause analysis, identifying remediation opportunities for Incident Operations and for partner teams in operations and engineering to execute.
  4. Formulate strategy and deliver on communications to both internal stakeholders and Stripe’s users.
  5. Collaborate with engineering and operations teams to align on and execute ongoing improvements to processes, metrics, and frameworks.
  6. Influence and make decisions through interpretation of data and consolidation of input from multiple stakeholders.

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Qualifications

  1. Technical background, with proficiency in SQL, Splunk, or equivalent query languages
  2. Experience using infrastructure and application monitoring tools such as SignalFx, Prometheus and Sentry
  3. Experience at a high-growth technology company, especially in incident response within the payments or e-commerce space
  4. Experience with managing user-facing communications strategy during sensitive situations such as outages
  5. Strong analytical skills, and the ability to use data to drive business decisions

Skills Required

  • Have 5+ years of direct people management experience and are an excellent coach
  • Enjoy a fast-paced work environment, crafting strategic and rapid fixes to high-intensity problems with a keen eye for detail and a high bar for quality
  • Comfortable navigating ambiguity, while identifying areas for process improvement and establishing best practices
  • Can solve problems and translate complicated technical issues into solutions, while keeping a user-first mindset
  • Can execute and deliver complex operational projects involving multiple stakeholders, especially in partnership with engineering