Job Description
WHAT YOU'LL DO
Braze is at an inflection point in our maturity, where a key focus of our work is on Scalability, Observability, and Reusability. Reporting to our Head of Incident Management, you’ll focus on major incident management, process management, program management, and release management. The Technology Operations Team is focused on ensuring that Braze is operating as a technology-first business, with process, policy, and support in place to manage growth and scale. You’ll be ensuring that programs and processes that span or are required by multiple engineering departments are standardized, followed, and improved over time.
• Creating, communicating, and executing the incident response strategy and actions for individual incidents (spanning Security, IT, DevOps, and Product Engineering)
• Incident Commanding - driving resolution of incidents by closely partnering and collaborating with Engineering, Technical Support, and Customer Success
• Lead and contribute to projects to... improve tools and processes related to manageability, observability, and resiliency
• Manage incident-related training, including cross-training of our SREs, DevOps, and Application Engineers
• Overseeing the incident management process and team members involved in resolving the incident
• Prioritizing incidents according to their urgency and impact on the business
• Contribute to our blameless post-mortem process, driving prioritization of action items related to site reliability and resiliency
• Understand and translate technical information and issues into business cases, impacts, and risks that can easily be interpreted by the customer
• Lead the weekly release process as part of a release management team
• Escalate and manage release-related issues through to resolution
WHO YOU ARE
• Able to effectively communicate critical issue status (both verbally and in writing) to executive staff, go-to-market teams, and other involved parties
• Able to effectively build and maintain relationships with key stakeholders across the business
• Able to lead, make decisions, solve problems, and work within teams, with the flexibility and agility to move between role types within teams
• Ability to effectively prioritize and execute tasks in a high-pressure environment
• Experience leading technical incidents and driving them to resolution, whether as part of an on-call team or as an incident manager
• A strong technical background and experience with specific tools for reporting, documentation, and observability (Jira, Confluence, Datadog, or the equivalent)
• A good foundational understanding of release management concepts, DevOps, and SRE
• You have a high degree of operational excellence, use data-driven decision-making to minimize risk, and love building and managing against reports and data
• 7+ years of experience in incident management, operations, or technical support