At 2:47am, your on-call engineer gets an alert. Systems are down. By 3:15am, you know it's ransomware. By 3:30am, you're awake and need to make decisions that will define the next 72 hours — and that will shape how your organisation is remembered by customers, partners, regulators, and your own team. In that moment, your technology stack is the least important variable. Your leadership is everything.
Major incidents — whether ransomware attacks, cascading infrastructure failures, or catastrophic data loss — are leadership events as much as they are technical events. The organisations that recover fastest and emerge strongest aren't necessarily the ones with the best technology. They're the ones where leaders had a plan, communicated clearly, protected their teams, and made principled decisions under pressure. This playbook is for those moments.
When the Alert Fires at 2am: Leadership Before the Technical Fix
The first thirty minutes of a major incident are the most chaotic and the most consequential. Decisions made in this window — who is engaged, what information is gathered, how the response is structured — shape everything that follows. Leaders who have never thought about this moment in advance will improvise. Some will improvise well. Many won't. The difference between a controlled incident response and a chaotic one is almost always preparation, not talent.
Your first job as a leader in the first thirty minutes is not to fix the problem. It's to create the conditions in which your team can fix the problem. That means activating the right people without creating an unmanageable crowd in the incident channel. It means establishing a clear incident commander — one person with explicit authority to make decisions and co-ordinate the response — rather than a committee that debates every action. It means making sure the people doing the technical work have what they need: access, information, authority, and freedom from interruption.
Leaders who jump into the technical resolution without a structured response framework create a second problem alongside the first. The on-call team is now managing the incident and managing the leader's anxiety simultaneously. Resist the instinct to solve. Your job is to lead — to hold the structure that lets your engineers do what they're trained to do, without unnecessary interference.
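For teams that want to make that structure concrete, here is a minimal sketch in Python of declaring an incident with exactly one commander and a separate communications lead. The role names, severity label, and print-based announcement are illustrative only, not a real paging or chat integration.

```python
# Minimal sketch of an incident declaration helper. Role names, severity
# labels, and the print-based "announcement" are illustrative placeholders.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Incident:
    title: str
    severity: str                      # e.g. "SEV1" for a major incident
    commander: str                     # exactly one person with decision authority
    comms_lead: str                    # owns stakeholder updates, not the technical fix
    responders: list = field(default_factory=list)
    declared_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def declare_incident(title: str, severity: str, commander: str, comms_lead: str) -> Incident:
    """Create the incident record and announce the structure, not the fix."""
    incident = Incident(title, severity, commander, comms_lead)
    print(f"[{incident.declared_at:%H:%M} UTC] {severity} declared: {title}")
    print(f"Incident commander: {commander} (single decision authority)")
    print(f"Communications lead: {comms_lead} (owns stakeholder updates)")
    return incident

# Example: the 2:47am page becomes a structured response within minutes.
declare_incident("Suspected ransomware on production estate", "SEV1",
                 commander="on-call engineering manager",
                 comms_lead="duty comms officer")
```

The point of the sketch is the constraint it encodes: one commander, one communications lead, and everyone else freed to work the problem.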
The Blame Instinct and Why It Destroys Incident Response
When something catastrophic happens, the human instinct — and the organisational instinct — is to assign blame. Who deployed the change that caused this? Who approved the configuration that was exploited? Who was responsible for the backup that didn't work? These questions feel urgent and reasonable. They are neither. In the middle of an active incident, blame is a distraction. It consumes cognitive resources that need to be focused on resolution. It chills the open information sharing that effective incident response depends on.
Engineers who fear blame in an active incident will withhold information that might implicate them. They'll be slow to flag mistakes they've made because admitting the mistake feels dangerous. They'll engage in defensive behaviour — documenting their own actions in ways that distance them from the cause — rather than collaborative behaviour focused on the fastest path to resolution. A single ill-timed blame signal from a leader during an active incident can add hours to the mean time to resolution. Leaders who understand this protect the incident response environment aggressively.
This doesn't mean accountability is abandoned. It means accountability is separated in time from incident response. During the incident, the only question is "how do we fix this?" After the incident is resolved and people have rested, there is a structured post-incident review that addresses root causes, systemic failures, and — where necessary — individual performance. But that review is blameless in its approach, focused on systems and conditions rather than individual culpability. The distinction between "what allowed this to happen?" and "who is to blame?" is not semantic. It produces fundamentally different outcomes.
Ransomware: A Leadership Crisis, Not Just a Technical One
Ransomware incidents are categorically different from conventional outages because they involve an active adversary who has already been inside your environment, the potential for ongoing exfiltration even after systems are taken offline, complex legal and regulatory notification obligations with strict timelines, and a ransom decision that has financial, ethical, legal, and reputational dimensions that go far beyond the technical response. Leaders who treat a ransomware incident as a particularly severe infrastructure outage will make decisions that compound the damage.
The first leadership decision in a ransomware incident is whether and how quickly to involve external expertise. In-house teams — however skilled — are rarely prepared for the forensic, legal, and negotiation complexity of a ransomware response. Engaging a specialist incident response firm in the first hours, rather than the first days, is almost always the right call. The cost of delay — in evidence preservation, in regulatory notification windows, in adversary persistence — consistently exceeds the cost of engaging early.
The ransom payment decision is one of the most complex leadership decisions in modern business. There is no universally right answer. Paying may fund criminal organisations and encourage future attacks. Not paying may mean unrecoverable data loss, extended downtime, and business failure. Legal obligations vary by jurisdiction and may prohibit payment in some contexts. This decision should never be made unilaterally by a single leader under time pressure. It requires legal counsel, board involvement, and — where available — law enforcement engagement before a decision is made. Leaders who have pre-agreed a decision framework for this scenario before an incident occurs make better decisions in the moment.
Stakeholder Communication During an Active Incident
Communication during an active incident is a leadership function, not a communications function. While your PR or comms team owns the message crafting, the decision about what to communicate, to whom, and when is a leadership call. Getting this wrong — either by communicating too little and creating an information vacuum that fills with speculation, or by communicating prematurely with information that turns out to be inaccurate — has lasting reputational consequences that outlast the incident itself.
Effective stakeholder communication during an incident follows a consistent pattern: communicate early with what you know and what you don't know, update regularly even when there is nothing new to report (silence is interpreted as concealment), and be explicit about your investigation process so stakeholders understand how information will flow. "We are aware of the issue, our team is actively investigating, and we will provide an update in two hours with whatever we know at that point" is a better communication than either silence or a premature explanation.
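As an illustration of that pattern, the following sketch assembles a holding statement from what is known, what is not yet known, and when the next update will come. The wording and the two-hour cadence are examples, not a prescribed template.

```python
# Illustrative sketch of the "know / don't know / next update" pattern.
from datetime import datetime, timedelta, timezone

def holding_statement(known: list, unknown: list, update_in_hours: float = 2) -> str:
    next_update = datetime.now(timezone.utc) + timedelta(hours=update_in_hours)
    lines = ["We are aware of the issue and our team is actively investigating."]
    lines += [f"What we know: {item}" for item in known]
    lines += [f"What we do not yet know: {item}" for item in unknown]
    lines.append(f"Next update: {next_update:%H:%M} UTC, with whatever we know at that point.")
    return "\n".join(lines)

print(holding_statement(
    known=["Customer-facing services are degraded", "Containment actions are under way"],
    unknown=["Whether any customer data was accessed", "Full restoration time"],
))
```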
Different stakeholder groups require different communication approaches. Customers need to understand impact and timeline. Regulators need to understand the nature of the incident and your response actions within their notification windows — GDPR requires notification within 72 hours of becoming aware of a personal data breach, a clock that starts running from the moment you know, not the moment you're certain. Board members need executive summary information that gives them oversight without pulling them into operational details. Staff need enough context to respond accurately to customer questions without speculating. Managing these parallel communication streams while simultaneously overseeing the technical response is a leadership capability that requires explicit planning before an incident occurs.
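The 72-hour clock is simple arithmetic, and making it explicit removes debate under pressure. The worked example below uses illustrative timestamps to show that the deadline is anchored to the moment of awareness, not to the moment the investigation is complete.

```python
# Sketch of the 72-hour GDPR notification clock. Timestamps are illustrative.
from datetime import datetime, timedelta, timezone

aware_at = datetime(2024, 6, 1, 2, 47, tzinfo=timezone.utc)   # when you became aware
notify_by = aware_at + timedelta(hours=72)                     # regulatory deadline

now = datetime(2024, 6, 2, 9, 0, tzinfo=timezone.utc)          # example "current" time
remaining = notify_by - now
print(f"Notify the supervisory authority by {notify_by:%Y-%m-%d %H:%M} UTC "
      f"({remaining.total_seconds() / 3600:.0f} hours remaining)")
```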
The 72-Hour Playbook: What Good Crisis Leadership Looks Like
The first 72 hours of a major incident have a predictable rhythm, and leaders who understand that rhythm can navigate it deliberately rather than reactively. Hours zero to six are about containment and assessment: understanding the scope of the incident, stopping active harm, and establishing the response structure. Hours six to twenty-four are about stabilisation: restoring critical services in priority order, maintaining communication cadence with stakeholders, and managing the human sustainability of your response team. Hours twenty-four to seventy-two are about recovery and learning: restoring full service, completing regulatory notifications, and beginning the post-incident analysis.
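One way to keep that rhythm visible in the incident channel is a simple phase lookup. The sketch below mirrors the windows described above; the boundaries and priorities are a guide rather than a rule.

```python
# Sketch of the 72-hour rhythm as a simple lookup; boundaries are indicative.
PHASES = [
    ((0, 6),   "Containment and assessment",
     ["Scope the incident", "Stop active harm", "Establish the response structure"]),
    ((6, 24),  "Stabilisation",
     ["Restore critical services in priority order", "Hold the communication cadence",
      "Manage responder fatigue"]),
    ((24, 72), "Recovery and learning",
     ["Restore full service", "Complete regulatory notifications",
      "Begin the post-incident analysis"]),
]

def current_phase(hours_elapsed: float):
    for (start, end), name, priorities in PHASES:
        if start <= hours_elapsed < end:
            return name, priorities
    return "Post-incident", ["Run the blameless review", "Track systemic improvements"]

name, priorities = current_phase(10)
print(f"Hour 10: {name} -> {priorities}")
```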
The human sustainability point deserves specific attention. Extended incidents create enormous pressure on the engineers closest to the problem, and leaders who don't actively manage fatigue will find their best people making poor decisions after twenty-plus hours without sleep. Rotating engineers off the incident response every six to eight hours, ensuring that someone is explicitly responsible for keeping responders fed and hydrated, and releasing engineers to sleep when they are not actively needed are leadership decisions, not HR considerations. Exhausted engineers make the incident worse.
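A rotation plan can be drafted in advance rather than negotiated at hour twenty. The sketch below generates illustrative six-hour shifts from a responder pool; names and timings are placeholders.

```python
# Sketch of a fatigue-aware rotation: responders hand over every six hours so
# nobody stays on the incident past the point where judgement degrades.
from datetime import datetime, timedelta, timezone
from itertools import cycle

def rotation_plan(responders, start, shift_hours=6, total_hours=72):
    """Yield (engineer, shift_start, shift_end) tuples covering the incident."""
    pool = cycle(responders)
    for offset in range(0, total_hours, shift_hours):
        shift_start = start + timedelta(hours=offset)
        yield next(pool), shift_start, shift_start + timedelta(hours=shift_hours)

start = datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)
for engineer, begin, end in rotation_plan(["A", "B", "C", "D"], start, total_hours=24):
    print(f"{begin:%a %H:%M}-{end:%H:%M} UTC: {engineer}")
```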
Good crisis leadership in this window is characterised by clear decision-making authority, regular and honest stakeholder communication, active management of responder wellbeing, and a deliberate separation between the people doing the technical recovery and the people managing the organisational and stakeholder response. Leaders who try to do all of these simultaneously, without clear delegation, create chaos. Leaders who have pre-defined these roles and practised them create calm.
Post-Incident: The Leadership Work That Actually Prevents Recurrence
The post-incident review is the most important meeting in your incident response programme, and it is the meeting that most organisations get most wrong. Conducted under time pressure, with too many participants, in a blame-adjacent atmosphere, by a facilitator who is also the subject of the review — these are the conditions that produce a document rather than learning. The document gets filed. The conditions that produced the incident remain. The next incident is slightly different in form and identical in cause.
Effective post-incident reviews are scheduled with adequate preparation time — not the day after the incident when people are exhausted and emotional, but within a week while details are still fresh. They are facilitated by someone without a stake in the outcome, ideally someone who was not in the incident response chain of command. They focus explicitly on systemic conditions rather than individual actions: what in our systems, processes, and culture created the conditions that made this incident possible? What made detection slower than it should have been? What made recovery harder than it needed to be?
The output of a good post-incident review isn't a list of individual actions assigned to specific engineers with two-week deadlines. It's a set of systemic improvements — to architecture, to processes, to tooling, to training — that are resourced, prioritised, and tracked with the same rigour as product work. Leaders who treat post-incident action items as optional will be in the same post-incident meeting twelve months later, reviewing the same class of incident.
Board and C-Suite Communication: Honesty Over Damage Control
The instinct in a major incident is to protect leadership from bad news for as long as possible, to manage the narrative rather than the facts, and to present the situation in the most favourable light available. This instinct is understandable and almost always wrong. Boards and C-suites that receive sanitised incident reports during an active crisis are boards and C-suites that make poor governance decisions. They can't provide appropriate oversight of an incident response they don't understand. They can't prepare for regulatory, legal, and reputational consequences they haven't been told about.
Effective board and C-suite communication during a major incident is direct, timely, and honest about uncertainty. "We don't know yet" is always a legitimate answer to a board question during an active incident — what's not legitimate is pretending to know when you don't, or delaying communication until you have a tidy story to tell. The board's job in a crisis is oversight and support, not management. Leaders who brief the board early and honestly get oversight and support. Leaders who brief the board late and optimistically get scrutiny and doubt.
After the incident, the board deserves a complete and honest account of what happened, what the organisation's response was, what the consequences are, and what systemic improvements are being made. This account should not be defensive. The goal is not to preserve leadership credibility through narrative control. It's to give the board the information they need to exercise appropriate governance — and to demonstrate the kind of leadership integrity that actually does preserve long-term credibility.
Building Incident Response Muscle Before You Need It
Everything in this playbook is harder if you encounter it for the first time during a real incident. The incident commander role is harder to exercise if nobody has held it in a practice run. The blameless post-incident culture is harder to establish when emotions are raw after a real event than when it's practised in a simulation. The stakeholder communication rhythm is harder to maintain when it's improvised than when it follows a pre-agreed template that everyone knows. Incident response is a muscle, and like all muscles, it needs to be trained before you rely on it.
Tabletop exercises — structured simulations where leadership walks through an incident scenario without any actual system disruption — are the most accessible form of incident response training. A well-designed tabletop for a ransomware scenario will surface the decision points that leaders find hardest: the ransom payment decision, the regulatory notification timing, the board communication approach. Surfacing those difficulties in a training context rather than a real incident context is worth considerably more than the half-day it takes.
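A tabletop agenda can be as simple as a list of timed injects. The sketch below shows an illustrative ransomware scenario whose decision points match the ones named above; the timings and wording are examples only.

```python
# Sketch of a ransomware tabletop agenda: timed "injects" that force the
# leadership decisions the exercise is designed to surface. Illustrative only.
INJECTS = [
    (0,   "Alert: encryption activity detected on production file servers."),
    (30,  "Inject: attacker note claims data was exfiltrated; ransom demand received."),
    (60,  "Decision point: do we engage the external incident response firm now?"),
    (90,  "Decision point: regulatory notification — has the 72-hour clock started?"),
    (120, "Decision point: what do we tell the board, and who briefs them?"),
    (150, "Decision point: ransom payment framework — who needs to be in the room?"),
]

for minute, prompt in INJECTS:
    print(f"T+{minute:>3} min  {prompt}")
```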
Beyond tabletops, organisations with mature incident response programmes run regular game day exercises where actual systems are deliberately disrupted to test recovery capabilities. They rotate on-call responsibilities so that incident response skills are distributed across the engineering organisation rather than concentrated in a few individuals who become single points of failure. They treat every real incident — including minor ones — as a practice opportunity, running post-incident reviews consistently rather than only for high-severity events. The organisations that handle ransomware and major outages best are the ones that have been practising incident response as a normal part of their engineering culture for years before the major event arrives.
Prepare Your Leadership Team Before the Incident
MindZBASE works with engineering and executive leadership to build incident response playbooks, run tabletop exercises, and establish the organisational practices that make crisis leadership effective. Don't build your playbook during the incident. Build it now.
Schedule a Consultation