Four years before BP’s Deepwater Horizon became global news, ExxonMobil walked away from a $187 million investment in a similar deepwater well. The well had yet to produce a single drop of oil when engineers detected geologic instability in the seabed. After consulting with the geologists and drillers on the project, leaders decided that the best course of action was to cap the well and move on. Industry analysts at the time accused ExxonMobil of being too risk averse. After all, no incident had actually occurred.
That narrative changed in 2010, when operators of the Deepwater Horizon lost control of a wellhead that had been experiencing similar disturbances. This time there was no speculation about whether the risk was “real.” The resulting explosion and fire killed 11 workers and created an environmental and public relations nightmare. For leaders across industry, the tragedy of the Deepwater Horizon was compounded by the fact that many of the lessons it had to offer had already been “learned” before.
So why did only one organization seem to heed them?
Since the 1980s, the safety community has worked to define the High-Reliability Organization (HRO). At its core, the HRO concept asks why early warning signals are overlooked and what can be done to improve how organizations identify and respond to those signals.
The HRO paradigm seeks to characterize organizations that stay safe despite operating in high-risk conditions. Good examples are submarine groups, wildfire incident management teams, and nuclear plant operations. In theory, an HRO would routinely make the same kinds of decisions that ExxonMobil did back in 2006, while avoiding the mistakes made by operators of the Deepwater Horizon in 2010.
The challenge for leaders is that defining HRO performance has largely been an academic exercise rather than a practical one. Leaders struggle to translate anecdotal examples of feel-good “habits” into actionable and, more importantly, measurable practices.
Rather than focusing on individual behaviors found in HROs, organizations should identify organizational practices that systemically mitigate risk. There are five disciplines that together support the technical and operational aspects of catastrophic incident prevention. Most importantly, these disciplines can be measured, assessed and acted upon.
Anticipation is about recognizing early warning signs. An organization strong in anticipation will have mechanisms to capture information from a variety of sources that may be meaningful early indicators of change to exposure. Examples may include process deviations, unusual maintenance requests, and even front-line workers detecting differences in sounds. Anticipation helps reduce organizational and operational risk blindness.
To have strong anticipation, you need a culture that supports it. This means:
· People are encouraged to report and the value of reporting is reinforced.
· Data is acted upon.
· Leaders understand and accept that there will be a lot of information that may not be essential, but it’s worth sorting through to be sure they don’t miss what is critical.
· Leadership drives risk-blindness reduction efforts by moving from statements like “It can’t happen here” to “How can it happen here?” and “What are we doing about it?”
· Organizations with strong cultures have leaders who visibly value the search for early warnings and reinforce the analysis of these indicators, even when this does not result in identification of serious risk. These leaders understand that supporting the investigation of false positives is worth it if it results in avoiding just one catastrophic event.
Questioning is about preserving the integrity of decision-making and action, specifically by protecting teams from the biases innate in all humans and groups. Traditional process safety management includes a number of elements designed to evaluate and plan for the control of hazards and risks. However, common cultural characteristics, such as cognitive bias, can undermine the effectiveness of these efforts and leave the organization vulnerable.
This can also trap us in poor decision-making. We witnessed the impact of cognitive bias in one incident where a crew of maintenance workers responded to the sound of leaking gas from pipelines in a trench. Everyone on the crew assumed the gas was nitrogen because of a previous nitrogen leak. But this time, the leak was hydrogen. When the trench cover was lifted, a spark ignited the hydrogen and killed several people.
Had the work crew applied appropriate questioning, they likely would have overcome the bias that led them to assume it was a nitrogen leak. And even if the leak had been nitrogen, using a gas detector to test that assumption would have revealed whether the environment was oxygen deficient.
Biases can include:
· Confirmation bias. The tendency to search for or interpret information in a way that confirms one’s preconceptions.
· Normalcy bias. The refusal to plan for, or react to, a disaster which has never happened before.
· Availability bias. The tendency to predict based on how easily an example can be brought to mind.
· Status quo bias. The tendency for people to want things to stay relatively the same.
· Groupthink. The tendency to do (or believe) things because many other people do (or believe) the same.
· Risk seeking/risk aversion. The tendency to make risk-averse choices when the expected outcome is positive, but risk-seeking choices to avoid a negative outcome.
The only way to guard against the insidious effects of cognitive bias is through culture and leadership. There are specific leadership behaviors, such as encouraging dissenting opinions, that promote a culture that produces more accurate decisions. There are also specific skills involved in asking the right question in the right way to get the right data.
Leaders must promote and measure the use of these leadership behaviors and skills.
Diligence is about assuring the consistent and reliable use of all programs and processes to deliver reliable and safe outcomes. Any program that relies on human action is subject to the risks of human error, and this can lead to significant risk exposure if not fully understood and mitigated.
While many organizations use periodic audits to provide a check on implementation, the key to assuring consistent and ongoing activity is developing leaders who monitor, reinforce and verify effective program execution, and who understand what is needed to identify and reduce human error.
Human performance reliability is central to good diligence because the way our brains are wired plays a decisive role in work-related decisions that carry significant safety implications.
Neuroscience shows us that the Fast Brain, the part of the brain that relies on habit, operates quickly and is often reactive. Worse yet, when fatigue sets in, reaction times slow and judgment can be greatly impaired. When workers experience either acute or cumulative fatigue, they put themselves and others at significant risk.
Understanding brain-centered hazards is imperative for any organization that wants to reduce the potential for critical errors related to safe operation. Monitoring, reinforcing and verifying work decisions are critical elements of the diligence needed to reduce brain-centered exposure.
Resilience is about developing the agility to recognize and quickly respond to exposures in real time. Upset conditions occur from time to time in any system. A resilient organization is able to react in ways that prevent upset conditions from becoming catastrophic events, and then learn from the experience. Successful organizations also authorize front-line personnel to react and respond to early warning signals, and they maintain an effective Stop Work Authority process.
This has a major influence on results. Even where automated control systems are designed to handle upset conditions, it is important that workers understand when and how to intervene, and are not only able but also willing to make appropriate interventions early. An organization strong in resilience is more likely to prevent a small process disruption from becoming a major incident.
Learning is about capturing lessons continuously rather than waiting for an incident to trigger learning activities. Learning organizations create formal mechanisms, such as cross-functional teams and advanced training in incident investigation, to continually capture and respond to risk data. They also focus on generating new knowledge, not on finding fault with workers.
Great learning organizations are also characteristically self-aware. They make sure that even the newest employee knows the story of where they came from and where they are going. They are also vigilant about preserving organizational memory and ensuring that hard lessons are never forgotten or repeated.
Taking the Next Steps
Leaders must understand and address the underlying factors that contribute to catastrophic safety risk: uncovering organizational blind spots and identifying the cultural and leadership factors that undermine effective control of exposure. A critical first step is to conduct an unbiased evaluation of where your organization stands relative to the five HRO disciplines described above.
Adopting the principles of HROs will create an environment where risks are systematically identified, controls are sustainably implemented, and performance is monitored, ultimately reducing the potential for catastrophic incidents.
Mike Snyder is North American process safety managing director with DEKRA, a provider of testing, inspection, certification and consulting services.