Safety and Swiss Cheese!


What's the connection?

On SBS-TV last night here in Australia, viewers were presented with a very interesting French documentary entitled "Why Planes Fall".

Here is what the program notes said:

Cutting Edge: Why Planes Fall

Why Planes Fall looks at the reasons why modern air accidents occur. The majority of modern airline crashes are not caused by aircraft problems such as mechanical flaws or computer failure, but in 70% of cases, by human error. In the 1970s NASA pioneered the idea that in addition to acquiring training in how to fly, pilots needed to know how to handle stress and conflict on board planes. This documentary includes compelling re-enactments of what took place in the cockpit in the moments before a number of famous crashes. (From France, in French and English, English subtitles)

I always have reservations about recommending such programs to patients (especially with such titles), since any program which features incidents, no matter how rare, may be incorrectly interpreted. For the same reason, I always arranged for Ansett Fear of Flying groups to visit the maintenance hangars late in the training program, when the sight of a Boeing 767 in pieces undergoing a D check would be seen as reassuring (the thoroughness of maintenance) rather than dangerous ("there are so many pieces which could fail").

Fortunately, this French documentary, which strongly featured psychological aspects of flight management, including Crew Resource Management (CRM) at Air France and United Airlines, ultimately managed to portray why commercial aviation is as safe as it is: far safer than most other modes of transporting people, with elevators and escalators apparently being top of the pops for safely moving people.

One of the key people interviewed is a psychologist whose research has helped explain how a very safe system can be "tricked" so that incidents occur. His name is Professor James Reason, a psychologist in Manchester, UK.

Reason has written extensively about how humans and organisations (not just aviation but nuclear plants and governments) commit errors and how such incidents can be prevented once their causes are understood. In particular, he has developed what he calls the "Swiss Cheese" model of incident occurrence.

Here is what it looks like applied in one area where vulnerabilities can have dire consequences, the hospital setting:

ARDS is Acute Respiratory Distress Syndrome, which can result from a number of preventable circumstances. In the illustration above, each slice of cheese, starting from the right, represents an obstacle or defense to ARDS development in a patient admitted to the hospital. But the holes in each cheese slice represent something different - a latent error or system failure waiting to happen. These could be human error, equipment failure, and so on.

Each of these can be handled and prevented by proper training, supervision, maintenance and so on. But when these methods break down, the likelihood of a serious event increases. We are talking here in probability terms, with each failure treated as independent of the others. It must be said, though, that an organisation which allows one or two of these events to occur may well have systemic and cultural problems which allow more events to occur: that is, for the slices of swiss cheese to have more holes in them.

When the holes line up, meaning all the defenses fail and an organisation's latent vulnerabilities are exposed, then an incident occurs, as the illustration above shows.

The same reasoning has been applied in aviation too, where each slice might represent a different component of the aviation matrix: the airplane manufacturer, the airline, pilots and their training, air traffic control and so on.

Each acts in a defensive way to prevent incidents, yet each of these has vulnerabilities where things can go wrong. The more we know about what can go wrong, the smaller the holes in the swiss cheese become, and the less chance there is that the slices will spontaneously align for an incident to occur. This is one reason why aviation incidents, no matter how small, are thoroughly investigated by first-tier airlines, who also encourage an organisational culture of self-reporting, without penalty for admission of error. In other words, it's one thing to have holes or vulnerabilities; it's another for them to line up at any one given time to allow an incident to "pass through". This is why aviation incidents are often described as arising from a conspiracy of unlikely events occurring close together in time and space.

This is also why flying is as safe as it is... there are lots of slices (= redundancy systems, e.g. three hydraulic systems, planes can climb out on one engine, etc.), and efforts are constantly being made to reduce hole size. There are few industries as self-reflective and self-examining as the commercial aviation industry.

To bring the message home, try to develop a model for your driving experience. How many slices of cheese exist to help you prevent an accident, or recover from it? How big are the holes, and how often do you refine your driving methods to improve the chances of not being in an accident, or surviving one?
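For readers who like to see the arithmetic, here is a minimal sketch of the probability argument, assuming (as the model does in its simplest form) that each slice fails independently of the others. The per-slice figures are invented purely for illustration and are not real aviation or hospital statistics; the function name is my own.

```python
from math import prod

def incident_probability(hole_probabilities):
    """Chance that every defensive slice fails at once, assuming independence."""
    return prod(hole_probabilities)

# Hypothetical chance of a "hole" in each slice (illustrative numbers only).
layers = [0.01, 0.02, 0.05, 0.01]

# All four holes must line up for an incident to pass through.
baseline = incident_probability(layers)

# Halving the size of every hole (better training, maintenance, reporting).
smaller_holes = incident_probability([p / 2 for p in layers])

# Adding one more slice of cheese (an extra redundancy system).
extra_layer = incident_probability(layers + [0.05])

print(f"baseline:      {baseline:.2e}")
print(f"smaller holes: {smaller_holes:.2e}")
print(f"extra slice:   {extra_layer:.2e}")
```

Under these assumptions, halving each of the four holes cuts the chance of alignment sixteen-fold, and one extra slice cuts it twenty-fold. Where failures share a common cause, such as a poor organisational culture, the independence assumption no longer holds and the real figure would be worse than this simple product suggests.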

Before I knew of Reason's Swiss Cheese theory of organisational vulnerability, I would discuss these matters (system redundancies, incident probabilities) with clients by referring to the wiremesh or chainlink fences surrounding the nearby Melbourne Airport (as well as many other secure places).

Here you can see wiremesh used to make fences, baskets, cages and whatever - it's to prevent things getting in, or things getting out!

The integrity of the mesh - its ability to do the job it was designed for - is determined by the strength of the individual components (the metal itself) and the design and quality control used to put the components together. Not to mention whatever maintenance is needed to reinforce its integrity in the face of wear and tear, weather, etc.

Should one of the links in the diagram be broken, then the integrity of the mesh is only very slightly affected - it can still do its basic job of protection. And if another link fails, but quite some distance from the first one, then the fence can still do its job without too much going wrong.

But if more and more links should fail, and they are in close proximity, then the fence may present less and less resistance to intruders or whatever it is containing. Eventually, with enough links broken, the fence or barrier will no longer function in the way it was meant, and danger can get in or out.
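The fence analogy can be sketched in a few lines as well. This is only a toy model under my own assumptions: the fence is a grid of links identified by (row, column), links "touch" when they are side by side, and a gap of four touching broken links (the hypothetical `gap_size`) is enough to let something through.

```python
def largest_broken_cluster(broken):
    """Size of the largest group of broken links that touch side to side."""
    remaining = set(broken)
    largest = 0
    while remaining:
        # Start a new cluster from any broken link not yet visited.
        stack = [remaining.pop()]
        size = 0
        while stack:
            row, col = stack.pop()
            size += 1
            neighbours = ((row + 1, col), (row - 1, col),
                          (row, col + 1), (row, col - 1))
            for neighbour in neighbours:
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    stack.append(neighbour)
        largest = max(largest, size)
    return largest

def fence_breached(broken, gap_size=4):
    """True when enough adjacent links have failed to open a usable gap."""
    return largest_broken_cluster(broken) >= gap_size

# Three broken links scattered far apart: the fence still does its job.
scattered = {(0, 0), (5, 9), (12, 3)}

# Four broken links right next to each other: a gap has opened.
clustered = {(4, 4), (4, 5), (5, 4), (5, 5)}

print(fence_breached(scattered))
print(fence_breached(clustered))
```

The point of the sketch is that the number of broken links matters less than whether they fall close together.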

To me, that's how air safety works - there are multiple redundancies which must spontaneously conspire together to fail in location and time, in order for vulnerability to danger to become active. Every time you hear of some event which has resulted in a trusted piece of equipment or a person or an organisation failing, you can be sure the swiss cheese holes have lined up, or a number of wiremesh links have been broken for the vulnerability to be exposed.

Here's hoping this brief exploration of safety systems in complex organisations has been helpful.

Posted: Wednesday - May 19, 2004 at 12:39 PM

