Gains from Getting Near Misses Reported - Process Improvement


Gains from Getting Near Misses Reported

Mr. William G. Bridges, President
Process Improvement Institute, Inc. (PII)
1321 Waterside Lane
Knoxville, TN 37922
Phone: (865) 675-3458
Fax: (865) 622-6800
e-mail: [email protected]

2012 © Copyright reserved by Process Improvement Institute, Inc.

Prepared for Presentation at
8th Global Congress on Process Safety
Houston, TX
April 1-4, 2012

Keywords: investigation, near miss, accident, incident, root cause, RCA

ABSTRACT

The need for effective root cause analysis is finally gaining the spotlight in the chemical process industry. If we do not find out about an incident, we cannot investigate the root causes. We find out about accidents (harm done) because they are difficult to hide. However, there is only one accident for about 10,000 errors and failures (sometimes called unsafe acts and unsafe conditions). The definition of a near miss (a potentially damaging sequence of events and conditions, but without harm) can be vague and varies from site to site. However, data indicates that there are probably about 100 near misses for every accident. Learning from near misses is much, much cheaper than learning from accidents, yet many companies get less than one near miss reported for each accident. This paper describes the reasons why near misses are not reported and shares how companies have increased the reporting ratio to as high as 105:1; it is an update of the basis for Chapter 5 of Guidelines for Investigating Chemical Process Incidents, Second Edition (CCPS, 2003).


Introduction

We must learn from accidents and near misses to prevent recurrence. The first step in the learning process is investigation to determine the causes and underlying reasons why accidents and near misses occur. A thorough investigation of root causes will identify the management system weaknesses. Learning which management system weaknesses are leading to near misses and accidents is one of the highest value activities in which a company can invest, and learning from near misses is much cheaper than learning from accidents. Many chemical companies have implemented process safety management systems, and now they are beginning to focus on getting near misses reported and on root cause analysis. This is a very exciting trend. Unfortunately, the chemical industry gets very few near misses reported (the chemical industry is certainly not the only industry with this problem).

To understand more about near misses and getting them reported, it is best to first review the basic definitions.

An incident is either an accident or a near miss.

An accident is a sequence of unplanned events and conditions that results in harm to people, environment, process, product or image.

A near miss is an unplanned sequence of events that could have caused harm if conditions had been different or if it had been allowed to progress, but did not in this instance.

Using just these basic definitions, it is very difficult to make a consistent determination on whether a specific event is a near miss or a "non-incident" (neither an accident nor a near miss). If the users of the investigation system do not identify an event to be at least a near miss, then the event will not be investigated and valuable lessons will be lost. This aspect of near miss reporting will be discussed later.
We also need to define causal factor and root cause; the definitions below are used later in this article:

A causal factor is a human error (typically an error by the at-risk employee performing a task/job in the process) or a component fault/failure. Note that these human errors and component failures are probably caused by other humans making mistakes, and all errors are controlled by management systems. An incident typically has multiple causal factors. Natural phenomena can also be a causal factor.

A root cause is a management system weakness that results in a causal factor. A causal factor typically has multiple root causes.

The definitions above are the same as used in Guidelines for Investigating Chemical Process Incidents, Second Edition (CCPS, 2003).

Given a consistent understanding of the definition of a near miss, it is possible to estimate how many near misses should be reported for every accident. Studies in several industries indicate that there are between 50 and 100 near misses for every accident. Also, data indicates that there are perhaps 100 erroneous acts or conditions for every near miss. This gives


a total population of roughly 10,000 errors for every accident. Figure 1 illustrates the relationships between accidents, near misses and non-incidents.
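The chain of ratios behind the "roughly 10,000 errors" figure can be written out as a short calculation. This is only a sketch of the arithmetic, using the order-of-magnitude estimates quoted above; the function and constant names are illustrative and do not come from the paper.

```python
# Order-of-magnitude ratios quoted in the text; these are estimates, not measurements.
NEAR_MISSES_PER_ACCIDENT = (50, 100)  # "between 50 and 100 near misses for every accident"
ERRORS_PER_NEAR_MISS = 100            # "perhaps 100 erroneous acts or conditions"


def errors_per_accident(near_misses_per_accident, errors_per_near_miss):
    """Chain the two ratios: near misses per accident times errors per near miss."""
    return near_misses_per_accident * errors_per_near_miss


# Evaluate both ends of the quoted near-miss range.
low, high = (errors_per_accident(n, ERRORS_PER_NEAR_MISS) for n in NEAR_MISSES_PER_ACCIDENT)
print(f"errors per accident: {low:,} to {high:,}")
```

The upper end of the range reproduces the 10,000 figure; the lower end suggests the estimate could plausibly be half that, which is consistent with the text treating these ratios as rough.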

Figure 1: Relationship between Errors and Potential or Actual Impacts

Another way to think about near misses in the chemical industry is that there is about one near miss per plant worker per week. The ratios cited depend heavily on the definition of a near miss and also depend on the type of loss. However, across industries there are roughly equivalent ratios of accidents and near misses and errors:

Example: Toyota reports there is a ratio of 20,000 errors per major economic loss event. Toyota also requires reporting of about 70 issues per worker per year (these issues can include process improvement ideas or near misses) (Moore, 2007). Assuming the majority of the issues are near misses to at least a minor loss, this is roughly one near miss per worker per week.

Example: In passenger air travel, the pilots make about one mistake per two hundred steps (per review of Black Box data randomly selected by two airlines); or one (or several) mistake per flight. As of 2010, there were about 100,000 scheduled flights per day (approximately 35 million flights per year) and an average of 0.85 crashes per million flights (major losses). This is an average of about 1 million errors per major accident. Overall, there are roughly 10,000 errors per loss event (aircraft damage or injury), not counting damage from natural phenomena, such as bird strikes (which are about 100,000 strikes per year). Although a formal tally is not available, there appear to be roughly 100 near misses per minor or major loss event.

Example: One ALCOA site (internal data) had a ratio of about 80 near misses per loss event, and that site required each plant worker to report a minimum of four near misses per month (about one per week).
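The figures in these examples can be cross-checked with a few lines of arithmetic. The sketch below assumes exactly one error per flight (the low end of "one (or several)") and a 52-week year; both are simplifying assumptions made here, and the function names are illustrative.

```python
FLIGHTS_PER_YEAR = 35_000_000        # from the text (about 100,000 scheduled flights per day)
CRASHES_PER_MILLION_FLIGHTS = 0.85   # from the text
ERRORS_PER_FLIGHT = 1                # assumption: low end of "one (or several) mistake per flight"
WEEKS_PER_YEAR = 52                  # assumption: no allowance for time away from work


def errors_per_crash(flights, crashes_per_million, errors_per_flight):
    """Total pilot errors in a year divided by the expected number of crashes."""
    crashes = flights / 1_000_000 * crashes_per_million
    return flights * errors_per_flight / crashes


def per_week(reports_per_year):
    """Convert an annual per-worker reporting figure to a weekly one."""
    return reports_per_year / WEEKS_PER_YEAR


aviation = errors_per_crash(FLIGHTS_PER_YEAR, CRASHES_PER_MILLION_FLIGHTS, ERRORS_PER_FLIGHT)
toyota_weekly = per_week(70)      # 70 issues per worker per year
alcoa_weekly = per_week(4 * 12)   # four near misses per worker per month

print(f"aviation: about {aviation:,.0f} errors per crash")
print(f"Toyota: about {toyota_weekly:.1f} reports per worker per week")
print(f"ALCOA site: about {alcoa_weekly:.1f} reports per worker per week")
```

Under these assumptions the aviation figure comes out slightly above one million errors per crash, and both per-worker figures land close to one report per week, in line with the text.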


So, across very different industries there appears to be roughly the same ratio of errors to major events and the same ratio of errors and near misses to total loss events (major and minor). Unfortunately, as mentioned earlier, many companies in the chemical industry only get a small fraction of near misses reported. In fact, most of the more than 400 chemical companies we deal with indicate that they get only about one or two near misses reported for every accident. And though some companies achieve a ratio of 20 or higher, many others get fewer near misses reported than accidents (ratio less than one).

Importance of reporting and learning from near misses Investigating near misses is critical to preventing accidents, because near misses share the causes and root causes of accidents; they are one or two barriers away from the loss/accident. We are very likely preventing many apparently unrelated accidents when we prevent the ones that are obviously related. Figure 2 illustrates a hypothetical relationship between causal factors and root causes of accidents and those for near misses. From our experience, this relationship appears to be valid; see the case studies and benefits section of this paper to see some of the proofs of this concept.

Figure 2: Interrelationship between the causes of Accidents (Losses) and the causes of Near Misses As a brief explanation of Figure 2, root causes (management system weaknesses) make it more likely for a causal factor to occur and combinations of causal factors (or in rare cases, perhaps a single


causal factor) result in near misses and additional causal factors result in losses (accidents) in increasing severity. Also, a root cause can increase the likelihood of seemingly unrelated causal factors. Example: A systemic issue with written operating procedures, such as not leaving a blank line between steps, can lead to higher probability of an operator losing their place in the procedure. This deficiency can be repeated across all procedures at the site (operating and maintenance procedures). So, correcting this deficiency in one unit’s procedure will help; but correcting the format rules for the entire facility will help reduce the chances of any worker losing their place in a procedure. Example: A systemic issue with PHAs across a company, such as not performing a hazard analysis of non-routine modes of operation, can lead to missing up to 90% of the accidents that occur during non-routine modes; and 80% of major process safety incidents occur during non-routine modes of operation. So, correcting this deficiency in one site’s PHAs will help; but correcting the requirements for PHAs across an organization will help even more. Regardless of the theoretical limit on the ratio of near misses in an industry or for a specific process, and given the importance of near miss reporting, why do we have so few near misses reported? To find the answers, we have conducted several surveys (both formal and informal) throughout the past 15 years. The rest of this paper describes the surveys and the results of the surveys, and explains the barriers to getting near misses reported, and how companies are successfully overcoming these barriers.

Informal Surveys

During the past 15 years, we have asked more than 5,000 students of our process safety management (PSM) courses and 3,500 students from our investigator leadership training courses how many near misses they get reported for every accident. The students represented about 400 companies, predominantly in chemical-related process industries (chemical, polymer/plastic, petrochemical, refining, oil and gas exploration, pharmaceutical, pulp and paper, etc.). The answers are quite disturbing. More than 95% said that their ratio of near misses reported to accidents reported falls in the range of 0 to 20. Less than 5% of the individuals said the ratio was greater than 5, and less than 2% said the ratio was higher than 10. Students noted that fear of disciplinary action, lack of management commitment, and lack of understanding of the difference between a near miss and a non-incident were the main reasons why near misses do not get reported.

In conducting about 150 PSM audits and 7,000 process hazard analyses (PHAs) during the past 22 years, we have found ratios from 0 to 105. For the first half of the 1990s, more than 90% of the facilities we talked to had ratios in the range of 0 to 0.5, and more than 95% had ratios in the range of 0 to 1. In the last half of the 1990s, more than 90% of the facilities had ratios in the 0 to 1 range, and more than 95% had ratios in the 0 to 2 range. In addition, only a few (less than 2% of the facilities) had a ratio higher than 5. By 2005, many facilities had ratios in the 20 to 100 range and each of these had seen great gains (more on this in future articles), but the average across all companies we talked to increased only slightly. By 2011, the ratio of near misses to accidents (any loss event) was about 2 for companies we deal with. But by the end of 2011, many major companies had ratios well above 50!


Our auditors and PHA leaders commented that the primary reasons for lack of reporting were about the same as those found in the survey. The barriers to getting near misses reported, discussed in detail later, are:

1. Fear of disciplinary action.
2. Fear of teasing by peers (embarrassment).
3. Lack of understanding of what constitutes a near miss versus a non-incident.
4. Lack of management commitment and lack of follow-through once a near miss is reported.
5. An apparently high level of effort is required to report and to investigate near misses compared to low return on this investment.
6. There is No Way to investigate the thousands of near misses per month or year!
7. Disincentives for reporting near misses (e.g., reporting near misses hurts the department's safety performance).
8. Not knowing which accident investigation system to use (or confusing reporting system).
9. Company discourages near-miss reporting due to fear of legal liability if these are misused by outsiders.

The good news is that near-miss reporting appears to be improving, and the industry appears to recognize most of the barriers to near-miss reporting. The bad news is that the ratio is still very poor and improvement appears slow! The number of companies that participated in the informal surveys suggests that the results are a statistically significant representation of the chemical industry. Table 1 summarizes the results of both the informal and formal surveys.

Table 1: Summary of Survey Results

Survey Items Current Near-miss Reporting Ratio Range (0-5 years prior) Previous Near-miss Reporting Ratio Range (5-10 years prior) Current Near-miss Reporting Ratio Average (0-5 years prior) Previous Near-miss Reporting Ratio Average (5-10 years prior) Goal for Near-miss Reporting Ratio (average) Theoretical Upper Limit on Nearmiss Reporting Ratio (average) Number of Participating Companies Number of Participating Facilities

Informal Surveys Classroom Found During Polls Audits and PHAs

Formal Survey (or actual organization data)

2 to 20

0 to 105

1 to 5

NA

3

5

1

NA

20+

NA

20

About 100

NA

100

400 8500+

300+ 500+

25+ 400+

1 to 105

2-3

Formal Survey or In-Depth Analysis of Organization Data

A formal (written) survey was developed and e-mailed, faxed, and/or mailed to more than 100 companies. Of these, more than 12 replied originally in 1997. Since then, we have reviewed data in detail from more than a dozen major companies. These 25+ companies or affiliates provided data from more than 400 facilities, including more than 150,000 employees in manufacturing and operations. The data were from the chemical industry, polymer industry, refineries, drug/pharmaceutical companies, pulp and paper mills, petrochemical companies, and oil exploration/production. Some of the companies that contributed prior to publication of this paper were:

               

• AG Fluoropolymers USA, Inc.
• ALCOA (Aluminum Company of America)
• Amoco Oil Offshore Business Unit (now BP-Amoco)
• Dow Chemical (worldwide)
• Eli Lilly (International)
• Chevron (USA only)
• Conoco
• Exxon Co. USA - Upstream (now Exxon-Mobil)
• Mead Paper (USA)
• National Starch & Chemical Company (USA)
• Olin Corp. (USA)
• Petrorabigh (Saudi Arabia)
• Procter & Gamble (USA)
• Saudi Arabian Fertilizer Company (SAFCO), and other SABIC affiliates
• Saudi ARAMCO
• Toyota (international)

Other companies wished to remain anonymous. Table 1, introduced earlier, provides a summary of the formal survey results. Key findings are:



  

• The companies' ratio of near misses reported to accidents reported ranged from 0 to 105, but the average value was about 5. This is higher than the informal poll; however, we believe that many of the companies with very low or no reporting of near misses chose not to participate in the formal survey. Also, we know that many of the companies that reported data have completed an intensive effort during the past three to five years to improve near-miss reporting and this has yielded great results. For example, two companies had a near-miss reporting ratio of 1.0 about 10 years ago, but were able to increase reporting to a ratio as high as 70 to 80 in the past few years. Another company recently increased reporting to 20, and another has reached 15; both of these companies had ratios of below 1 just three years ago. One company increased its reporting ratio to 105 (up from 2 just a few years earlier). Finally, another company has increased its reporting ratio to 50 in three years (up from 4 previously). Two very large companies (50,000 employees or more) had ratios higher than 80:1 in their world-wide operations.
• Companies who participated in the formal survey believe that the theoretical value of the ratio is in the range of 3 to 150, but most believe the theoretical value of the ratio is near 100. This value matches the value found during the informal surveys.
• Companies believe they can practically achieve a reporting of 30% to 50% of the theoretical ratio; so most believe a ratio of 40 is achievable. This is a slightly higher expectation than found


 



during the informal survey (probably because only the better-performing companies replied to the formal survey; those who publish papers on near miss reporting but who were not included in the formal survey also achieve such ratios). Based on our experience in helping clients optimize their near-miss reporting and incident investigation system, we believe a ratio of 50 is reasonably achievable, and we have found that investigating about 15-20 near misses for every accident (to find all root causes of those near misses) is an optimal investment of resources.
• The barriers to getting near misses reported were the same as those found during the informal surveys, but some of the solutions were novel. The barriers and solutions are described in the next section of this paper. The barriers can be overcome:
  - In as few as two months if management takes an aggressive stance on getting near misses reported and if they take the bold steps necessary at the management level
  - In a number of ways, though a few approaches seem to work better than others
  - Never – if management attitude toward worker involvement and blaming for mistakes remains unchanged
• The number of formal surveys collected prior to submittal of this paper probably does not reflect a statistically significant sample of the chemical process industry but is instead useful for anecdotal comments on getting higher near miss reporting ratios.

Barriers and Solutions The formal and informal surveys identified many barriers to reporting of near misses. Most of these were mentioned earlier. Below is a listing of the barriers gathered from the surveys and from our experience. The most critical barriers are listed first, but some of the later barriers can still keep the reporting ratio below 2. Solutions are discussed for each barrier; the solutions have been tried and have worked, but we do not claim that everyone will achieve the same level of results.

1. Fear of disciplinary action This barrier easily ranks highest on the list. Who wants to report a near miss if they believe the bosses will hold the near miss against them or a peer? If this barrier is not overcome, near misses will not be reported. To overcome this barrier, we must first recognize that all accidents (and near misses) are the result of error by some human(s). Our goal should be to find the reasons why this human made a mistake (management system weaknesses) and fix them so that other humans are less prone to repeat the mistake. According to all respondents with high near-miss reporting ratios, the best approach for overcoming this barrier is to: Implement a policy to NOT punish individuals when their errors lead to accidents and near misses (except for acts of malicious intent, such as fights and sabotage). (Note that errors by supervisors during day-to-day supervision and coaching are not usually near misses and so can be corrected on the spot with positive or negative discipline.)


This solution is difficult for some managers to accept because it appears to contradict the valid concept of holding individuals accountable for their performance to standards. Actually, the two concepts (policies) apply to different levels of the error universe shown in Figure 1. Individual accountability should be enforced (using one or more of the successful practices, such as self-directed work teams, behavior-based management, supervisors, etc.) in the "non-incident" portion of the universe. There are roughly 50 to 150 opportunities in this region for every incident, and if management is really keen on instilling discipline (using positive and negative reinforcement, etc.), these are the proper opportunities for action. Conversely, once a sequence of errors and failures propagates to the "incident" level, enough precursor errors have occurred (50 to 150 for each near miss and 5,000 to 20,000 for each accident) to indicate that the current chain of events represents a systemic problem. Systemic problems should not be "blamed" on the individual; we should instead find the system weaknesses and fix these before the next individual runs into a similar problem. Therefore, another important step in overcoming this barrier is to: Find the root causes (management system weaknesses) of each causal factor and only write recommendations to fix root causes. A causal factor may be a mistake someone makes, but finding the reasons why the individual made the mistake is more productive in preventing recurrence than punishing the individual for his or her mistake(s). If we focus on finding the root causes, and then ensure that we write recommendations and follow through on them, we will not blame individuals. Omitting the blame will result in less fear of punishment for future incidents. (This solution is closely related to the first solution of establishing a blame-free culture for incidents.) By the way, remember to not blame the "managers" either; fix the system instead.
A related fallacy is: "If we train enough investigators, near misses will get reported; so we do not need to establish a blame-free system." We have seen this assumption proven false when fear is not addressed. At one facility, we trained roughly 10% of the operating and maintenance staff in how to lead investigations (this percentage is not too high by the way). However, management still used the incidents to assign blame to the individuals involved in the chain of events and, in some instances, used the incident as the reason to terminate employment. Because of the fear of continued blame, the ratio at that facility has not increased past 1 (granted, 1 is better than 0, but statistically you need a larger sample of incident data to prevent future incidents related to the same management system weaknesses). Another fallacy somewhat related to the fear of discipline is this misconception: "If we can just get rid of the accident-prone individuals, we can prevent future accidents." Studies have shown that fewer than 20% of the accidents involve "repeaters" (Ref. 1). It is probably more likely that "repeaters" are just less adept at hiding near misses and accidents; or perhaps they are more proactive or open about fixing the problems when they are involved. Management must enforce a "no blame" policy once it is implemented. Exceptions should be made very rarely or not at all. One slip by management can wipe out years of hard work to get near misses reported. Once enforced, the system may need months or years to show results. We have seen tremendous results in just one year (a 10- to 100-fold increase in near-miss reporting) when management proves that they will not assign blame due to an incident. Building trust is the key. Management must "walk the talk."


There are some events that warrant discipline. However, these are not accidental in nature; instead they are more criminal in nature. These events include sabotage, severe horseplay, fights and other acts of malicious intent. Therefore, management must be very clear on when discipline is still right; and it is only right for these criminal incidents. A third and fourth solution to reinforce the first two is to: Have peers investigate incidents involving peers. Make sure the employees are the owners of the incident reporting and investigation system. Peers are less intimidating than bosses; and the urge to publicly place blame (or to negatively impact job appraisals) is reduced when peers investigate peers. Management may be reluctant to relinquish control of the investigations, partly because they believe the peers will conspire to "hide" the truth. Cover-ups may occur on some incidents, however isn’t it better to get 10 to 100 times more incidents reported by lowering the fear of the investigation? The companies that have used peers as the investigators have seen dramatic improvements in near-miss reporting; and most reports appear thorough and the results typically appear reasonable. If you do not have the "at risk" employees trained to lead investigations, then consider at least using the peers to "interview" peers. Then over time, train employees to lead investigations and let them "own" the investigation system. Another important benefit of using peers to investigate peers is that this will give you more trained investigators, and therefore your company will be able to begin investigations more quickly (particularly on night shifts and weekends). As mentioned earlier, management must be committed to keeping incidents "blame free." 
One method to demonstrate commitment to a blame-free incident reporting and investigation system is to: Tell the employees about the new policy to not assign blame and state that they can hold management accountable to this commitment. Another method is to: Offer (at least at the beginning of implementation) incentives (rewards) for reporting near misses. Set accountability for workers of reporting about 12 near misses per person per year to achieve a reporting ratio of about 30 near misses per accident. (Some companies have set a requirement of four near misses per month per worker and this has worked out great. Toyota expects 70 items reported per worker per year and these include a combination of process improvement ideas and near misses.) One company gave away tickets to local college basketball games for each near miss reported. This increased near-miss reporting from a starting value near 1 to a high of 25 (during basketball season). Once the incentives were terminated, the near-miss reporting ratio leveled at about 10. The investment during one winter was well worth the long-term gain in accident prevention. Another company offered an award for the most beneficial near-miss report each month, rather than giving a reward for each one reported. This approach has advantages over the prior approach.
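The link between a per-worker reporting target and the resulting reporting ratio can be sketched as a simple calculation. The site size and accident count below are hypothetical values chosen only so that the quoted target of 12 reports per person per year yields a ratio of 30; they are not data from the surveys, and the function names are illustrative.

```python
def reporting_ratio(workers, reports_per_worker_per_year, accidents_per_year):
    """Near misses reported per accident over one year."""
    if accidents_per_year <= 0:
        raise ValueError("accidents_per_year must be positive to form a ratio")
    return workers * reports_per_worker_per_year / accidents_per_year


def implied_accidents_per_worker(reports_per_worker_per_year, target_ratio):
    """Accident (loss event) rate per worker per year implied by a target and a ratio."""
    return reports_per_worker_per_year / target_ratio


SITE_WORKERS = 500    # hypothetical site size
SITE_ACCIDENTS = 200  # hypothetical count of loss events of any severity in a year

ratio = reporting_ratio(SITE_WORKERS, 12, SITE_ACCIDENTS)
implied = implied_accidents_per_worker(12, 30)

print(f"reporting ratio: {ratio:.0f} near misses per accident")
print(f"implied loss events per worker per year: {implied:.1f}")
```

Read this way, the 12-to-30 pairing only holds for a site with roughly 0.4 loss events per worker per year, counting minor losses; a site with a different loss rate would need a different per-worker target to reach the same ratio.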


An alternative that can be used with or without implementing a blame-free incident system is to: Begin with a system for reporting incidents anonymously; for a short period only. This approach has worked well as a kickoff, but it does not directly solve the problem of building trust. This approach does help get employees in the habit of reporting near misses while management builds trust with the employees. This approach also helps to reduce the other barriers discussed later. One specific example of this approach is to provide self-addressed, postage-paid cards that the employee can fill in and drop in a public mailbox on the way home. They can even have their spouse or friend fill in the card to protect them in case the company decides to use handwriting experts to find the guilty party!

2. Fear of teasing by peers (embarrassment)

Some employees are reluctant to report incidents because they are too embarrassed or because they know their peers will never let them hear the end of it. In my first couple of assignments as an operator and shift supervisor (before I finished my engineering degree), we would name the "part" after the "dummy" who broke it. So in my case, I had pumps and reactor lids named after me; and others had similar dubious honors. If done in good humor, such playful banter is not harmful; however, I can speak from personal experience in saying that some shifts will never let the other shifts find out what mistakes they made. The solutions to this barrier include the following: Ensure that all employees understand the importance of near-miss reporting. Demonstrate, through feedback of lessons learned, the importance of near-miss reporting. This could include showing that the recommendations implemented as a result of near-miss reporting have improved the overall safety of each worker. Also: Ensure that all employees understand the harm that teasing can cause to the near-miss reporting system. Ensure that all employees know that everyone is fair game once the teasing starts. Time. (New employees get picked on more than the old hands; so given enough time, at least the employees with more tenure will be reporting near misses.)

3. Lack of understanding of what constitutes a near miss versus a non-incident In training about 3,500 investigators, we have found that the definition of a near miss is vague. When quizzed, it is common for 30% of a class at a facility to believe that one example event, such as a relief valve opening on demand, is a "non-incident, not even a near miss" while the rest of the class believes it is an "accident or loss event or perhaps a near miss." The ones who believe it is a non-incident cite that "it worked as designed." On the other hand, the rest of the class believes the


relief valve opening is at least a near miss because if it hadn't opened, there could have been a catastrophic loss of containment. Several solutions may be necessary to overcome this barrier. First: Develop a list of "in-context" examples that illustrate what you consider to be incidents (particularly near misses) and what you consider to be non-incidents. This list should be created with input from various disciplines in the facility. Start the list by reviewing emergency work orders, process excursions, trouble reports in operating logbooks, etc. The list will be used as a training tool for all personnel who work in or near the process. We recommend creating this list in a two-column format, with examples of incidents listed in one column and examples of non-incidents listed in the other. The examples should be as parallel as possible so that the users (employees) can clearly see the differences. See Table 2.

Table 2: Example Training Tool for Teaching the Difference Between an Incident and a Non-Incident

Incident (we will spend the necessary resources to promptly investigate these) | Non-Incident (do not report as an incident; may be trended though)
Safety relief device opens on demand | Safety relief device found to be outside of tolerances during routine, scheduled inspection
Pressure reaches relief valve set pressure, but relief valve apparently does not open | Pressure excursion occurs but remains within the process safety limits
High-high pressure trip/shutdown (one layer of defense against overpressure of the system) | High pressure alarm (possible quality impact)
Toxic gas detector in the area tripped/alarms | Toxic gas detector found to be defective during routine inspection/testing
Walking under a suspended crane load | Not wearing a hard hat in a designated area
Suspended crane load slips | Crane wire rope found to be defective during pre-lift check

Important near misses to get reported are process excursions that reach or exceed the specified safety (or quality) limits of the process. Any time a process parameter reaches or exceeds the stated "process safety limit," the event should be reported as a near miss so the causes can be determined. Nearly every major investigation we have led had multiple "warnings" in the months, days, or hours prior to the accident. However, the employees did not know that reaching the "high-high pressure alarm point" or reaching the "rupture disk set pressure" constituted a near miss. They checked the system to make sure the disk was still intact, made sure the pressure returned to normal, and then continued operating. They also did not understand (or believe) they had the authority to shut down the process for a near miss. The types of questions to ask when developing the list of near misses include the following:

  

 

What could the consequences be if the circumstances were a little different? How likely is it for the near miss to be spotted before it continues to an accident? How complex is the process (operation) and how many layers of defense are there against the accident? Is the near miss one step away from disaster (are we challenging our last line of defense)? Two steps away (which may be a near miss for a high hazard/high complexity system)? Is the risk associated with the potential accidents well understood? Is there high learning value in this near miss?


Once you have the starting list of examples: Train personnel on the examples. This will paint the picture of what the company means by the term "near miss." Over time, expect the list to change and grow as you are faced with unanticipated events. Along the way: Clearly differentiate between a near miss and a "behavior-based management observation." Many companies have implemented a system to have peers observe and try to correct (by coaching, etc.) the behavior of peers. This system should operate in the "non-incident" portion of the error universe. Include examples in a listing, such as Table 2, to illustrate the differences. Finally: Use morning (safety) meetings to capture near misses that were not previously identified. This will keep the topic of near misses high on everyone's mind and will continually improve the understanding of what a near miss is. This system works best when you dedicate a scribe in the meetings for this topic.

4. Lack of management commitment (no training provided on investigation techniques and procedures) and lack of follow-through once a near miss is reported (time is not allocated to investigate near misses, or corrective actions are not implemented)

Management must demonstrate commitment. What is one measure of commitment? Funding! Management must provide training for investigators. All operations and maintenance staff must be trained on how to recognize and report near misses and on how to interview peers. Also, selected staff must be trained on how to "quality assure" the results of investigations and tabulate and query the data for systemic trends. Management must allow the employees the time necessary to investigate incidents and generate reports. Management must communicate incidents and lessons learned to all affected employees, and management must forward this information to other sites where the lessons would be important. Finally, management must show an interest in the results and enforce follow-through and documentation of the resolution of recommendations.

The solutions to this barrier are rather straightforward, but can take many forms. It begins with the following: Provide training to an appropriate number of operations and maintenance personnel on a consistent approach to investigation, which includes causal factor and root cause determination. Based on experience within several companies with mature near-miss reporting systems, we recommend training 10% to 20% of the operating and maintenance staff on how to lead investigations. This training should be 1.5 days or longer; three days of classroom training followed by one to two days of coaching by a qualified leader seems best. Also, train all staff on interviewing skills and train all staff on how to recognize and report near misses (these modules are typically 2 hours and 1 hour in length, respectively).

Hold regular meetings with employees to discuss the successes (and weaknesses) of near-miss reporting. Praise employees for submitting near misses. Emphasize to employees how important it is to you for them to invest the time to investigate near misses, including spending overtime labor if necessary. Investigation typically does not require much overtime, but management needs to allocate the time necessary to obtain the required data and to emphasize the importance of investigation to employees.

Hold management accountable for achieving a near-miss reporting ratio of at least 20. Set an accountability target of about 12 near misses reported per person per year (or more) to achieve a reporting ratio of about 30 near misses per accident. Managers will get the message and implement the solutions above if their performance is judged against this parameter.

Example: At a large paper company in the USA in 1998, there was a push by the new VP of Operations to increase near-miss reporting. Four large pulp and paper mills were targeted for the roll-out. The mill management and other senior staff were taught the importance of near-miss reporting and the new target ratio of 20 near misses reported per accident/loss. The mills averaged about 1600 direct hire employees and more than 500 contractors. Incident investigation leadership training was conducted at each mill for senior operators, senior maintenance technicians, and supervisors in these departments. The starting ratio of reporting was about five accidents/losses per near miss reported (or a ratio of 0.2 for near misses reported versus accidents/losses occurring). Reporting goals were set for each department head.
After about 6 months, three of the mills had each achieved a ratio of about 20 (most of these near misses were investigated and root causes found and addressed). The fourth mill was stuck at a reporting ratio of 0.2. By the end of the first year, the three mills with the increased reporting ratios for near misses had lowered operational losses by nearly 95% compared with previous years; the fourth mill had no drop in operating costs. The VP took note. The primary reason for the low reporting of near misses at the fourth mill was the punishment imposed by the mill manager on staff who made mistakes that led to losses (though he called these “clear violations of written procedures”). The mill manager believed that it was inappropriate to let workers off the hook for making mistakes. The VP tried to explain that mistakes caught during day-to-day supervision were one thing, but mistakes that showed up in near misses and loss/accident incidents indicated systemic (work-force wide) problems. After about 9 more months of the same failure to get near misses reported, the manager of the fourth mill was replaced by someone who would get near misses reported. It is important to note that the mill manager’s salary is about one fifth of the annual savings from lowering these operational losses.

Judging managers by performance measures is important, and the ratio of the number of near misses reported to the number of loss events is a primary indicator of trust between management and employees. It is also the direct gauge of whether near-miss reporting is high enough. As mentioned earlier, upper management should be very concerned when there are few near-miss reports because this means weaknesses in the management systems are not being discovered and corrected.

Example: Saudi Basic Industries Corporation (SABIC) is a diverse company of 17 affiliates and is the largest polyolefin producer in the world. Each year they give awards to the top affiliate on safety performance. The affiliate is ranked on many parameters. In the past, the accident/injury rate was the top measure for judgment, but this was recognized as a lagging indicator. In 2005, they changed the rating to leading indicators of performance. The highest priority indicator they chose was the ratio of the number of near misses reported versus the number of loss events occurring. The reporting of near misses immediately jumped significantly. Within 12 months, one affiliate reported a ratio of 77; they also indicated operating losses dropped roughly 90%.

5. An apparently high level of effort is required to report and to investigate near misses compared to low return on this investment

This barrier is typically related to the fact that we never truly know how many accidents have been prevented by improved near-miss reporting. However, organizations that have seen dramatic increases in near-miss reporting have also seen dramatic reductions in losses (the root causes of near misses with safety consequences are the same management system weaknesses that lead to adverse impacts on operability, quality and profitability).

Share with employees the benefits (subjective and tangible) that are expected from increased near-miss reporting. Increased reporting provides more opportunities to learn of weaknesses in the management system, and near misses are far cheaper to learn from than accidents. Ensure that the data are entered in a database and queried regularly. Also ensure that the results of the query are shared with employees so they can see the value of the near misses they are reporting.

Example: AMOCO Oil Offshore Business Unit (in the Vermilion Bay area of Louisiana; now part of BP) in 1997 increased its near-miss reporting ratio from 1 to roughly 80 in just 1 year. The company entered all the data in a Microsoft® Access™ database (which the company developed itself) and then queried the data regularly. One of the first observations from the database was that the most frequent near miss was "suspended crane loads slipping." The second most common near miss was "employees walking under suspended crane loads." Based on this data, what is likely to occur very soon? Management shared these findings with the employees and let them draw their own conclusions. Two great benefits were achieved. First, people stopped walking under crane loads because now they knew that crane loads slip much more often than they originally believed. Second, the employees saw immediate benefit to reporting near misses (new data was shared with them almost immediately).


Track the benefits of near-miss reporting and trend these versus the near-miss reporting rate (or the near-miss ratio). This solution will take time to bear fruit, but time will prove what others have learned.

Implement user-friendly tools (forms, software, and/or database applications) that ease the burden of documenting and disseminating incident results. Simple forms for inputting/reporting of near misses can ease the burden of notification (reporting) that a near miss has occurred, but forms are only the start for easing the overall burden. You will also need tools to ease the burden of the investigation process and documentation of the results. There are several software tools available for investigating incidents, along with databases for storing and performing trend analysis and queries of the incident data. Some commercial applications combine both major features (which is ideal). However, there are many companies (including Exxon-Mobil, BP-Amoco, Eli Lilly, SABIC affiliates, and others) who have created their own databases. In some cases, the investment was one or more staff-months; in others it was greater. The tool(s) should allow ease of:

• Inputting (recording) results of the investigation
• Categorizing the events according to location, material, etc.
• Tracking and closing recommendations
• Performing queries of the data across many investigations
• Trending against type of events, categories, root causes, etc.
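As a rough illustration of the capabilities listed above, a home-grown incident database could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the record fields, the helper names (open_recommendations, count_by), and the sample data are all hypothetical and are not taken from any of the company databases mentioned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Recommendation:
    text: str
    closed: bool = False  # set True once the final resolution is documented


@dataclass
class IncidentRecord:
    # Field names are illustrative assumptions, not a schema from any company.
    incident_id: str
    kind: str             # "near miss" or "accident"
    location: str
    category: str
    root_causes: List[str] = field(default_factory=list)
    recommendations: List[Recommendation] = field(default_factory=list)


def open_recommendations(records: List[IncidentRecord]) -> List[Tuple[str, str]]:
    """Track recommendations: list (incident_id, text) for each one still open."""
    return [(r.incident_id, rec.text)
            for r in records
            for rec in r.recommendations
            if not rec.closed]


def count_by(records: List[IncidentRecord], attribute: str) -> Dict[str, int]:
    """Query across investigations: count records per value of one attribute."""
    counts: Dict[str, int] = {}
    for r in records:
        key = getattr(r, attribute)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical sample data, for illustration only.
records = [
    IncidentRecord("NM-001", "near miss", "Platform A", "crane load slipped",
                   ["inspection interval too long"],
                   [Recommendation("Shorten rope inspection interval")]),
    IncidentRecord("NM-002", "near miss", "Platform B", "crane load slipped",
                   [], [Recommendation("Review rigging procedure", closed=True)]),
]

print(open_recommendations(records))
print(count_by(records, "category"))
```

Keeping recommendations inside each record makes it straightforward to report which ones remain open, and a single counting helper covers both categorizing and trending by any recorded attribute.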

The tool should not get in the way of a team's job of deductive reasoning. Several of the tools claim to "help you solve the mystery and deductively reason to the causes and root causes," but in our experience most of them get in the way of that task. Properly trained investigators do not need software to help them lead and manage an investigation; however, the techniques they use to structure the investigation are critical. Training the users on how to investigate is key; special software does not help during the investigation and RCA process. However, software, and particularly those tools with database capabilities, can be critical to managing the large amount of data that can be stored from all investigations.

6. There is no way to investigate all of the thousands of near misses per month or year!

Normally, when there is discussion of having a huge number of near misses reported, such as four per worker per month, the reaction will be shock and then a statement such as Barrier 6. This barrier is closely related to Barrier 5: An apparently high level of effort is required to investigate near misses compared to the small gain perceived. If a site has 500 staff as operators and maintenance craftsmen, then roughly 25,000 near misses could be reported per year. At first glance, it can appear impossible to cope with, let alone investigate, that number of near misses (incidents). This is partly true.

Part of the reason for the belief that it is impossible to investigate large numbers of near misses stems from the large reports currently required by the company for investigating incidents. Some companies insist on producing what they call “professional” reports of accidents, and these grow to 50 or 100 pages (half of the pages are attachments). Why produce such a large report? What is the use of such a large report? What makes the “size” of a report equivalent to the “professionalism” of a report? So one key to reducing both Barriers 5 and 6 is:

Simplify the reporting of the investigation/RCA results to the bare minimum needed. Think about every aspect of the report and make sure it is needed. Normally, all that is needed is:

• A cover sheet that includes the date, time, location, a one or two sentence description of the near miss or incident, and a title that summarizes the incident at a glance. The cover sheet should also list the team members.
• Forms that have the causal factors filled in and the root causes filled in, with perhaps one or two sentences that explain each root cause. These forms can also contain the recommendations necessary to correct the root causes.
• That’s it!

So, most near-miss results will be two pages or so and most loss/accident reports will be about four pages. The complexity of results reporting has grown from the legacy of only investigating losses/accidents. When an organization gets a large ratio of near misses reported and therefore a large number of investigations going on, the reports must shrink. This is a good thing. However, if you still have a major accident (which you won’t have if you get a large number of near misses reported and investigated), then add more documentation to meet the needs related to litigation, regulatory interface, etc.

Another solution to reducing Barrier 6 is: Get enough investigators trained (as discussed earlier); otherwise you cannot perform an investigation on the shift in which the incident occurs (this is important for reasons beyond the scope of this paper) and you will not be able to keep up.

Maybe the most important solution to Barriers 5 and 6 is to first decide which near misses and losses/accidents need to be investigated.
The best solution is normally stated as: Let front line foremen or supervisors decide if a near miss or accident needs to be investigated to root causes; the decision is made on the apparent Learning Value of the incident. Figure 3 illustrates the process flow for an investigation system that can handle a large volume of near misses and losses/accidents. For this process to work:

• Be prepared for investigations by having enough staff trained in root cause analysis methods (or to help in the analysis, such as being able to interview peers).
• When the near miss, etc., is first noticed or reported by staff, let the frontline supervisor or foreman decide if it has high learning value.
• For high learning value incidents, investigate now.
• For low learning value, put in the database now, along with the little data you have and any obvious causes. Do not investigate yet! Query the database every one to six months and perform Pareto or similar analysis to help decide which recurring events need to be analyzed in more detail.
• Take the root causes from investigations/RCAs and put in the database as well. Query the database every one to six months and perform Pareto or similar analysis to help decide which recurring root causes need focus.

Figure 3: Best Practice Process Flow for Investigating and Finding Root Causes of Near Misses and Losses/Accidents.
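The screening and periodic query steps of this process flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the incident dictionaries, the function names triage and pareto, and the 80% cumulative cutoff are hypothetical choices, not part of any company's system described in this paper.

```python
from collections import Counter


def triage(incident, investigate_queue, database):
    """Log every incident; queue it for investigation only if the frontline
    supervisor judged it to have high learning value."""
    database.append(incident)
    if incident["high_learning_value"]:
        investigate_queue.append(incident)


def pareto(labels, cutoff=0.8):
    """Return the most frequent labels (event types or root causes) that
    together account for at least `cutoff` of all entries.
    The 0.8 default is an assumed, conventional Pareto threshold."""
    counts = Counter(labels)
    total = sum(counts.values())
    selected, running = [], 0
    for label, n in counts.most_common():
        selected.append((label, n))
        running += n
        if running / total >= cutoff:
            break
    return selected


# Hypothetical data, for illustration only.
queue, db = [], []
triage({"event": "crane load slipped", "high_learning_value": True}, queue, db)
triage({"event": "walked under load", "high_learning_value": False}, queue, db)

events = (["crane load slipped"] * 5 + ["walked under load"] * 3
          + ["alarm bypassed"] + ["gasket leak"])
print(pareto(events))
```

Running the same pareto query over recorded root causes instead of event types gives the second periodic review described in the process flow.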

Example: AMOCO Oil Offshore Business Unit (in the Vermilion Bay area of Louisiana; now part of BP) in 1997 increased its near-miss reporting ratio from 1 to roughly 80 in just 1 year. (This resulted in more than 900 near misses reported in the first two months alone.) With the initial roll-out of this system, the business unit let the shift foreman decide if a reported incident needed investigation. The foremen made the decision easily based on their perception of Learning Value. There was no second guessing by management. Overall, about 25% of the incidents were investigated and 75% went into the database without further analysis (unless a later analysis of the database indicated a frequently recurring incident).


Other companies in this survey found a similar result. The foremen and supervisors have proven very good at screening if an incident needs the investment of an investigation.
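To put the volume in perspective, the figures quoted for Barrier 6 can be combined in a short calculation. This is a hedged sketch: it assumes the 500-person site with four reports per worker per month, and it borrows the roughly 25% investigated fraction from the AMOCO example, which may not hold at other sites. The function name annual_workload is hypothetical.

```python
def annual_workload(staff, reports_per_worker_per_month, investigated_fraction):
    """Estimate yearly near misses reported, investigated, and logged only."""
    reported = staff * reports_per_worker_per_month * 12
    investigated = round(reported * investigated_fraction)
    return reported, investigated, reported - investigated


# Assumed inputs: 500 staff, 4 reports per worker per month, 25% investigated.
reported, investigated, logged_only = annual_workload(500, 4, 0.25)
print(reported, investigated, logged_only)
```

Under these assumptions, about 24,000 near misses would be reported per year (consistent with the roughly 25,000 mentioned earlier), of which about 6,000 would be investigated and 18,000 would go into the database for later trending.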

7. Disincentives for reporting near misses (e.g., reporting near misses hurts the department's safety performance [as measured versus incident rates] and reduces safety-related bonuses/perks)

This barrier has stopped near-miss reporting in several instances. One plant manager was even called to headquarters to explain why his "incident" rate climbed so suddenly; his bosses failed to understand that this was an expected and good outcome of implementing an effective near-miss reporting system. The company culture was "enforcement" of standards, and the company has a history of disciplining employees who cause accidents; many in that company still do not believe that giving up the freedom to punish employees when an incident occurs is a good business decision. A disincentive arises when department goals are tied to lower incident rates. The solution here is obvious and necessary:

Ensure that goals and incentives are not tied to lower incident rates (since this discourages reporting), but instead consider providing incentives for achieving higher near-miss reporting ratios (SABIC is trying this with success). Set an accountability target for workers of about 12 near misses reported per person per year to achieve a reporting ratio of about 30 near misses per accident. (Some companies have set a requirement of four near misses per month per worker and this has worked well. Toyota expects 70 items reported per worker per year and these include a combination of process improvement ideas and near misses.) There is still value in tying incentives to business (profitability and productivity) goals, because the company will learn that reporting and investigating near misses will enhance overall business performance (particularly since the near misses of a safety accident or environmental release have the same root causes as incidents that detract from quality and productivity).
There have been many papers written on how preventing accidents pays for itself indirectly through improvements in productivity.

8. Not knowing which accident investigation system to use

One consideration that is not related to any of the barriers mentioned above, except marginally to Barrier 3 (lack of understanding of what a near miss is), is the scope of the investigation program. Some companies have one investigation system for occupational safety incidents, another one for process safety incidents, another for environmental releases, another for reliability issues, and yet another for quality and customer service issues. We have found that the same investigation approach and investigator training works well for incidents in any facet of a business. We believe there is merit in combining the systems and, in particular, in combining the incident databases. Combining the incident systems will require more work on defining near misses and on determining success in reporting near misses.


A related consideration is that most incidents affect more than one aspect of a business. Table 3 illustrates this point for an incident involving a 1,000 lb release of cyclohexane from a decanter system at a polymer production facility. The event did not harm any people and did not noticeably damage the environment (though reporting of the release to regulators was required). The event and the actions taken after the release caused the process to be shut down for about 9 hours and caused 3,000 lbs of product to be rejected. (The values in Table 3 are from a qualitative scale, where 10 is very high impact and 0 is very low or no impact.)

Table 3: Example of the Impacts of a 1,000 lb Cyclohexane Release

Business Aspect | Actual Impact of the Incident | Potential Impact of the Incident
Safety (harm to people) | 0 | 10
Environment (harm to nature) | 1 | 3
Quality (harm to product) | 3 | 3
Reliability (harm to process efficiency) | 5 | 10
Capital (harm to property, facilities, equipment) | 1 | 10
Customer Service (harm to relationship with clients) | 2 | 10

From the view of both actual and potential impact, the cyclohexane release affects all business aspects. The incident is a near miss for safety, and a minor to major accident for the other aspects of the business. Performing six (or more) separate investigations would be fruitless. Performing one investigation that meets the needs of all business aspects is ideal, and it is also easy. The near-miss definition and related training will need to explain the potential impact of an event in relation to each business aspect, so that the users of the system can identify a near miss. Therefore, the solution includes:

Emphasize during training (1) how to report near misses (perhaps you will want different reporting methods for different possible outcomes, though we do not recommend this) and (2) where to go for an answer if you do not know if the event is a near miss. Consider having ONE incident reporting system with ONE approach for teaching employees the definition of a near miss and with ONE approach for doing incident investigations (including one approach for root cause analysis).
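The reading of Table 3 given above can be reproduced with a small Python sketch. The classification rule is an assumption made for illustration (actual impact above 0 counts as an accident; zero actual impact with nonzero potential counts as a near miss), and the names TABLE_3 and classify are hypothetical.

```python
# (actual impact, potential impact) on the paper's 0-10 qualitative scale
TABLE_3 = {
    "Safety": (0, 10),
    "Environment": (1, 3),
    "Quality": (3, 3),
    "Reliability": (5, 10),
    "Capital": (1, 10),
    "Customer Service": (2, 10),
}


def classify(actual, potential):
    """Assumed rule: any actual harm is an accident; potential harm without
    actual harm is a near miss; otherwise the event is a non-incident."""
    if actual > 0:
        return "accident"
    if potential > 0:
        return "near miss"
    return "non-incident"


# One investigation, viewed from each business aspect.
result = {aspect: classify(a, p) for aspect, (a, p) in TABLE_3.items()}
for aspect, label in result.items():
    print(f"{aspect}: {label}")
```

With the Table 3 values, only the safety aspect is classified as a near miss; every other aspect records some actual harm, which is why a single combined investigation serves all six aspects.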

9. Company discourages near-miss reporting due to fear of legal liability if these are misused by outsiders

There is legitimate concern that near-miss reports can be used detrimentally against a company. In summary, liability typically occurs when:

• a company has many near misses reported, and an outsider can claim this shows a history of "unsafe conditions" that apparently is fostered or tolerated by the company
• a near-miss report is used to show that a company knew that a certain accident was possible at one site but failed to take effective action to prevent its occurrence at all sites
• a near-miss report directly incriminates the company due to inappropriate wording


Liability is mainly an issue in the USA, where we graduate 40 attorneys for each engineer. However, the near misses and accidents do not have to occur in the USA to create a problem for companies based in the USA. An accident that occurs outside the USA can be used in litigation in the USA to show a pattern of unsafe conditions, a lack of management follow-through on key learnings, etc. Even without direct legal liability, opponents of a company can use reports to sway public opinion against the company. In addition, legal liability for accidents that occur outside of the USA is increasing. Possible solutions to this barrier include:

Ensure, through investigator training and through auditing of reports, that investigators refrain from broad conclusions and that the language used in the final report is appropriate.

Involve legal on major near misses and accidents (any incident where liability could be high) to ensure the results are protected as much as possible under attorney/client privilege.

Company attorneys have provided excellent guidance to internal and external investigators on how to conduct and document an investigation to limit liability. Key guidance needs to apply to near misses as well. Such guidance includes:

• Do not use inflammatory statements such as disaster, lethal, nearly electrocuted, and catastrophe.
• Do not use judgmental words such as negligent, deficient, or intentional.
• Do not assign blame.
• Do not speculate about potential outcomes (for near misses and minor accidents), lack of compliance, or liabilities, penalties, etc.
• Do not offer opinion on contract rights or obligations or warranty issues.
• Do not make broad conclusions that can't be supported by the facts of this investigation (let queries of the database demonstrate these conclusions as necessary).
• Avoid unsupported opinions, perceptions and speculations.
• Do not oversell recommendations; allow for alternative resolutions of the problems and weaknesses found.
• Do follow through on each recommendation and document the final resolution, including why it was rejected if that is the final resolution.
• Do involve legal as soon as possible if the incident appears to have potential liability for the company.
• Do report, investigate and document near misses to demonstrate the company's commitment (1) to learning where there are weaknesses and (2) to improving risk control.

Even given the possible liabilities, most companies decide that it is better to get near misses reported and to learn how to prevent accidents, rather than to discourage near-miss reporting or record keeping. Therefore, a solution most companies have found critical is: Ensure that technical and business managers understand that:

• it is in the company's best interest to get near misses reported and learn from these, in order to prevent future accidents
• legal liability concerns should never discourage reporting and investigation
• proper investigation and documentation of near misses demonstrates that the company is behaving responsibly to learn lessons and continually improve risk management


Benefits

If you are very successful at getting near misses reported, you may have the nice problem that only a few companies have experienced: "We have too many near misses reported!" As mentioned earlier, AMOCO Oil (before the buyout by BP) implemented most of the solutions above and was able to increase its near-miss reporting ratio to about 80. However, they did not have the resources to investigate 80 near misses for every accident (the actual number was about 500 near misses across about 20 sparsely staffed facilities). So, the foremen and operators decided on a case-by-case basis which of the 500 events had high learning value, and those were the ones they investigated. The events that were not investigated were still categorized and entered into the master database. By the end of the year, they found they had investigated roughly 15 near misses for each accident.

Another company (in Saudi Arabia) was able to increase near-miss reporting to about 2000 near misses per year (compared to 25 losses/accidents in the same year). By investigating about 500 of these near misses, they were able to reduce the number of accidents from 65 to 25 in two years and, more importantly, their monetary losses were reduced by more than 90% (with a similar drop in injury rates).

A company should strive to reach a ratio of 50-100 and investigate about 20 near misses per accident. This will provide a statistically significant sample of all incidents (and all important errors) and provide a company with sufficient feedback on which management system weaknesses are causing the errors and component failures. Various companies with different cultures have achieved high ratios with great return on investment.
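The ratios quoted in this paper are simple quotients, and a short Python check shows how the Saudi Arabian example compares with the recommended targets. This is a minimal sketch; the function name reporting_ratio is hypothetical, and the inputs are the rounded figures given in the text.

```python
def reporting_ratio(near_misses, accidents):
    """Near misses per accident/loss; undefined when no accidents occurred."""
    if accidents <= 0:
        raise ValueError("ratio is undefined without at least one accident/loss")
    return near_misses / accidents


# Saudi Arabian example: about 2000 reported and 500 investigated,
# against 25 losses/accidents in the same year.
reported_ratio = reporting_ratio(2000, 25)
investigated_ratio = reporting_ratio(500, 25)

# Paper mill starting point: one near miss reported per five accidents/losses.
starting_ratio = reporting_ratio(1, 5)

print(reported_ratio, investigated_ratio, starting_ratio)
```

On these figures the company reported 80 near misses and investigated 20 per accident, which sits inside the 50-100 reporting target and matches the suggested investigation level, while the paper mills started at 0.2.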

Conclusions

It is possible to get near misses reported, but you must first recognize and address each barrier. Reducing fear of discipline is most important, and various steps may need to be taken to achieve success. All of the solutions presented in this paper have been proven in one or more companies and, therefore, should be seriously considered.

Acknowledgements

The author is grateful to the companies who contributed data to this paper; the sharing will help us all.

References

1. Hammer, Willie, Occupational Safety Management and Engineering, Prentice-Hall, Englewood Cliffs, NJ, 1985.

2. Moore, R., Selecting the Right Manufacturing Improvement Tools, Elsevier, 2007.

3. Guidelines for Investigating Chemical Process Incidents, Second Edition, CCPS/AIChE, New York, NY, 2003.

4. Incident Investigation/Root Cause Analysis Leadership Training (Course 4), Student Textbook, Process Improvement Institute, Inc., Knoxville, TN, 2003-2012 (revised).

5. Bridges, WG, Getting Near Misses Reported, International Conference and Workshop – Process Safety Incidents, 2000, CCPS/AIChE.
