Poor interface design can kill people! In this section, we introduce a case study of Therac-25, which highlights such a scenario. Therac-25 was a computerized radiation therapy machine built by Atomic Energy of Canada Limited (AECL), and it was involved in some of the most dangerous accidents ever attributed to human interface design and software. In this discussion of the case study, we focus on the interface problems. The material has been gathered from three main sources described in the further readings section: Casey (1993), Leveson and Turner (1993) and Leveson (2017). Readers of this primer are encouraged to consult these three documents for a detailed understanding of all aspects of the Therac-25 accidents.
AECL and CGR, a French company, collaborated to build medical linear accelerators, which accelerate electron beams that can destroy tumours with minimal impact on the surrounding healthy tissue. AECL developed a radically new “double-pass” concept of electron acceleration in the 1970s, but its business relationship with CGR faltered. AECL then started to build its own radiotherapy machine around the newly developed concept, which reduced the amount of space and energy required. Therac-25 was built on this concept.
Therac-25 was a dual-mode linear accelerator that could deliver photons at 25 MeV or electrons at various energy levels. It was more compact, more versatile and easier to use than its predecessors. The machine took advantage of the "depth dose" phenomenon, which allowed it to aim precisely at localized malignant tissue. Unlike a stand-alone machine, it was designed from the outset to take advantage of computer control. It relied more on software for its functions, and on the computer's ability to control and monitor the hardware safety mechanisms and interlocks.
Although the machine was based on and inspired by machines that had a history of clinical use, those earlier machines had operated without computer control. They also contained the industry-standard hardware safety features and interlocks, which were manually controlled rather than handed over to a computer. With the change in technology and the growth of computing, the team behind Therac-25 put more faith in software than in hardware reliability. The first hardwired prototype was produced in 1976, and a completely operational computerised commercial version was made available in late 1982. In March 1983, AECL performed a safety analysis in the form of a fault tree, but it apparently excluded faults related to software and human interaction.
Operator Interface:
Therac-25 was operated through a DEC VT100 terminal. The operator would position the patient on the treatment table and manually set the treatment field size and gantry rotation, among other requirements. The operator then had to enter the patient identification, the treatment prescription (mode, energy level, dose, dose rate and time), field sizing, gantry rotation and accessory data through the VT100 console. The system would then compare the manually set values with those entered at the console. If they matched, a “verified” message was displayed and the system was permitted to treat. If they did not match, treatment would not proceed until the mismatch was corrected. These steps took a long time and, not surprisingly, the operators grumbled about them. To accommodate the operators, the manufacturer made a provision by which, instead of re-entering the data at the keyboard, a quick series of carriage returns would simply copy the treatment site data. This interface modification came into sharp relief in several of the accidents.
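The cross-check described above, and the way the carriage-return shortcut weakens it, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Setup` fields, the function names and the message strings are hypothetical and are not taken from the Therac-25 software.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Setup:
    """Hypothetical subset of the treatment parameters named in the text."""
    field_size_cm: tuple
    gantry_rotation_deg: float


def mismatches(manual: Setup, entered: Setup) -> list:
    """Names of parameters whose console entry differs from the manual setting."""
    return [f.name for f in fields(Setup)
            if getattr(manual, f.name) != getattr(entered, f.name)]


def verify(manual: Setup, entered: Setup) -> str:
    """Return a 'verified' message only if every parameter matches."""
    bad = mismatches(manual, entered)
    return "VERIFIED" if not bad else "MISMATCH: " + ", ".join(bad)


def copy_on_return(manual: Setup) -> Setup:
    """Assumed model of the shortcut: the console data is copied from the setup."""
    return replace(manual)


manual = Setup(field_size_cm=(10, 17), gantry_rotation_deg=180.0)
print(verify(manual, Setup((7, 10), 180.0)))   # independent entry exposes a slip
print(verify(manual, copy_on_return(manual)))  # copied data always verifies
```

When the entered values are copied rather than typed, the comparison can no longer catch a wrongly set parameter, because both sides of the check come from the same source.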
During operation, in case of an error, the machine was designed to shut down in one of two ways: a treatment suspend, which required the system to be reset before it could restart, or a treatment pause, which required only a single-key command to restart. After a treatment pause, the operator could press the “P” key to proceed with the treatment. This feature could be invoked a maximum of five times, after which the machine would automatically stop the treatment and the operator had to reset the system.
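The two shutdown modes can be read as a small state machine. The sketch below is a minimal model, assuming that the sixth attempt to proceed is the one that forces a suspend; the class, state names and method names are hypothetical and do not describe AECL's implementation.

```python
MAX_PROCEEDS = 5  # from the text: "P" could be invoked at most five times


class TreatmentSession:
    """Toy model of the pause and suspend behaviour described above."""

    def __init__(self):
        self.state = "READY"
        self.proceeds_used = 0

    def fault(self, severe: bool) -> None:
        """A fault either suspends (reset needed) or pauses (P key resumes)."""
        self.state = "SUSPENDED" if severe else "PAUSED"

    def press_p(self) -> bool:
        """Try to resume after a pause. Returns True if treatment resumed."""
        if self.state != "PAUSED":
            return False
        if self.proceeds_used >= MAX_PROCEEDS:
            # Assumption: once the limit is used up, the pause becomes a suspend.
            self.state = "SUSPENDED"
            return False
        self.proceeds_used += 1
        self.state = "TREATING"
        return True

    def reset(self) -> None:
        """Full reset, the only way out of a suspend."""
        self.state = "READY"
        self.proceeds_used = 0
```

Note how cheap the pause path is for the operator: a single key press resumes treatment, and nothing in this model asks why the machine paused.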
The messages related to any error in the system were quite cryptic, for example the word “malfunction” followed by a number, as in “Malfunction 54”. In many such cases the operator could not turn to the manual, because the malfunction codes were either not documented or given no explanation. Operators had no way of knowing that the malfunction messages could signal a risk of harm to patients. Further, Therac-25 did not have any built-in safety system that could prevent an overdose caused by incorrect or intermixed parameters being entered.
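To make the contrast concrete, here is a sketch of what a more informative malfunction message could look like. The meaning of code 54 (“dose input 2”, a dose delivered either too high or too low) is the one reported later in this case study; the `safety_relevant` flag, the advice text and the handling of unknown codes are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Malfunction:
    code: int
    name: str
    meaning: str
    safety_relevant: bool  # hypothetical classification, not from AECL


CATALOGUE = {
    54: Malfunction(54, "dose input 2",
                    "delivered dose was either too high or too low",
                    safety_relevant=True),
}


def describe(code: int) -> str:
    """Turn a bare malfunction number into a message an operator can act on."""
    m = CATALOGUE.get(code)
    if m is None:
        # Assumption: an undocumented code is treated as unsafe by default.
        return (f"MALFUNCTION {code}: undocumented. "
                "Do not proceed; call the physicist.")
    advice = ("Do not proceed; check the patient."
              if m.safety_relevant else "Proceed allowed.")
    return f"MALFUNCTION {m.code} ({m.name}): {m.meaning}. {advice}"


print(describe(54))
```

The point of the sketch is the shape of the message: it states what went wrong, whether the patient may be at risk, and what the operator should do next.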
In many cases, the operators explained that they had become desensitised to the error messages because they did not think the messages had any bearing on patient safety. In most cases, when malfunctions occurred, either the service technicians or the hospital physicist would make the Therac-25 operable again.
Messages regarding low dose rate, V-tilt, H-tilt and many other issues were quite normal during operation. Further, when the operators were instructed about the capabilities of the machine, they were given to understand that it had “many safety mechanisms” that would make it “virtually impossible” to overdose a patient. In their minds, the operators were convinced of the safety of the system.
Operator Interface of Therac-25. Recreated from Leveson and Turner (1993).
Therac-25 Accidents:
AECL had eleven Therac-25 machines installed: five in the US and six in Canada. Six accidents were reported between 1985 and 1987. Because of the accidents, the machine was recalled in 1987 for extensive hardware and software design changes. Below, we list in chronological order the accidents that resulted in deaths and serious injuries. One of them (East Texas Cancer Centre, March 1986) is developed in some detail to show how the interface functioned and what role it played in the accident.
Map of accident sites of Therac-25 between 1985 and 1987.
a. Kennestone Regional Oncology Centre, 1985: A 61-year-old woman had undergone a lumpectomy to remove a malignant breast tumour and was receiving follow-up radiation therapy to nearby lymph nodes. As a result of a Therac-25 malfunction, the patient's breast had to be removed because of the radiation burns, and she lost the use of her shoulder and arm. The manufacturer and operators refused to believe that an accident of this magnitude could be caused by the machine!
b. Ontario Cancer Foundation, 1985: On July 26th, 1985, a 40-year-old patient was being treated for carcinoma of the cervix when the machine shut down after only five seconds of activation with an "H-tilt" error message. The display at the time read "no dose" but indicated a "treatment pause". Owing to the poorly designed messages, interface problems and general technical problems of Therac-25, the patient received a very heavy radiation exposure, which AECL technicians later estimated at about 13,000 to 17,000 rads.
c. Yakima Valley Memorial Hospital, 1985: In December 1985, a woman treated with Therac-25 developed erythema, an excessive reddening of the skin, in a pattern of parallel stripes on her right hip. She continued her Therac-25 treatment because the cause of the skin reaction had not been determined. Much later, when the Therac-25 problems were brought to light, it was discovered that the patient had suffered from a chronic skin ulcer and tissue necrosis under the skin, and had been in constant pain ever since. These symptoms were relieved when the tissue was surgically repaired and skin grafts were made. The patient survived with minor disabilities and some scarring.
d. East Texas Cancer Centre, March 1986: Therac-25 had been in use at the centre for two years without any accidents, and more than 500 patients had been treated with that machine. This accident is documented in much more detail than the others thanks to the diligence of the hospital physicist, Fritz Hager, whose efforts helped in understanding the problems of the machine.
Therac-25. Redrawn from Leveson and Turner (1993). Notice the control computer outside the patient room. The operator had a view of the treatment room through a TV camera and could talk to the patient over an intercom. The operator interface is shown above.
On March 21st, 1986, Voyne Ray Cox came in for follow-up radiation therapy after surgery to remove a tumour. He was to receive the therapy on the back of his left shoulder. The technician, Mary Beth, helped him take position on the table. He was supposed to receive a 22 MeV electron beam treatment of 180 rads.
Cox was used to watching Mary Beth operate the hand-held control console that rotated the table, and him, into the proper position underneath the machine’s gantry. After this, Mary Beth left the patient room and went to the adjacent room where the control computer was placed.
The operator, Mary Beth, had worked at the hospital for some time and had become a fast typist through experience. She failed to notice, however, that the video monitor that would normally give her a view of the patient room was unplugged and that the intercom was not working. Since she had carried out this procedure many times before, she proceeded with the session.
She entered the patient's prescription data with great efficiency but then realised she had entered the wrong mode, x instead of e. That is, she had selected the x-ray mode instead of the electron mode he was supposed to receive. Most of the treatments she administered were x-ray treatments, so typing x was a habitual slip. The mistake was easy to fix: she used the up key to move back and edit the mode entry. After correcting the error within eight seconds and verifying all parameters, she began the treatment.
Inside the shielded patient room of the machine, Cox saw a flash of blue light and heard a sizzling sound before he felt a shot of heat on his shoulder.
A moment later the machine shut down, displaying "Malfunction 54", and the treatment paused, indicating a low-priority problem. Since the monitor showed that no dose had been delivered, Mary Beth pressed proceed for a second round of the treatment. The sheet on the side of the machine listed the malfunction as a “dose input 2” error.
In the room, Cox tried to roll onto his side, only to feel a second shot to his neck while he screamed for it to stop. He felt his chest muscles constrict, squeezing the air out of his lungs.
Mary Beth had no idea what the error code meant, and she hit proceed once more. On the treatment table, Cox was hit a third time and decided he had better run out to get help.
He ran out in fear for his life, writhing in pain, and bumped into technicians walking down the hall.
Eventually, Mary Beth realized that there was some problem in the treatment room, came out and met Cox at the nurses' station.
Cox explained that he had received repeated, painful “electric shocks” while lying on the table.
Mary Beth responded that nothing like that had ever occurred before and that she had no idea what might have caused it. The machine had malfunctioned and shut down automatically, showing that Ray had received only a tenth of his prescribed treatment!
The instruction manuals and other documents offered little information explaining the malfunction. Later, one of the AECL technicians explained “dose input 2” as a dose that had been delivered either too high or too low. The monitor showed a substantial under-dose of radiation: about 6 monitor units had been delivered, whereas the operator had requested 202.
The physician observed that Ray Cox had an intense erythema over the treated area but suspected nothing more serious than an electric shock. Cox was discharged with instructions to return if he suffered any further reactions. The physicist found the machine's calibration to be within specifications, so more patients were treated on the same machine throughout the day.
In actuality, Ray Cox had received a massive overdose of radiation concentrated in the centre of the treated area, estimated at 16,500 to 25,000 rads in less than 1 second over an area of about 1 cm. He experienced continued pain in his neck and shoulder, later lost the function of his left arm and had periodic bouts of nausea and vomiting. He was later hospitalised for radiation-induced myelitis of the cervical cord, which paralysed his left arm, both legs, his left vocal cord and his left diaphragm. He died of complications from the overdose five months later.
The physicist and the operator spent a whole day running tests on the machine, but this did not help, and the physicist was not told of the other overexposure accidents that had been reported. One of the engineers involved in the investigation suggested that an electrical problem might have occurred.
An independent engineering firm conducted its own investigation and explained in its final report that no electrical grounding problem had been detected and that the machine was not capable of giving an electrical shock. As no problem had been found during the investigation, the machine was put back into service on April 7th, 1986.
e. East Texas Cancer Centre, April 1986: On April 11th, a male patient was scheduled to receive an electron treatment for skin cancer on the side of his face, with a prescription of 10 MeV to an area of 7x10 cm. The accident closely resembled the overdose that had occurred at the same centre in March. The patient died of the overdose three weeks later, on May 1st, 1986. He suffered disorientation, which progressed to coma, a fever of 104 degrees Fahrenheit and neurological damage. An autopsy showed an acute high-dose radiation injury to the right temporal lobe of the brain and the brain stem.
f. Yakima Valley Memorial Hospital, 1987: On January 17th, the second patient of the day was scheduled to receive two film-verification exposures of 3 and 4 rads electron plus a 79 rads photon treatment, 86 rads in total. After the accident, AECL's preliminary measurement of the dose delivered with the turntable in the field-light position gave an estimate of 4,000 to 5,000 rads. Since two attempts were made, the patient was estimated to have received approximately 8,000 to 10,000 rads instead of the 86 rads he was supposed to receive.
Insights For Interface Design:
Therac-25 served as a major lesson for human factors and interface design of safety-critical systems. The insights gathered have been generalizable to almost every industry that employs safety-critical devices. We present some interrelated learnings that are applicable to interface design.
a) Need for proper requirements: Safety depends on the context, or more specifically on the system in which the software is used, and not on the software itself. Most, if not all, accidents involving software have resulted from flawed software requirements and not from coding or implementation errors. People mistakenly assume that software is safe if it satisfies its requirements. To reduce software-related accidents, proper safety-critical requirements are essential for building safety into these machines. Safety can’t be ensured at the end; it has to be built in from the beginning. Therefore, we need good requirements right from the start, including ones for the human interface.
b) Inadequate investigation of incidents or follow-up on accident reports: In most technological accidents of this kind, the blame is placed on the operators rather than on technical and interface design errors. Blaming the operators leads to patching symptoms and does not help in understanding the systemic causes of the loss. Changing operators results in fixes that fail: the causes of the accident remain latent in the system regardless of who operates it. To prevent future accidents, the role of the entire system needs to be examined. Interface design and the associated issues that result in accidents should therefore be recognized as systemic concerns in safety-critical systems, and they should be investigated properly with appropriate models and frameworks.
c) Safe versus “friendly” user interface, role of “human error”: In safety-critical systems, there is often a tension between safety and “ease of use”. Oftentimes, the sine qua non of interfaces is to make them simple and easy to use. Where safety is at stake, however, we should ensure that actions through the interface that may lead to unsafe states (hazards) are relatively difficult to perform and are subject to proper checks and balances. The interface should also provide appropriate recovery mechanisms so that the operator can recover from slips and mistakes. This will ensure that operators are not blamed for flawed interface design. In other words, we can design safety into the system. “Human error” may in many cases be a misnomer.
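One way to read insight (c) is that editing should stay easy while starting a hazardous action should require deliberate confirmation. The sketch below illustrates that idea under stated assumptions: the `PrescriptionForm` class, its field names and the retype-to-confirm rule are hypothetical and are not a description of any real radiotherapy console.

```python
class PrescriptionForm:
    """Toy console form: edits are cheap, beam-on needs fresh confirmation."""

    # Hypothetical set of safety-critical fields.
    CRITICAL = {"mode", "energy_mev", "dose_rads"}

    def __init__(self, **values):
        self.values = dict(values)
        self.confirmed = set()

    def edit(self, field, value):
        """Recovery from slips is always allowed, but it voids the confirmation."""
        self.values[field] = value
        self.confirmed.discard(field)

    def confirm(self, field, retyped_value):
        """The operator retypes the value; a bare carriage return (None) fails."""
        if retyped_value is not None and retyped_value == self.values.get(field):
            self.confirmed.add(field)
            return True
        return False

    def may_start(self):
        """Treatment may start only when every critical field is confirmed."""
        return self.CRITICAL <= self.confirmed


form = PrescriptionForm(mode="x", energy_mev=22, dose_rads=180)
form.edit("mode", "e")       # the slip is corrected quickly...
print(form.may_start())      # ...but the machine will not start yet
```

The design choice is a trade: the operator pays a few extra keystrokes after every correction, and in return a fast edit can never slip through to beam-on unchecked.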
As the case study demonstrates, our main challenge is to understand how to design for the human in complex technological systems, to which we turn next.