The matrix recoded

Computing courses must address the fact that most IT failures are due to human error and management problems, says Darrell Ince

April 8, 2010

Last August I lost the sight in my left eye. All I could see was a dark grey fuzz. The NHS was magnificent: I had suffered a retinal detachment and, as soon as the damage was confirmed, I was rushed to the John Radcliffe Hospital in Oxford, where staff successfully reattached the retina. Subsequent events were less good.

I had to attend an outpatients department in my local hospital, and twice it temporarily lost the paper case notes. I was surprised. I recalled that the NHS National Programme for IT (NPfIT) had had more than £12 billion committed to it over the preceding seven years and that its main component was the computerisation of patient records.

Prompted by my experiences, I did some light reading about recent IT systems failures as I recuperated. The results astounded me. In the 1980s I was involved in examining a number of project failures - occasionally as an expert witness. My experiences formed much of my world view about IT failure and software engineering. Even in those days, it was not mainly technical errors that sank projects, it was managerial errors. Common problems included developers being too optimistic about the resources that were needed or the capability of technologies, customers who changed their minds mid-project, and inadequate quality assurance. In the end, I stopped attending project funerals because what had started out as an interesting and remunerative intellectual task had turned into a technical branch of the entertainment industry.

Reading reports of the project failures of the past five years, what surprised me was how history repeats itself. A project that typifies all that is wrong with large projects is C-Nomis, which was intended to provide a single National Offender Management Information System for the probation and prison services.

In November 2009, the government announced the failure of this IT-based project. It was curtailed at a cost of £600 million, with only one element of the project remaining. It was typical of major problems with government systems: a survey by the TaxPayers' Alliance released in November 2009 put the overspend on such projects at £19 billion - of which £11 billion could be ascribed to IT failures.

A number of reasons have been given for the failure of C-Nomis (and many of them are also cited in other IT flops that the National Audit Office identified); these include poor overall management by the Civil Service, a lack of IT experience in the senior responsible officer for much of the project, inadequate costing, over-optimism, a poor contractual relationship with the software developers, an overly complex system and the project's rush to computerise existing processes without considering whether those processes could have been simplified first and then automated.

Another more recent failure occurred in the UK student loans system, where misplaced faith in scanning technology led to a massive backlog of loan applications and, equally importantly, severely curtailed the ability of the Student Loans Company to offer advice by phone. Undue optimism caused managers to overlook the absence of proper contingency plans if the technology failed. As a result of Sir Deian Hopkin's damning report on the loans fiasco, published in December 2009, the head of IT at the SLC was one of two senior figures to leave.

What are the lessons for departments of computing? Before looking at these, it is worth stating that after 20 years of externally examining British computing degrees, I can say that they offer a technical education second to none. Unfortunately, much of the knowledge, and many of the tools and skills, that could prevent gross failure go untaught.

I'll concentrate on a few. One recurring theme in reports of project failures is the mistake of attempting to computerise the status quo without first questioning it. For example, C-Nomis had to cope with a large number of probation areas and prisons, which had different ways of handling offenders' records and logging events such as release from jail. Instead of starting by simplifying these processes and developing a single streamlined system, project managers devised a variety of systems to cope with all sorts of local exceptions.

Another problem highlighted is that of systems that impair the work of the frontline staff they are meant to aid. A good example is the Offender Assessment System, or OASys. This was designed, in part, to protect the public from offenders who could cause harm if they were allowed into the community.

There are more than 750 data items associated with the system, and it can take in excess of two hours to enter all the relevant data. This is cause enough for criticism: the complex procedure raises doubts about data validity and accuracy, and prompts questions about the involvement of computers in sensitive decisions about releasing mentally ill offenders into society.

However, a number of other problems have been found. A study by Jenny Gale, a lecturer in human resource management at the University of Staffordshire, found that the implementation of OASys resulted in a loss of professional autonomy, rising workloads and increased stress levels. Another study of OASys, reported in the summer 2006 issue of the British Journal of Community Justice, focused on the use of the software in assessing offenders with mental illnesses. The authors - Wendy Fitzgibbon, senior lecturer in criminal justice studies at the University of Hertfordshire, and Roger Green, director of the Centre for Community Research at Hertfordshire - found that assessments were often inaccurate and defensive because the basic case files relevant to the offender were not explored. Here, the computer elbowed out the normal discussion and appraisal that was needed.

Many of the recent failures, including C-Nomis, the Libra IT system for magistrates' courts, the computer systems used to support social workers dealing with children at risk and the student loans system, involve a highly techno-centric view of the human operator that may be true for someone interacting with a stock-control system, but is not true for someone whose job has a great deal to do with personal contact.

Research projects at the universities of Cardiff, Durham, Huddersfield, Lancaster, Nottingham, Southampton and York revealed major problems with the IT systems used to support such work. For example, researchers found that the Integrated Children's System used by social workers after the death of Victoria Climbie was so complicated that it typically took more than 10 hours to fill in a preliminary form for a child considered to be at risk. A more complete form required, on average, a further 40 hours. A system that was intended to free social workers' time for home visits demonstrably reduced that time.

Another area where the human dimension is ignored is computer security. If you look at any textbook on computer security you will find excellent coverage of technical issues: how a firewall works, what viruses do and how cryptography can be used to secure financial transactions. What you will not find is much description of the human issues involved in security.

This is quite an oversight given that the majority of major security incidents over the past five years have not been caused by technical glitches. They have involved human error. For example, the United States Department of Veterans Affairs lost a computer containing 38,000 medical records of former servicemen and women; this was followed by the loss of a hard disk containing medical details of 250,000 veterans. In 2007, HM Revenue & Customs lost about 25 million personal and financial records. In December 2009, inadequately shredded medical records with identifiable data were found as wrapping in gift boxes. These are just a few non-technical examples; there are many more.

Systems development in our undergraduate degrees is biased towards technical topics such as testing, systems analysis, systems design and programming. Nowhere do we cover material that addresses some of the main reasons for IT failure. Because students are taught immensely difficult technical material, they indirectly gain the impression that it is technical errors that cause project failure.

As a result of this bias, students come away with the idea that everything that could be computerised should be computerised. For example, many of the chronicling systems that have failed or been delayed over the past five years, such as C-Nomis and the NPfIT patient record system, could have been implemented simply and at a fraction of the cost with a commercial document management system in each centre that used the system, together with secure manual procedures for swapping data between those centres, rather than an all-encompassing, rococo software edifice. Such a system would also have had the flexibility to support local processes and procedures.

So, what would I do with the computing curriculum? The first thing would be to modify any software engineering or human-computer interaction course to include material on project failure. Second, I would develop a new course on failure that contains a mix of technical, political, cultural and management topics: it would encompass project failures, hardware failures, human failures, systems failures and scientific failures. Third, I would modify project management courses to include material on the cultural, social and political relationship between the developer and the customer. Fourth, I would radically overhaul security courses in favour of an approach that looks at what could go wrong in the real world as well as what could go wrong technically. A good model is a course at the University of Washington that, as well as addressing technical issues, spends a lot of time looking at the security aspects of objects such as traffic lights and safe deposit boxes, and examining security in halls of residence and at automobile dealerships.

Clearly, computing departments should not drop their technical curriculum: the operation that saved my eye relied on some very sophisticated computer technology, and I am grateful to the developers who produced the equipment used by the surgeon. But the curriculum needs to be weighted more towards what could go wrong than towards what might go right.

If you are a computing academic who is sceptical about curriculum change, let me approach my proposals in another way. A number of the systems I have described are safety critical in the same sense that, say, a nuclear power station system is safety critical: failure leads to a loss of life. An unwieldy system that does not show important case information, one that keeps a social worker from visiting the home of a child at risk or prevents a probation officer from carrying out a proper assessment of a mentally ill offender, is dangerous, and its proper development should be addressed in our degree courses.

I once heard a joke in which an astronaut and a monkey were both given instructions. The monkey was told what dials to look at, what controls to use and how to ensure that the spacecraft satisfied its mission. The astronaut asked the manager conducting the briefing about what his role was. The answer was "feed the monkey". Substitute the word "computer" for "monkey" and "social worker" for "astronaut" and you will gain an idea about some of the systems that have been developed.
