Centre for Software Reliability

SHIP - Assessment of the Safety of Hazardous Industrial Processes in the Presence of Design Faults

The project partners were:

Associate partners from Eastern Europe were:

Objective

The overall objective of SHIP was to devise a means of assessing, ideally numerically, the achieved reliability or safety of a system in the presence of design faults.

Motivation for the Project

Hazardous industrial processes are of increasing social concern, and need adequate means for judging their safety. As industrial systems become more complex, this becomes increasingly difficult. Complexity increases the risks of both random component failures and design-related failures. Random plant failures can be mitigated by incorporating redundancy in plant design. Design-related failures cannot be mitigated in the same way (as the design fault would be common to redundant components), so design faults may become the dominant factor affecting the safety of complex plant.

In some industries, such as aerospace, railways, and nuclear power, quantified targets are set for plant safety. For random hardware failures there are well-established techniques for quantifying the reliability and safety implications. Assessing the impact of design faults is more difficult. The main problem with quantification is that we do not know, in advance, the number and nature of the design faults remaining in the plant, so it is difficult to quantify their impact on safety.

The overall objective of SHIP was to devise a means of assessing, ideally numerically, the achieved reliability or safety of a system in the presence of design faults and hence improve current industrial practice for safety assessment. This problem was tackled from an unusual viewpoint. In software, all failures arise from design faults. So the SHIP project investigated a range of software engineering techniques for minimising and estimating failures to see if they could be applied to industrial plant. As a secondary objective we were also interested in whether plant-level engineering techniques could improve existing software methods.

Research Programme

In the project, we first established the state-of-the-art for safety practice in industrial and software systems development. We also needed to establish a methodology for combining the available evidence as a basis for making reliability estimates. Within that general framework we needed to examine individual analysis methods (drawn from existing software practice) which provide the basic evidence used in a safety assessment. We conducted a series of case studies to determine whether the individual assessment methods and the overall framework were practicable. Finally we reviewed the impact the results could have on existing industrial practice and standards.

Technical Approach

The state-of-the-art reviews confirmed that it was difficult to assess the impact of plant design faults on safety and reliability in a quantitative manner. The reviews also showed that there were a number of design and assessment methods in the software field which were potentially applicable to plant as a whole. Some potentially applicable techniques from the computing field which could be investigated in SHIP were:

Such methods have to be integrated with evidence from more conventional sources in order to make an overall safety assessment. The SHIP project decided to base its safety assessment methodology on the well-established "safety case" approach which is standard practice in a number of industries (e.g. nuclear and oil and gas). The SHIP project has developed this safety case approach through a combination of theoretical and practical studies. The theoretical studies have covered:

In the practical studies, we have examined:

We have also considered the potential impact of our research work on current industrial practice and standards.

Safety Case Structure

The safety case should be developed in parallel with the design. It will tend to evolve and become more detailed as the system is developed. At each stage, the basis for the safety arguments should be clear. The safety case should:

Within the SHIP model, a safety case consists of the following elements:

This is summarised in the figure below.

The "evidence" could in fact be a sub-claim so the whole argument structure is recursive, hiding the details in lower level arguments. The evidence might also be initial design assumptions which have to be supported by confirmatory tests as the development proceeds.
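The recursive structure described above can be sketched as a small data structure. This is only an illustration: the class names `Claim` and `Evidence`, their fields, and the example statements are assumptions made for this sketch and are not taken from the SHIP deliverables.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Evidence:
    """A leaf item: a test result, an analysis, or a design assumption."""
    description: str
    # False marks an initial design assumption that still needs
    # confirmatory tests as the development proceeds.
    confirmed: bool = True


@dataclass
class Claim:
    """A claim supported by an argument over evidence and/or sub-claims."""
    statement: str
    argument: str
    support: List[Union["Claim", Evidence]] = field(default_factory=list)

    def unconfirmed(self) -> List[Evidence]:
        """Recursively collect evidence that is not yet confirmed."""
        pending: List[Evidence] = []
        for item in self.support:
            if isinstance(item, Claim):
                pending.extend(item.unconfirmed())  # descend into sub-claim
            elif not item.confirmed:
                pending.append(item)
        return pending


# Hypothetical example: a top-level claim whose "evidence" includes a sub-claim.
sub = Claim("Trip logic meets its specification", "formal proof",
            [Evidence("Proof of trip logic")])
top = Claim("Shutdown system is adequately safe", "decomposition",
            [sub, Evidence("Sensor drift stays within design limits",
                           confirmed=False)])
print([e.description for e in top.unconfirmed()])
```

Walking the tree in this way gives a simple list of the assumptions that still have to be discharged at a given stage of development.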

The actual nature of the argument and the inference mechanism can vary depending on the system design and the safety case strategy. For example, an argument could be:

In addition the overall argument should be robust, i.e. the argument should be sound even if there are uncertainties or errors in parts of the argument. For example the safety case could be structured as follows:

This argument is robust to a flaw in any single argument chain.
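As a rough illustration of this robustness property, assume that each argument chain is sufficient on its own to support the claim, and that flaws in different chains are independent. Both are assumptions of this sketch, and independence is likely to be optimistic if the chains share evidence or assumptions. The function names and the numbers are hypothetical.

```python
from typing import Iterable


def claim_holds(chain_is_sound: Iterable[bool]) -> bool:
    """The claim survives as long as at least one chain is sound."""
    return any(chain_is_sound)


def doubt_in_claim(flaw_probabilities: Iterable[float]) -> float:
    """Probability that every chain is flawed, assuming independent flaws."""
    doubt = 1.0
    for p in flaw_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        doubt *= p
    return doubt


# A flaw in one of two chains does not invalidate the claim.
print(claim_holds([False, True]))
# Illustrative figures only: two chains, each with a 10% chance of being flawed.
print(doubt_in_claim([0.1, 0.1]))
```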

Formulating the Argument

The safety of a system is discussed in the context of the following model.

The diagram shows the standard fault-error-failure model for software. A fault is a defect in the design and is the primary source of the failure. After development, the design could be perfect or faulty. In practice some faults are likely to remain in a complex design after development. However, even if it is faulty, the system may still operate correctly most of the time (i.e. stay in the OK state) until some triggering input condition is encountered. Once triggered, some of the computed values will deviate from the design intent (an error). However the deviation may not be large enough (or persist long enough) to be dangerous, so the system may recover naturally from the "glitch" in subsequent computations ("self healing"). Alternatively explicit design features (e.g. diversity, "firewalls", etc.) can be used to detect such deviations and either recover the correct value (error recovery) or override the value with a safe alternative (fail-safety).
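The behaviour described in this paragraph can be sketched as a small state machine. The state and event names below are assumptions chosen for this sketch, based only on the prose; they are not copied from the SHIP diagram.

```python
from enum import Enum


class State(Enum):
    OK = "ok"            # operating correctly (the design may still be faulty)
    ERROR = "error"      # computed values deviate from the design intent
    SAFE = "safe"        # value overridden with a safe alternative
    FAILURE = "failure"  # deviation has become dangerous


# (current state, event) -> next state
TRANSITIONS = {
    (State.OK, "trigger"): State.ERROR,        # triggering input hits a fault
    (State.ERROR, "self_heal"): State.OK,      # glitch dies out naturally
    (State.ERROR, "recover"): State.OK,        # explicit error recovery
    (State.ERROR, "fail_safe"): State.SAFE,    # explicit fail-safe override
    (State.ERROR, "propagate"): State.FAILURE  # nothing stops the deviation
}


def step(state: State, event: str) -> State:
    """Apply one event; events with no defined transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.OK) -> State:
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


print(run(["trigger", "self_heal", "trigger", "fail_safe"]))
```

A perfect design corresponds to a system in which the "trigger" event can never occur, so the system never leaves the OK state.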

Finally we may simply observe the system as a "black box" where we compute the probability of failure from past experience.

The overall approach to generating the safety case involves:

For numerical estimates we need to quantify (or at least bound) the transition probabilities on the transition arcs. In a formal argument we claim the transition probability of OK to ERROR is zero. In a probabilistic argument we might combine an error recovery probability (ERROR to OK) with a bounding estimate for the error rate based on operational testing. Alternatively we may simply rely on the failure rate observed in the field.
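One way such a probabilistic argument might be quantified is sketched below. It uses a standard statistical bound for failure-free testing, not necessarily the method used in SHIP. It assumes that test demands are independent and drawn from the operational profile, and that error occurrence and recovery are independent. The function names are hypothetical. If n tests show no error, then with confidence C the per-demand error probability is at most 1 - (1 - C)^(1/n).

```python
def error_rate_bound(n_tests: int, confidence: float) -> float:
    """Upper bound on the per-demand OK-to-ERROR probability after
    n_tests error-free operational tests, at the given confidence level."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)


def failure_bound(error_bound: float, p_recovery: float) -> float:
    """Bound on the per-demand failure probability: an error must occur
    and must not be recovered. p_recovery should be a justified lower
    bound on the ERROR-to-OK probability for the result to stay a bound."""
    if not 0.0 <= p_recovery <= 1.0:
        raise ValueError("p_recovery must lie in [0, 1]")
    return error_bound * (1.0 - p_recovery)


# Illustrative figures only: 5000 error-free tests, 99% confidence,
# and an assumed recovery probability of at least 0.9.
bound = error_rate_bound(5000, 0.99)
print(bound, failure_bound(bound, 0.9))
```

The bound shrinks only in inverse proportion to the number of tests, so very small failure probabilities cannot practically be demonstrated by testing alone. This is one reason for combining testing with other forms of evidence.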

This safety case approach was applied to plant and computer-based examples and appears to be fairly generally applicable.

Research supporting the Safety Case Approach

In SHIP, we have examined the general approach to constructing a safety case and examined specific safety cases to see how the argument is constructed and what evidence is employed. In this area we have examined:

In addition, we have undertaken research on specific types of evidence and arguments that can be deployed in safety case reliability assessments, namely:

Case study examples utilising some of these techniques are referenced in supporting technical papers, and some selected examples are presented in later sections.

It can be seen that the reliability grows with operational use, and it was also shown that growth was better for less complex systems. This is consistent with the PLC study results which indicated that faults are eliminated more rapidly in simple designs.

Papers

X/001
P. Bishop, Adelard
"The Variation of Software Survival Times for Different Operational Input Profiles", FTCS-23, Toulouse, June 22-24, 1993, IEEE Computer Society Press, ISBN 0-8186-3680-7.

X/002
P. Bishop, Adelard and G. Bruns, S.O. Anderson, LFCS,
"Stepwise Development and Verification of a Boiler System Specification", International Workshop on the Design and Review of Software Controlled Safety-related Systems, National Research Council, Ottawa, Canada, June 28-29, 1993.

X/003
J. Gorski and A. Wardzinski, FPS
"Application of Formal Methods to Assessment of Software", Proc. European Safety and Reliability Conference (ESREL'94), La Baule, France, June 1994.

X/004
J. Gorski, FPS
"Extending Safety Analysis techniques with Formal Semantics", In Technology and Assessment of Safety-Critical Systems (Edited by F.J. Redmill and T. Anderson), Springer-Verlag, 1994, pp. 147-163.

X/005
Bev Littlewood, CSR
"Learning to Live with Uncertainty in our Software", Proc. 2nd Intl. Software Metrics Symposium, London, Oct 24-26, IEEE Computer Society Press, pp 2-9, 1994.

X/006
P. Mellor, CSR
"CAD: Computer-Aided Disaster!",
High Integrity Systems Journal, Vol. 1, Iss. 2, 1994.

X/007
J. Gorski and A. Wardzinski,
"Formal Specification and Verification of a Real-Time Kernel", Euromicro Workshop on Real-Time Systems ERTS'94, Vaesteraas (Sweden), June 1994.

X/008
J. Gorski and A. Wardzinski, FPS
"Formalizing Fault Trees",
Safety Critical Systems Symposium, Brighton (UK), February 1995.

X/009
F. Mazzanti, IEI-CNR
"Coding regulations for Safety Critical Software Development", Second IEEE International Symposium on Software Engineering Standards (ISESS 95), Montreal, Canada, 1995.

X/010
P.G. Bishop and R.E. Bloomfield, Adelard
"The SHIP Safety Case", SafeComp 95, Proc. 14th IFAC Conf. on Computer Safety, Reliability and Security (ed. G. Rabe), Belgirate, Italy, 11-13 October 1995, Springer, ISBN 3-540-19962-4.

X/011
Bev Littlewood and David Wright, CSR
"On a stopping rule for the operational testing of safety-critical software", Proc FTCS-25 (International Symposium on Fault-Tolerant Computing), Pasadena, June 1995.

X/012
Antonia Bertolino, Lorenzo Strigini, IEI-CNR
"Using Testability Measures for Dependability Assessment", Proc. of the ACM/IEEE 17th International Conference on Software Engineering, ICSE 17, Seattle, USA, April 23-30, 1995, pp. 61-70.

X/013
Krassimir Djambazov, Peter Popov, ICCS-BAS
"Software design Faults Simulation", ENCRESS / CSR Conference, Safety and Reliability of Software Based Systems, Bruges, Belgium, September 1995.

X/014
Krassimir Djambazov, Peter Popov, ICCS-BAS
"The Effect of Testing on the Reliability of Single version and 1-out-of-2 Software Systems", ISSRE 95, Toulouse, France, October 1995.

X/015
Krassimir Djambazov, Peter Popov, ICCS-BAS
"Simulation Study of the Role of Testing upon the Reliability of Software based Systems", Problems of Engineering Cybernetics and Robotics, Bulgarian Academy of Sciences, Sofia, Bulgaria, ISSN 0204-9848, 1995.

X/016
Bev Littlewood and David Wright, CSR
"A Bayesian model that combines disparate evidence for the quantitative assessment of system reliability",
SafeComp 95, Belgirate, Italy, 11-13 Oct. 1995.

X/017
Bev Littlewood, Martin Neil and Gary Ostrolenk
"The role of models in managing the uncertainty of software intensive systems",
(to appear in Reliability Engineering and System Safety)

X/018
Bev Littlewood, Martin Neil and Gary Ostrolenk
"Uncertainty of software intensive systems", (to appear in the High Integrity Systems Journal).

X/019
P. Bishop and R. Bloomfield, Adelard
"The Ship Safety Case Approach: a Combination of System and Software Methods", ENCRESS / CSR Conference, Safety and Reliability of Software Based Systems, Bruges, Belgium, September 1995.

X/020
J. Gorski, EFP
"Software Safety - Some Present Research Problems" (in Polish), National Conference on Software for Real-Time Computer Systems, Wroclaw, Poland, September 1994.

X/021
J. Gorski and A. Wardzinski, EFP
"Safety Analysis of a Computerised Control System" (in Polish), Informatics, 8, 1994.

X/022
Antonia Bertolino, IEI-CNR
"Software Testing for Dependability Assessment",
Second Symposium on Software Quality Techniques and Acquisition Criteria, Florence, May 1995.

Deliverables

D/001

P. Bishop and J. Cheng (eds), Adelard
State of the Art - Plant Safety, Part 1: Main Report.

D/002

P. Bishop, Adelard and L. Strigini, IEI
State of the Art - Computers and Software, Part 1: Main Report.

D/003

P. Bishop and J. Cheng (eds), Adelard
State of the Art - Plant Safety, Part 2: Annexes.

D/004

L. Strigini and F. Mazzanti, IEI, and P. Bishop, Adelard
State of the Art - Computers and Software, Part 2: Annexes.

D/005

P. Bishop, R. Bloomfield, B. Littlewood and L. Strigini
SHIP Project: Final Report. (Adelard reference D/71/0311/1).

SHIP Project Information

The project (ref. EV5V 103) was carried out with financial support from the EEC in the framework of the Environment programme, sub-theme: Major Industrial Hazards.

The project co-ordinator was:

Robin Bloomfield,
Adelard,
3 Coborn Rd,
London E3 2DA,

Telephone: +44 (0)81 983 0217,
Telefax: +44 (0)81 983 1845.
Email: reb@adelard.co.uk