Safety Instrumented Functions: Do you know your performance data?

By EUR ING David Green BEng(hons) CEng MIET MInstMC

Increasing emphasis is being placed on the data used in Safety Instrumented Function (SIF) design. This article explores the reasons for this emphasis and the methods that allow people to evaluate their data effectively, and builds on the article on the proof testing of Safety Instrumented Systems / Functions (Oct-Dec 2016 edition). It has been developed from the author's two decades of operational experience at top tier Control of Major Accident Hazards (COMAH) establishments in the UK, and subsequent consultancy work across multiple industry sectors.

1. Introduction

Current good practice in the process sector, with respect to functional safety, is to achieve compliance with IEC 61511. The approach adopted by global regulatory bodies is to benchmark the functional safety achieved against the requirements of IEC 61511; this is reflected in the guidance on demonstrating the adequacy of control measures under the Safety Case Regulations. The standard was updated to its second edition in 2016 (with an amendment in 2017).

The updated edition changes the emphasis on the reliability data used to justify that the installed equipment meets the requirements of the allocated Safety Integrity Level (SIL).

IEC61511-1:2017, Clause 11.9.3 states:

‘The reliability data used when quantifying the effects of random failures shall be credible, traceable, documented, justified and shall be based on field feedback from similar devices used in a similar operating environment’.

The requirement for Functional Safety Assessment (FSA) has also changed: the stage 4 assessment, during the operation and maintenance phase of the safety lifecycle, is now a 'shall' requirement within the standard, making it a mandatory activity.

Clause 5.2.6.1.10 states:

‘A FSA shall also be carried out periodically during the operations and maintenance phase to ensure that maintenance and operation are being carried out according to the assumptions made during design and that the requirements within IEC 61511 for safety management and verification are being met.’

These changes mean that if you claim compliance with the standard within your regulatory commitments, these requirements must now be completed as part of your company's activities.

The activities rely on the performance of the functions being recorded and reviewed. Through my consultancy role I work with many companies globally. A common theme is that people are recording results (either in a Maintenance Management System or on paper); the missing activity is the review of those results to evaluate the failure and demand rates.

Calculations conducted during the design stage usually draw on corporate or generic data sets. The assumed failure rates should be reviewed while the equipment is in operation; reviewing real equipment performance is therefore crucial to confirm that the systems are operating as assumed during design.

2. Where does the information come from?

The maintenance and operating technicians are pivotal in providing feedback on the performance of the functions and their associated equipment. The evaluation of the achieved demands and failure rates is only as good as the data received.

The failures and demands come from:

  • Proof testing results
  • Reports from operators
    • trip worked correctly and caused system shutdown
    • trip didn’t work correctly and didn’t shutdown the system
  • Equipment defects observed with diagnostic alarms

3. What should be recorded?

  • Faults:
    • dangerous – which would prevent the function from operating when requested (or late outside of tolerance)
    • safe – which would / has caused a spurious trip (or early outside of tolerance)
  • Demands:
    • successful demand leading to a spurious trip
    • unsuccessful demand in which a component didn’t activate correctly (see dangerous failure in faults)

4. How should evaluation be conducted?

The evaluation of the data should be conducted by an individual who understands how to evaluate data. Statistical methods exist for this purpose, and the data should be suitably grouped in line with the failure rate data used in the design.

An important factor to understand is that the failure rates claimed in designs are very low, so a representative review requires a large number of operating hours. IEC 61508-7, Annex D, Table D.1 provides guidance that a minimum of 3 million operating hours should be accumulated.
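As a quick arithmetic sketch, a device population's accumulated operating hours can be checked against that guidance. The 8760 hours-per-year figure assumes continuous operation, which will not hold for all plant:

```python
# Quick check of whether a device population accumulates the ~3 million
# operating hours suggested by IEC 61508-7 Table D.1.
# Assumes continuous operation (8760 hours per year).
HOURS_PER_YEAR = 8760

def population_hours(devices: int, years: int) -> int:
    """Total accumulated operating hours across a device population."""
    return devices * years * HOURS_PER_YEAR

# e.g. 50 similar transmitters observed for 7 years
print(population_hours(50, 7))                 # 3066000 - just over the threshold
print(population_hours(50, 7) >= 3_000_000)    # True
```

This illustrates why pooling as many similar devices as possible matters: a single device would need centuries of service to reach the same evidence base.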

Failure rate data will be specific to either an individual model or a group of equipment: for example, 'Pressure Transmitter' versus 'John Doe's Pressure Transmitter'.

The requirement is to evaluate the data in similar operating conditions, so the evaluation should include as many devices as possible operating in the location under similar process conditions. The more data included, the more representative the output will be. The devices evaluated do not always have to be on safety service.

Demand evaluation is more straightforward than failure evaluation: it is simply the period of time between demands, which should then be compared with the assumed demand rate from the study record that determined the SIL.
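A minimal sketch of that comparison follows; the demand dates and the assumed rate are invented for illustration, not taken from any real study record:

```python
# Compare the observed interval between recorded demands against the
# demand rate assumed in the SIL determination study (hypothetical figures).
from datetime import date

# Hypothetical demand dates recorded for one function
demand_dates = [date(2015, 3, 1), date(2017, 8, 12), date(2019, 11, 30)]

assumed_demands_per_year = 0.5   # e.g. the study assumed one demand every 2 years

# Days between consecutive demands
spans = [(b - a).days for a, b in zip(demand_dates, demand_dates[1:])]
mean_interval_years = sum(spans) / len(spans) / 365.25
observed_rate = 1.0 / mean_interval_years

print(f"Mean interval between demands: {mean_interval_years:.1f} years")
print(f"Observed {observed_rate:.2f}/yr vs assumed {assumed_demands_per_year}/yr")
```

If the observed rate materially exceeds the assumed rate, the SIL determination study should be revisited, since its risk reduction target was based on the lower figure.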

5. How do I evaluate the failures?

There are many methods to evaluate data with statistical methods. The main ones are:

  1. basic calculations
  2. Chi squared
  3. Poisson distribution

The basic calculation can only be used if there are more than 10 failures. The key information required is the operating hours of the equipment being evaluated and the number of failures. The equation that can be employed is:

Failure rate (λ) = number of failures (k) / operating hours (T)

Example (basic calculation):
Population of devices 750 (in same operating conditions and process fluids)
Period of Evaluation: 10 years (7500 operating years = 6.57E07 hours) {T}
Number of Failures observed = 73 {k}
Number of Safe Failures = 47
Number of Dangerous Failures = 26

Total Failure Rate (λ)
= k / T
= 73 / 6.57E07
= 1.11E-06 per hour

Safe Failure Rate (λS)
= k / T
= 47 / 6.57E07
= 7.15E-07 per hour

Dangerous Failure Rate (λD)
= k / T
= 26 / 6.57E07
= 3.95E-07 per hour
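The worked example above can be reproduced in a few lines of Python; the figures (750 devices, 10 years, 73 failures) are taken directly from the example:

```python
# Basic failure-rate calculation: lambda = k / T,
# using the population figures from the worked example above.
HOURS_PER_YEAR = 8760

population = 750
years = 10
operating_hours = population * years * HOURS_PER_YEAR   # T = 6.57E7 hours

failures_total = 73       # k
failures_safe = 47
failures_dangerous = 26

def failure_rate(k: int, t: float) -> float:
    """Point estimate of the failure rate: lambda = k / T."""
    return k / t

print(f"Total failure rate:     {failure_rate(failures_total, operating_hours):.2e} /h")
print(f"Safe failure rate:      {failure_rate(failures_safe, operating_hours):.2e} /h")
print(f"Dangerous failure rate: {failure_rate(failures_dangerous, operating_hours):.2e} /h")
```

The dangerous failure rate computed here (≈3.96E-07 per hour before rounding) is the value that would be compared against the λD assumed in the design calculations.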

A more accurate evaluation is by means of the statistical methods listed above. These can work even if you have no recorded failures, although there is no simple formula for these techniques.

Chi squared is a statistical method based on an asymmetric distribution. It still requires gathering the operating hours and the number of failures; these values are then used with published tables to establish the parameters 'α' (the confidence level) and 'n' (the degrees of freedom), from which the 'χ2' value used in the evaluation is obtained. The benefit of using Chi squared is that the method can accommodate zero observed failures.
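As a sketch (not the article's method, and using only the standard library rather than tables), the one-sided chi-squared upper bound λU = χ²(α; 2k+2) / 2T can be obtained by solving the equivalent Poisson relation numerically:

```python
# Upper confidence bound on failure rate via the chi-squared / Poisson link:
# lambda_U = chi2(alpha; 2k+2) / (2T), solved here by bisection on the
# Poisson CDF instead of looking up chi-squared tables.
import math

def lambda_upper(k: int, hours: float, confidence: float = 0.7) -> float:
    """Upper bound on failure rate given k failures over `hours` of operation.

    Finds m = lambda * T such that P(Poisson(m) <= k) = 1 - confidence,
    which is equivalent to the chi-squared formula with 2k+2 degrees of freedom.
    """
    target = 1.0 - confidence

    def poisson_cdf(m: float) -> float:
        # P(X <= k) for X ~ Poisson(m), built up term by term
        term = total = math.exp(-m)
        for i in range(1, k + 1):
            term *= m / i
            total += term
        return total

    lo, hi = 0.0, 10.0 * (k + 10)     # bracket for m = lambda * T
    for _ in range(200):              # poisson_cdf is decreasing in m
        mid = (lo + hi) / 2
        if poisson_cdf(mid) > target:
            lo = mid                  # m too small, CDF still above target
        else:
            hi = mid
    return (lo + hi) / 2 / hours

# Zero observed failures in 3 million hours still yields a non-zero bound:
print(f"{lambda_upper(0, 3e6, 0.7):.2e} per hour")
```

For zero failures at 70% confidence this reduces to λU = -ln(0.3) / T, illustrating why the method remains usable where the basic k / T calculation would return an unrealistic zero.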

Statistical texts explain the Chi squared methodology and the Poisson distribution in far more detail than this article is intended to cover, for those with an interest.

6. Is there nothing to help me?

There are software tools in the marketplace, including ESC's DATA Comp module (part of the ProSET suite, www.proset.co.uk), which allows the recording of events per device. This provides the following benefits:

  • ease of viewing performance of the equipment in a chronological order
Figure 1 – Record of Faults / demands per device

  • Demand evaluation per device
Figure 2 – Installed time and demands per device

  • Failure rate evaluation per group of equipment
Figure 3 – Evaluation of failure rate per group of devices

  • Ability to determine the bad actors within the groups of equipment based on failures or demands
  • Ability to ‘back calculate’ and apply the observed failure rates to the calculations.

7. We aren’t doing any of this, where do I start?

The starting point is to collect data on the performance of your SIFs. The results should be evaluated for the cause of the fault and categorised according to whether the equipment would have worked correctly when called upon, i.e. whether it was a safe or dangerous failure.

Once you have the list of devices, you can build a database to conduct the evaluation. Starting today, you can record failures going forward; if you have historical records, these are all useful pieces of information. The more information you have, the better the accuracy of the evaluation will be.

Once you have data start reviewing it. Are there any trends emerging?

  • Certain models
  • Certain locations / functions
  • Certain environmental / process conditions
  • Common failure causes
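The trend review above can be sketched with a simple grouping over the failure records; the record fields and values here are invented for illustration:

```python
# Illustrative trend scan over failure records (hypothetical data),
# counting failures by model, location and cause to surface bad actors.
from collections import Counter

records = [
    {"model": "PT-100", "location": "Unit A", "cause": "impulse line blockage"},
    {"model": "PT-100", "location": "Unit B", "cause": "impulse line blockage"},
    {"model": "PT-200", "location": "Unit A", "cause": "electronics fault"},
    {"model": "PT-100", "location": "Unit A", "cause": "impulse line blockage"},
]

for field in ("model", "location", "cause"):
    counts = Counter(r[field] for r in records)
    print(field, counts.most_common(1))   # most frequent value in each field
```

Even a crude count like this can reveal that one model or one failure cause dominates, which is exactly the kind of systematic issue the review is meant to surface.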

Once evaluated, take action to remedy any issues. This way you may be able to remove systematic failures and reduce the number of failures being observed.

By reviewing and taking action to eliminate failures and demands, fewer issues should be observed. The reduction of issues will lead to fewer production interruptions due to faults or spurious demands in the safety functions. These functions are there to STOP production and place the facility into a safe state; therefore they should be activated sparingly (hopefully never, if the plant is well controlled), so any improvement is a benefit.

My former role as a company engineer required this data evaluation, which was beneficial in identifying both specific application issues and models that required attention. The resolution required investment, but performance after the investment improved. Without the site-wide evaluation some of the issues would not have been observed, as the failures occurred in different operating units, often managed and reviewed by separate personnel.

8. Conclusions

  • The standards in the process sector have changed; more emphasis is now placed on data being traceable to field experience;
  • Understanding the performance of the equipment installed at your location benefits the company: through evaluation and resolution of issues, the reliability and availability of the SIFs will be improved, leading to fewer production interruptions and lower remediation costs;
  • More accurate data will assist in demonstrating that the design was correct;
  • More accurate data will allow easier justification of less frequent testing, preventing additional production interruptions and reducing the likelihood of an error when returning devices to service;
  • Functional Safety Assessments during the operation phase are conducted more easily with a structured performance review system.

I need help!

Anyone who needs assistance in setting up a system, or who requires help completing the evaluation on their company's behalf, please contact me for support. We can guide you through this process. Contact: d.green@esc.uk.net