Possibly the most detailed prescription for trading design cost against safety level comes from the aviation industry. There, the DO-254 standard for FPGA and programmable electronic hardware development spells out in great detail what must be performed for each level of required safety. Other industries spell this out for software and then treat programmable logic as software, although new guidance in this area is coming.
When software is involved in a system, the development and design assurance of that software is often governed by the DO-178B document titled "Software Considerations in Airborne Systems and Equipment Certification." The severity of consequence identified by the hazard analysis establishes the criticality level of the software. Software criticality levels range from A to E, corresponding to severities of Catastrophic to No Safety Effect. Higher levels of rigor are required for Level A and B software, and the corresponding functional tasks and work products in the system safety domain are used as objective evidence of meeting safety criteria and requirements.
The following safety-related severity definitions are compiled from Wikipedia and the DO-178/DO-254 standards:
Severity = Catastrophic
Results in multiple fatalities and/or loss of the system.
Severity = Hazardous
Reduces the capability of the system or the operator's ability to cope with adverse conditions to the extent that there would be:
- Large reduction in safety margin or functional capability
- Crew physical distress/excessive workload such that operators cannot be relied upon to perform required tasks accurately or completely
- Serious or fatal injury to a small number of occupants of aircraft (except operators)
- Fatal injury to ground personnel and/or general public
Severity = Major
Reduces the capability of the system or the operator's ability to cope with adverse conditions to the extent that there would be:
- Significant reduction in safety margin or functional capability
- Significant increase in operator workload
- Conditions impairing operator efficiency or creating significant discomfort
- Physical distress to occupants of aircraft (except operators) including injuries
- Major occupational illness and/or major environmental damage, and/or major property damage
Severity = Minor
Does not significantly reduce system safety. Actions required by operators are well within their capabilities. This category includes:
- Slight reduction in safety margin or functional capabilities
- Slight increase in workload such as routine flight plan changes
- Some physical discomfort to occupants of aircraft (except operators)
- Minor occupational illness and/or minor environmental damage, and/or minor property damage
Severity = No Safety Effect
Has no effect on safety.
The DO-254 standard also uses this approach for FPGAs in aviation (commercial and some military). The IEC 61508 standard, which is employed in industrial automation, also uses a multi-level approach to certification and safety (SIL 1-4).
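As a rough illustration, the pairing of severity categories with levels A to E can be sketched as a lookup table in Python. The one-to-one ordering follows the list above; the function names are hypothetical, and the "worst case wins" rule in system_level is an assumption made for this sketch, not wording taken from DO-178B or DO-254.

```python
# Severity categories in the order listed above, paired with levels A to E.
SEVERITY_TO_LEVEL = {
    "Catastrophic": "A",
    "Hazardous": "B",
    "Major": "C",
    "Minor": "D",
    "No Safety Effect": "E",
}


def assurance_level(severity):
    """Return the level letter for one severity category (hypothetical helper)."""
    try:
        return SEVERITY_TO_LEVEL[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None


def system_level(severities):
    """Assumption for this sketch: the most severe hazard sets the level.

    'A' sorts before 'E', so min() picks the most demanding level.
    """
    return min(assurance_level(s) for s in severities)


if __name__ == "__main__":
    print(assurance_level("Major"))
    print(system_level(["Minor", "Hazardous", "No Safety Effect"]))
```

For example, a function whose hazard analysis turns up one Hazardous and several Minor failure conditions would come out at Level B under this assumed rule.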
Data from Hi-Rely shows that the development cost of a Level A project can be 240 percent more than that of a Level E project, while a Level D project can cost 160 percent more to develop than a Level E project. However, the lifecycle costs of a Level A project are often lower due to fewer field-related issues, because once such a product is fielded, it remains "as-is" and under the same configuration.
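To make the arithmetic concrete, here is a small Python sketch that reads "X percent more" literally, so that 240 percent more means 3.4 times the Level E cost. The two percentages are the Hi-Rely figures quoted above; the baseline cost, the function names, and the decision to ignore lifecycle costs are assumptions of this sketch.

```python
# Development-cost premium over a Level E project, in percent.
# A and D are the figures quoted above; E is the baseline by definition.
LEVEL_PREMIUM_PERCENT = {"A": 240, "D": 160, "E": 0}


def cost_with_premium(baseline_cost, percent_more):
    """'percent_more' percent MORE than baseline, not percent OF baseline."""
    return baseline_cost * (100 + percent_more) / 100


def development_cost(level, level_e_cost):
    """Estimated development cost for a level, given a hypothetical Level E cost."""
    return cost_with_premium(level_e_cost, LEVEL_PREMIUM_PERCENT[level])


if __name__ == "__main__":
    baseline = 100  # hypothetical Level E cost, in arbitrary units
    for level in ("E", "D", "A"):
        print(level, development_cost(level, baseline))
```

Under this reading, a Level E project costing 100 units would cost about 260 units at Level D and 340 units at Level A, so Level A comes to roughly 1.3 times the cost of Level D. Levels B and C are left out because no figures are given for them.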
The policies and procedures at the aircraft level are developed to cover equipment failure conditions. Some items -- such as DSP and math functions -- are quite difficult to realize at a Level A standard. Thus, systems are designed with redundancy so that these items can be implemented at, say, Level C or D and completed economically. Other items -- such as radios, which are subject to interference, sunspot activity, and lightning -- simply cannot be made reliable enough to be Level A or B, and as such are left at Level C or D. If an FPGA is used only for a data bus, for example, it is left at Level C or D.
At Level A, every item in the assurance chain is ratcheted up to its maximum. Code coverage is performed for all conditions of each conditional statement, low-level requirements are generated and traced to code and tests, and all items are kept under configuration management control. At Level D, requirements are traced only to test results, and only a minimum of items are maintained under configuration management.
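The difference in traceability between the two levels can be sketched as a simple check in Python. This is a deliberate simplification of the description above, not the actual objectives tables of DO-178B or DO-254; the Requirement record, its field names, and the trace_gaps function are all hypothetical, and only Levels A and D are covered because those are the only two described here.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """Hypothetical trace record for one requirement."""
    req_id: str
    code_refs: list = field(default_factory=list)         # links to source code
    test_refs: list = field(default_factory=list)         # links to test cases
    test_result_refs: list = field(default_factory=list)  # links to test results


def trace_gaps(req, level):
    """List the trace links a requirement is missing at the given level.

    Level A: traced to code and to tests.
    Level D: traced to test results only.
    """
    gaps = []
    if level == "A":
        if not req.code_refs:
            gaps.append("code")
        if not req.test_refs:
            gaps.append("tests")
    elif level == "D":
        if not req.test_result_refs:
            gaps.append("test results")
    else:
        raise ValueError(f"level {level!r} is not modeled in this sketch")
    return gaps


if __name__ == "__main__":
    req = Requirement("REQ-1", test_result_refs=["TR-7"])
    print(trace_gaps(req, "D"))
    print(trace_gaps(req, "A"))
```

A requirement that already satisfies the Level D check can still show gaps at Level A, which is one way to see where the extra development effort goes.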
Given that we all enjoy the benefits of safe, low-cost air travel, how is safety vs. cost managed for FPGAs in your industry?