The concept of risk allows decisions to be made in the presence of uncertainty. This uncertainty is of two kinds:
- Aleatory ‒ uncertainty arising from the inherent randomness of the underlying processes in the project.
- Epistemic ‒ uncertainty arising from the limits of our knowledge about the project.
Two Types of Uncertainty = Two Types of Risk
A key difference between the two is that epistemic risk can be reduced by improving our knowledge, whereas aleatory uncertainty represents an absolute limit to our knowledge. To use the coin-toss example: having thrown the coin a thousand times we could state with confidence the probability of heads, but that is all we can say about the next toss.
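As a minimal illustration (a Python sketch; the coin's bias and the trial counts are assumed purely to drive the simulation): the estimate of the probability of heads tightens as tosses accumulate, which is the epistemic part shrinking, while the outcome of the next toss remains a 50/50 proposition, which is the aleatory part that no amount of data removes.

```python
import random

random.seed(1)
TRUE_P_HEADS = 0.5   # unknown to the observer; assumed here only to drive the simulation

for n in (10, 100, 1_000, 10_000):
    tosses = [random.random() < TRUE_P_HEADS for _ in range(n)]
    estimate = sum(tosses) / n
    # Approximate 95% interval half-width for the estimated probability of heads.
    half_width = 1.96 * (estimate * (1 - estimate) / n) ** 0.5
    print(f"after {n:>6} tosses: p(heads) estimated as {estimate:.3f} +/- {half_width:.3f}")

# The interval shrinks as data accumulates (epistemic uncertainty reduced),
# but the next toss is still a 50/50 proposition (aleatory uncertainty remains).
```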
Sources of Epistemic Risk
As with other risk domains, we can break our description of epistemic risk down into several areas that have traditionally been sources of problems.
Assumptions ‒ In engineering we introduce epistemic risk every time we make an assumption about the world (2). We may make an assumption because we lack data, or we may make simplifying assumptions to make our job easier. In either case, the uncertainty we introduce carries a risk with it.
A further problem with design assumptions is that they are often stated implicitly rather than explicitly, so they become an invisible and unquestioned part of the landscape, unknown 'knowns' if you will.
Safety analyses ‒ Epistemic uncertainty is also a significant problem for safety analyses because, while some data may be 'hard', such as the failure rates of specific components, other data may be highly uncertain, such as human error rates or the distribution of environmental effects such as lightning. Such uncertainties then become buried in the analysis as assumptions, often leading to the effect known as 'tail truncation', where the likelihood of extreme events is significantly underestimated.
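To make 'tail truncation' concrete, here is a hedged sketch (the distributions, parameters and threshold are invented for illustration, not taken from any real dataset): if an analyst assumes a normal distribution for a quantity that is in fact heavy-tailed, the probability of an extreme value can be underestimated by orders of magnitude.

```python
import math

def normal_tail(x, mean, sd):
    """P(X > x) for a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

def lognormal_tail(x, mean, sd):
    """P(X > x) for a lognormal distribution with the same mean and sd."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return 0.5 * math.erfc((math.log(x) - mu) / math.sqrt(2 * sigma2))

# Hypothetical environmental load: mean 10, standard deviation 3 (arbitrary units).
mean, sd, extreme = 10.0, 3.0, 25.0

p_normal = normal_tail(extreme, mean, sd)
p_heavy = lognormal_tail(extreme, mean, sd)
print(f"P(load > {extreme}) assuming a normal distribution:    {p_normal:.2e}")
print(f"P(load > {extreme}) assuming a lognormal distribution: {p_heavy:.2e}")
print(f"the normal assumption understates the extreme event by ~{p_heavy / p_normal:.0f}x")
```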
Subjective evaluation ‒ Epistemic uncertainty becomes even more dominant when we are asked to evaluate the likelihood of a rare event for which little or no empirical data exists and for which we must rely on subjective 'expert estimation', with all the potential for bias that this entails.
Design errors ‒ Design errors are yet another source of epistemic uncertainty. Here we can view the design as a 'model' of the system; the possibility of an error in that model introduces uncertainty as to whether it will correctly predict the system's behavior.
Dealing with Risk
Dealing with aleatory risk ‒ Because we express aleatory uncertainty as process variability over a series of trials, aleatory risk is always expressed in relation to a duration of exposure. The classical response to such variability is to build in redundancy, such as backup components or additional design margins, that reduces the risk to an acceptable level over the duration of exposure.
But with aleatory risk we also hit a fundamental limit of control. While we can reduce the risk by, for example, introducing redundancy, if we keep playing the game long enough we will eventually lose.
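A short sketch of why the duration of exposure matters (the failure probabilities are purely illustrative, and independence between the redundant units is assumed): redundancy slashes the per-hour probability of loss, yet the cumulative probability of at least one loss still creeps towards certainty as exposure grows.

```python
# Per-hour probability that a single (hypothetical) component fails.
p_single = 1e-4

# With a redundant pair, the system is lost only if both fail in the same hour
# (independence assumed -- common-cause failures would make things worse).
p_redundant = p_single ** 2

for hours in (1_000, 100_000, 10_000_000, 1_000_000_000):
    # Probability of at least one system-level loss over the exposure period.
    p_loss = 1 - (1 - p_redundant) ** hours
    print(f"{hours:>13,} hours of exposure: P(loss) = {p_loss:.6f}")

# Redundancy buys a far lower rate, but keep playing long enough and the
# cumulative probability of loss still heads towards 1.
```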
Dealing with epistemic risk ‒ If epistemic uncertainty represents our lack of knowledge, then reducing epistemic risk requires us to improve our knowledge of the system of interest, or to avoid implementations that increase our uncertainty. In doing so, we seek to reduce uncertainty either in our model of system behavior or in the model's parameters (3).
Looking at these two aspects of uncertainty, we see that the reduction of epistemic risk provides a theoretical justification for at least two well-worn principles of safety engineering: the avoidance of complexity and the use of components with well-understood performance.
Complexity in and of itself is not a direct cause of system accidents, but what complexity does do is breed epistemic uncertainty. Complex systems are difficult to understand and model completely (4), usually require more design assumptions, and are more likely to contain design errors, all of which lead to greater epistemic risk. Simplifying a system design therefore has potentially more 'bang for the buck' in terms of enhancing safety.
Similarly, the use of components with well-understood and characterized behavior both improves our certainty about parameters such as component failure rates and reduces our modeling uncertainty (5).
Uncertainty introduced by design assumptions can be reduced by making all assumptions an explicit part of the design and revisiting them regularly to see whether they remain valid, or whether they can be removed and real data substituted. A key point at which assumptions should be checked is whenever we change the context of a system's use. Uncertainty introduced by design error can be reduced by formal or rigorous design methods and processes.
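One lightweight way of making assumptions explicit, sketched below with invented fields and an invented example entry, is to hold them in a register that can be reviewed on a schedule and whenever the context of use changes.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Assumption:
    """A single design assumption, recorded explicitly rather than left implicit."""
    statement: str        # what we are assuming
    rationale: str        # why the assumption was made (e.g., lack of data)
    context: str          # the context of use in which it was judged valid
    last_reviewed: date   # when it was last revisited
    still_valid: bool     # outcome of the last review

register = [
    Assumption(
        statement="Ambient temperature stays below 40 degrees C",
        rationale="No field data for hotter climates at design time",
        context="Temperate-climate deployment",
        last_reviewed=date(2024, 1, 15),
        still_valid=True,
    ),
]

# Flag anything overdue for review or already judged invalid.
for a in register:
    if not a.still_valid or (date.today() - a.last_reviewed).days > 365:
        print(f"Revisit: {a.statement}")
```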
Early successes but an uncertain future ‒ The difference between aleatory and epistemic risk goes some way towards explaining the early successes of the safety engineering community in improving safety. Early efforts focused on the aleatory risk presented by the random failure of system components. Through improved reliability, the use of redundant components, and increased design margins to handle environmental variation, significant gains in safety could be made.
Common Pitfalls in Risk Analysis
- Reliance on subjective judgment ‒ People see things differently: one person's risk may even be another person's opportunity. For example, using a new technology in a project can be seen as a risk (when focusing on the increased chance of failure) or an opportunity (when focusing on the advantages of being an early adopter). This is a somewhat extreme example, but the fact remains that individual perceptions influence the way risks are evaluated. Another problem with subjective judgment is that it is subject to cognitive biases ‒ systematic errors in judgment. Many high-profile project failures can be attributed to such biases. Given these points, potential risks should be discussed from different perspectives with the aim of reaching a common understanding of what they are and how they should be dealt with.
As the post Cognitive Bias and Project Failure puts it, a cognitive bias is "a pattern of deviation in judgment that occurs in particular situations. Implicit in the concept of a 'pattern of deviation' is a standard of comparison; this may be the judgment of people outside those particular situations, or may be a set of independently verifiable facts. Cognitive biases are instances of evolved mental behavior. Some are presumably adaptive, for example, because they lead to more effective actions in given contexts or enable faster decisions when faster decisions are of greater value. Others presumably result from a lack of appropriate mental mechanisms, or from the misapplication of a mechanism that is adaptive under different circumstances."
- Using inappropriate historical data ‒ Purveyors of risk analysis tools and methodologies exhort project managers to determine probabilities using relevant historical data. The word relevant is important: it emphasizes that the data used to calculate probabilities (or distributions) should come from situations that are similar to the one at hand. Consider, for example, the probability of a particular risk ‒ say, that a particular developer will not be able to deliver a module by a specified date. One might have historical data for the developer, but the question remains as to which data points should be used. Clearly, only data points from projects similar to the one at hand should be used. But how is similarity defined? Although this is not an easy question to answer, it is critical to the relevance of the estimate. This is the reference class problem; a toy illustration appears after this list.
- Focusing on numerical measures exclusively ‒ There is a widespread perception that quantitative measures of risk are better than qualitative ones. However, even where reliable and relevant data is available, the measures still need to be based on sound methodologies. Unfortunately, ad hoc techniques abound in risk analysis: see Cox's risk matrix theorem and the limitations of risk scoring methods for more. Risk metrics based on such techniques can be misleading (a toy example appears after this list). As I point out in this comment, in many situations qualitative measures may be more appropriate and accurate than quantitative ones.
- Ignoring known risks ‒ It is surprising how often known risks are ignored. The reasons for this have to do with politics and mismanagement.
- Overlooking the fact that risks are distributions, not point values ‒ Risks are inherently uncertain, and any uncertain quantity is represented by a range of values (each with an associated probability) rather than a single number. Because of the scarcity or unreliability of historical data, distributions are often assumed a priori: that is, analysts will assume that the risk distribution has a particular form (say, normal or lognormal) and then evaluate the distribution's parameters using historical data. Further, analysts often choose simple distributions that are easy to work with mathematically. These distributions often do not reflect reality; for example, they may be vulnerable to 'black swan' occurrences because they do not account for outliers. A sketch of working with a distribution rather than a point value appears after this list.
- Failing to update risks in real time ‒ Risks are rarely static: they evolve in time, influenced by circumstances and events both inside and outside the project. For example, the acquisition of a key vendor by a mega-corporation is likely to affect the delivery of that module you are waiting on, and quite likely in an adverse way. Such a change in risk is obvious; there may be many that aren't. Consequently, project managers need to re-evaluate and update risks periodically; a minimal Bayes-rule sketch of such an update appears after this list. To be fair, this is a point that most textbooks make, but it is advice that is not followed as often as it should be.
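Returning to the reference class problem raised under 'Using inappropriate historical data': the sketch below (with entirely made-up project records and a deliberately debatable similarity rule) shows how the probability estimate depends on how 'similar' is defined.

```python
# Hypothetical delivery records: team size, technology, domain, and whether delivery slipped.
history = [
    {"team_size": 5,  "tech": "java",  "domain": "billing",   "slipped": True},
    {"team_size": 6,  "tech": "java",  "domain": "billing",   "slipped": False},
    {"team_size": 4,  "tech": "java",  "domain": "reporting", "slipped": True},
    {"team_size": 25, "tech": "cobol", "domain": "billing",   "slipped": False},
    {"team_size": 30, "tech": "cobol", "domain": "payroll",   "slipped": False},
]

current = {"team_size": 5, "tech": "java", "domain": "billing"}

def similar(record, project):
    """One possible (and debatable) notion of 'similar': same technology and
    a team within a factor of two of the current one."""
    return (record["tech"] == project["tech"]
            and 0.5 <= record["team_size"] / project["team_size"] <= 2)

reference_class = [r for r in history if similar(r, current)]
p_slip = sum(r["slipped"] for r in reference_class) / len(reference_class)
print(f"{len(reference_class)} comparable projects -> P(slip) = {p_slip:.2f}")

# Change the definition of 'similar' and the estimate changes with it --
# that is the reference class problem in miniature.
```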
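For the point about numerical measures and risk matrices, here is a toy example (the probability and impact bands and the two risks are invented, and this shows only one of the pathologies Cox describes): a coarse matrix can rank a quantitatively smaller risk above a larger one.

```python
def band(value, cuts):
    """Map a continuous value onto a 1 (low) / 2 (medium) / 3 (high) band."""
    low_cut, high_cut = cuts
    return 1 if value < low_cut else (2 if value < high_cut else 3)

def matrix_score(prob, impact):
    # Illustrative bands only: probability in thirds, impact in arbitrary cost units.
    return band(prob, (0.33, 0.66)) * band(impact, (4.0, 7.0))

risks = {
    "A": {"prob": 0.65, "impact": 3.9},   # expected loss ~2.53
    "B": {"prob": 0.34, "impact": 7.1},   # expected loss ~2.41
}

for name, r in risks.items():
    print(f"risk {name}: matrix score {matrix_score(r['prob'], r['impact'])}, "
          f"expected loss {r['prob'] * r['impact']:.2f}")

# The matrix rates B (score 6) above A (score 2) even though A carries the
# larger expected loss -- one way a scoring scheme can mislead.
```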
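For the point about risks being distributions rather than point values, a sketch follows in which the schedule figures and the choice of a triangular distribution are assumptions made purely for illustration.

```python
import random

random.seed(7)

# Hypothetical schedule risk: delay in days if a vendor component is late.
# Optimistic 2, most likely 10, pessimistic 60 -- a skewed spread, not a single number.
samples = sorted(random.triangular(2, 60, 10) for _ in range(100_000))

def percentile(data, q):
    return data[int(q / 100 * (len(data) - 1))]

print(f"mean delay   : {sum(samples) / len(samples):5.1f} days")
print(f"median (P50) : {percentile(samples, 50):5.1f} days")
print(f"P90          : {percentile(samples, 90):5.1f} days")

# Reporting only the 'most likely' 10 days hides a P90 of roughly 40+ days --
# and even this triangular sketch truncates the tail at 60, so genuinely
# extreme outcomes are still understated.
```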
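Finally, for updating risks as new information arrives, here is a hedged sketch of the mechanics of Bayes' rule; the prior and the likelihoods are invented numbers used only to show the update.

```python
# Prior belief that the vendor module will be delivered late.
p_late = 0.20

# New information: the vendor has just been acquired by a larger corporation.
# Assumed (invented) likelihoods of seeing an acquisition under each hypothesis:
p_acq_given_late = 0.50      # acquisitions often disrupt near-term delivery
p_acq_given_on_time = 0.10

# Bayes' rule: P(late | acquisition observed)
evidence = p_acq_given_late * p_late + p_acq_given_on_time * (1 - p_late)
p_late_updated = p_acq_given_late * p_late / evidence

print(f"P(late) before the news: {p_late:.2f}")          # 0.20
print(f"P(late) after the news : {p_late_updated:.2f}")  # about 0.56
```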
This brings me to the end of my (subjective) list of risk analysis pitfalls. Regular readers of this blog will have noticed that some of the points made in this post are similar to the ones made in previous work on estimation errors. This is no surprise: risk analysis and project estimation are activities that deal with an uncertain future, so it is to be expected that they have common problems and pitfalls. One could generalize this point: any activity that involves gazing into a murky crystal ball will be plagued by similar problems.
References
- A. R. Daneshkhah, Uncertainty in Probabilistic Risk Assessment: A Review, University of Sheffield, August 9, 2004.
- Anthony O'Hagan, Caitlin E. Buck, Alireza Daneshkhah, J. Richard Eiser, Paul H. Garthwaite, David J. Jenkinson, Jeremy E. Oakley and Tim Rakow, Uncertain Judgements: Eliciting Experts' Probabilities, Wiley, 2006.