Class 2

Why Risk Matrices Lie

A systematic autopsy of traditional risk assessment and why it fails for the psychosocial world

In March 2019, an Australian hospital's work health and safety team completed a comprehensive psychosocial risk assessment. They used the organisation's standard five-by-five risk matrix - the same tool that had served them well for chemical spills, slips and falls, and manual handling injuries. For the hazard "high workload in emergency department nursing staff," the WHS officer assigned a likelihood of 4 ("likely") and a consequence of 3 ("moderate"), yielding a risk score of 12. The matrix colour-coded this as amber - requiring action within 90 days. Three months later, the same hospital's human resources department conducted its own assessment. Using a different but equally standard three-by-three matrix, and consulting the same nursing staff about the same workload, they arrived at a risk rating of "Low." No action required.

Both assessments were conducted competently. Both followed published guidelines. Both were wrong - not because the assessors lacked skill, but because the tool itself was structurally incapable of telling the truth about psychosocial risk. By the time the hospital reconciled the contradictory findings, two senior nurses had resigned and a third had filed a workers' compensation claim for psychological injury. The cost of those three departures exceeded $400,000. The cost of the risk matrix? About $15 for the laminated card pinned to the safety noticeboard.

The Seductive Simplicity of the Risk Matrix

If you have spent any time in workplace health and safety, you have encountered the risk matrix. It is one of the most widely used risk assessment tools in the world: a grid, typically five rows by five columns, with likelihood on one axis, consequence on the other, and colour-coded cells - green, yellow, orange, red - indicating priority for action. Its appeal is obvious. It requires no statistical training. It fits on a single page. It produces a clear, decisive output: this risk is "High," that risk is "Medium," and here is what to do about each. For physical hazards - a frayed electrical cord, an unguarded machine, a wet floor - this simplicity is often adequate. The likelihood that someone will slip on a wet floor can be estimated from incident records. The consequence of a fall is observable and bounded. The hazard is independent: one wet floor does not make other floors wetter.

Psychosocial hazards are none of these things. And yet, driven by regulatory requirements and organisational inertia, the same matrices are routinely applied to hazards like workplace bullying, role conflict, emotional demands, and poor organisational justice. The result, as we will see, is not merely imprecision - it is systematic distortion. The matrix does not simply fail to capture psychosocial risk; it actively misrepresents it.

This chapter conducts a forensic examination of why. We will identify four structural failures that make traditional risk assessment tools - not just matrices, but also traffic-light heat maps, linear checklists, and simple scoring rubrics - fundamentally unsuited to psychosocial hazards, and then examine a fifth, forced discretisation, before asking what a better model would need to do. These failures are not bugs to be fixed with better training or more careful calibration. They are features of the tools' mathematical architecture. As Cox (2008) showed in his influential analysis, several of the core problems with risk matrices are inherent in the format itself: careful design can reduce them, but it cannot remove them.

Failure One: The Resolution Problem

A five-by-five risk matrix has 25 cells. Each cell represents a unique combination of likelihood and consequence. But these 25 combinations must be collapsed into a much smaller number of action categories - typically three to five (Low, Medium, High, or similar). Cox (2008) showed that this compression severely limits the matrix as a decision aid: a typical risk matrix can correctly and unambiguously compare fewer than 10% of randomly selected pairs of hazards. Put differently, if you pick two hazards at random and ask the matrix which one is worse, more than nine times out of ten it cannot give you a reliable answer. Often it gives both hazards the same rating even when their quantitative risks are very different; in some cases it rates the smaller risk as the larger one.

For physical hazards, this limitation is partially offset by the fact that most organisations only need to distinguish between a few clearly different risk levels. The difference between a paper cut and an amputation is not subtle. But psychosocial hazards cluster in the middle ranges of both likelihood and consequence. Role ambiguity, interpersonal conflict, time pressure, low autonomy, emotional demands - these are all moderately likely and produce moderate-to-serious consequences. They are precisely the kind of hazards that the matrix cannot distinguish between (Cox, 2008). The tool's poor resolution means that genuinely different risk profiles are assigned the same colour code, while essentially similar profiles may land in different cells depending on minor variations in how assessors interpret the scale.
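The resolution problem can be seen in a small simulation. This is a sketch under stated assumptions, not a reproduction of Cox's analysis: the equal-width bands, the Low/Medium/High thresholds, the two hazard distributions and the helper names (`band`, `category`, `compare_pairs`) are all hypothetical, and "true" risk is taken to be likelihood × consequence. The code counts how often the matrix ranks a random pair of hazards correctly, ties them, or ranks them the wrong way round, first for hazards spread across the whole grid and then for hazards clustered in the middle, as psychosocial hazards tend to be.

```python
import random

# Hypothetical 5x5 matrix: each axis on [0, 1) is cut into five equal-width bands
# (1..5), and the product score (1..25) is collapsed into three action categories.
def band(x: float) -> int:
    return min(5, int(x * 5) + 1)

def category(likelihood: float, consequence: float) -> int:
    score = band(likelihood) * band(consequence)
    if score <= 4:
        return 0  # Low
    if score <= 12:
        return 1  # Medium
    return 2      # High

def compare_pairs(sample, n_pairs: int = 100_000, seed: int = 1) -> dict:
    """Share of random hazard pairs the matrix ranks correctly, ties, or ranks wrongly."""
    rng = random.Random(seed)
    counts = {"correct": 0, "tie": 0, "wrong": 0}
    for _ in range(n_pairs):
        (p1, c1), (p2, c2) = sample(rng), sample(rng)
        true_diff = p1 * c1 - p2 * c2  # assumed benchmark: risk = likelihood x consequence
        cat_diff = category(p1, c1) - category(p2, c2)
        if cat_diff == 0:
            counts["tie"] += 1
        elif (cat_diff > 0) == (true_diff > 0):
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return {k: v / n_pairs for k, v in counts.items()}

def spread_out(rng):
    """Hazards anywhere on both scales."""
    return rng.random(), rng.random()

def mid_range(rng):
    """Hazards clustered in the middle of both scales."""
    return rng.uniform(0.4, 0.8), rng.uniform(0.4, 0.8)

results = {name: compare_pairs(s) for name, s in [("spread out", spread_out), ("mid-range", mid_range)]}
for name, shares in results.items():
    print(name, {k: round(v, 3) for k, v in shares.items()})
```

With this particular design, clustering hazards in the middle of both scales pushes the share of ties well above half. The exact figures depend heavily on the bands and thresholds chosen, which is itself part of the problem.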

Recent research has reinforced this finding. A 2024 analysis of probability-impact matrices argues that the ranking produced by a risk matrix depends on arbitrary design choices such as scale direction, number of categories, and colour-scheme allocation (Salas-Molina et al., 2024). The same workplace situation can receive wildly different ratings depending on which commercially available matrix template happens to be in use. This is not a calibration problem - it is a structural one.

The Psychosocial Amplification

The resolution problem is amplified for psychosocial hazards by the absence of clear category boundaries. When assessing a chemical exposure, "consequence" can be mapped to toxicological data: this concentration causes irritation, that concentration causes organ damage. For psychosocial hazards, the boundary between "moderate" and "major" consequences is inherently arbitrary. Is chronic low-grade anxiety a moderate consequence or a major one? It depends on duration, on the individual, on what other stressors are present, and on what resources are available. Taibi et al. (2022) note that for psychosocial hazards, an epidemiological, risk-oriented understanding analogous to that available for physical hazards is still largely missing. Without that grounding, the consequence categories of a standard matrix are essentially meaningless when applied to psychological outcomes.

Think About It

Consider two psychosocial hazards: (1) a manager who occasionally raises their voice during team meetings, and (2) a systematic pattern of being excluded from important emails. How would you rate the "consequence" of each on a 1–5 scale? Now ask yourself: what specific outcome are you rating the consequence of? Acute distress? Long-term burnout? Turnover intention? Each framing produces a different number - and the matrix doesn't tell you which framing to use.

Failure Two: The Subjectivity Trap

Risk matrices require human judgement to assign likelihood and consequence ratings. For physical hazards, these judgements can be anchored to observable data: incident rates, injury severity records, exposure measurements. For psychosocial hazards, the anchoring data is fundamentally different in kind. The "exposure" is often a subjective experience - how much role conflict does a person feel? - and the "consequence" is a latent psychological state that cannot be directly observed.

Metzler et al. (2021) examined how questionnaire results are used in psychosocial risk assessment and found that prevalent methods for assessing and interpreting questionnaire data are "partly based on empirically unsubstantiated assumptions and have solely indirect or low relation to the actual health risk." Unlike physical or chemical hazards, psychosocial risk questionnaire results often lack meaningful cutoff values. A score of 3.7 on a workload scale tells you almost nothing about actual health risk unless you know the specific dose-response relationship for that population - and such relationships are rarely established.

This measurement challenge is compounded by reporting bias. Research comparing self-reported mental health data to administrative records has found that approximately 36.5% of individuals using antidepressant medication did not report a mental health condition on surveys (Economics Letters, 2017). If more than a third of people taking prescribed medication for a mental health condition will not disclose it even on a research survey, what confidence can we have in questionnaire-based assessments of workplace psychological states that have no clinical diagnosis attached? The stigma effect means that psychosocial risk data systematically underestimates true prevalence - and the risk matrix has no mechanism for correcting this bias.
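To see how large the distortion can be, here is a back-of-envelope correction. It rests on strong, hypothetical assumptions: that the 36.5% non-disclosure figure applies to the population being surveyed, and that nobody without the condition reports having it. The 12% survey result and the function name `corrected_prevalence` are invented for illustration.

```python
def corrected_prevalence(observed: float, disclosure_rate: float) -> float:
    """Back out true prevalence from a survey, assuming true cases disclose at
    `disclosure_rate` and there are no false-positive reports (both assumptions)."""
    if not 0 < disclosure_rate <= 1:
        raise ValueError("disclosure_rate must be in (0, 1]")
    return observed / disclosure_rate

# Hypothetical survey result: 12% of respondents report a mental health condition.
observed = 0.12
# Assumption: the 36.5% non-disclosure rate transfers to this workforce.
disclosure = 1 - 0.365
estimate = corrected_prevalence(observed, disclosure)
print(f"observed {observed:.1%}, stigma-adjusted estimate {estimate:.1%}")
```

A matrix fed the raw survey figure has no equivalent step; the disclosure rate never appears anywhere in its inputs.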

The Measurement Paradox

There is a deeper problem still. In physics, the observer effect - where the act of measurement alters the phenomenon being measured - is real but usually small enough to be engineered around: a thermometer barely changes the temperature of the room it sits in. In psychosocial risk assessment, the observer effect operates at full scale. A survey about workplace bullying is itself a psychosocial event. It signals to employees that management believes bullying exists (or doesn't). It creates expectations about what will happen with the results. It can trigger rumination in respondents who had not previously framed their experience as "bullying." And when results are fed back - or conspicuously not fed back - the assessment process becomes a new source of organisational justice perceptions.

This means that psychosocial risk assessment is reflexive: the measurement instrument interacts with the hazard it is measuring. A risk matrix, designed for hazards that sit still while you measure them, has no way to account for this reflexivity. It treats the assessment as a neutral window onto a static reality. For psychosocial hazards, no such window exists.

Think About It

Imagine you are a nurse in a busy emergency department. You receive an email announcing a "Psychosocial Risk Assessment Survey" to be completed by Friday. How does the mere existence of this survey change your experience of the workplace this week? What assumptions do you make about why it was sent now? How does it affect what you report?

Failure Three: The Independence Illusion

Traditional risk assessment evaluates hazards one at a time. A matrix-based risk register has a single row for "high workload," a single row for "low supervisor support," and a single row for "role ambiguity." Each receives its own likelihood, its own consequence, and its own risk rating. The implicit assumption is that these hazards are independent - that the presence or absence of one tells you nothing about the others, and that their combined effect is simply the sum of their individual effects.

This assumption is spectacularly wrong for psychosocial hazards. The Job Demands-Resources model, one of the most extensively validated frameworks in occupational health psychology, demonstrates that job demands and job resources interact rather than simply adding together (Bakker & Demerouti, 2007). Job resources buffer the effect of demands on burnout: high workload combined with high supervisor support produces a fundamentally different risk profile than high workload combined with low supervisor support. The effect is not additive but multiplicative - and in some cases, the interaction reverses the direction of the effect entirely. High job demands paired with high resources can actually increase engagement rather than strain.

Consider what this means for our hospital scenario. A risk matrix might rate "high workload" as a 12 and "low supervisor support" as an 8. Adding the two scores, as the independence assumption implicitly does, gives a combined risk of 20. But the actual risk of the combination is not 20 - it may be 35, or 50, because the absence of support amplifies the impact of workload through a non-linear interaction. Alternatively, if resources are high, the true combined risk might be 10 rather than 20. The matrix, by treating each hazard in isolation, systematically miscalculates joint risk. And since psychosocial hazards virtually never occur in isolation - they cluster, they cascade, they amplify each other - this error is not occasional. It is the norm.
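The difference between additive and interactive effects can be sketched with a logistic model that includes a demands × support interaction term. The coefficients below are hypothetical, chosen only to show buffering; they are not estimates from the JD-R literature, and scaling demands and support to the range 0 to 1 is also an assumption.

```python
import math

# Hypothetical coefficients on the logit scale, chosen only to illustrate buffering.
B0, B_DEMANDS, B_SUPPORT, B_INTERACTION = -2.5, 3.0, -1.0, -2.5

def p_burnout(demands: float, support: float, interaction: bool = True) -> float:
    """Probability of burnout for demands and support scaled to [0, 1]."""
    logit = B0 + B_DEMANDS * demands + B_SUPPORT * support
    if interaction:
        # Support blunts the effect of demands: the penalty grows with both together.
        logit += B_INTERACTION * demands * support
    return 1 / (1 + math.exp(-logit))

for support in (0.0, 1.0):
    effect = p_burnout(1.0, support) - p_burnout(0.0, support)
    print(f"support={support:.0f}: raising demands adds {effect:+.2f} to P(burnout)")
```

Under these made-up numbers, the same jump in workload matters far more when support is low than when it is high, which a single "high workload" row with its own fixed rating cannot express.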

Evidence of Dependence

The empirical evidence for hazard interdependence is overwhelming. Role ambiguity correlates strongly with poor change management (r ≈ 0.45–0.55 in meta-analyses), because unclear roles and unclear organisational direction share common causes and reinforce each other. High emotional demands correlate with low recovery opportunity (r ≈ 0.35–0.50), because emotionally demanding work environments tend to also be ones where breaks are scarce. Job insecurity correlates with low organisational justice (r ≈ 0.40–0.55), because the same management practices that create insecurity also tend to be perceived as unfair.

When correlated hazards are treated as independent, the error in risk estimation can be enormous. If two hazards each have a 0.6 probability of being present, the naive independence assumption calculates their joint probability as 0.6 × 0.6 = 0.36. But if their presence is correlated at r = 0.5 (the phi coefficient for two yes/no variables), the true joint probability is 0.36 + 0.5 × 0.6 × 0.4 = 0.48 - a third higher. For larger clusters the gap widens quickly. If four such hazards were perfectly correlated, their joint probability would be 0.6 rather than 0.6⁴ ≈ 0.13, more than four and a half times the independence estimate; with strong enough correlation, the cumulative error for a cluster of four can exceed 200%. This is not a rounding error. It is a fundamental misrepresentation of the risk landscape (Salas-Molina et al., 2024).
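The two-hazard figure follows directly from the definition of the phi (Pearson) correlation between two binary variables, reading the r = 0.5 in the text as a phi coefficient. For four hazards there is no single answer without a model of how the correlation arises, so the sketch below assumes, hypothetically, an equicorrelated Gaussian latent factor (a shared "organisational climate"). Its `rho` parameter is a latent correlation, not a phi coefficient, and both function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def joint_two(p1: float, p2: float, phi: float) -> float:
    """P(both present) for two binary hazards with phi (Pearson) correlation `phi`."""
    return p1 * p2 + phi * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def joint_all_present(p: float, k: int, rho: float, n: int = 100_000, seed: int = 7) -> float:
    """Monte Carlo P(all k hazards present) under an equicorrelated Gaussian latent model.

    Hazard i is present when Z_i < t, where Z_i = sqrt(rho) * F + sqrt(1 - rho) * E_i
    and F is a shared factor. `rho` is a latent correlation, not the phi coefficient.
    """
    rng = random.Random(seed)
    t = NormalDist().inv_cdf(p)  # threshold so that P(Z_i < t) = p
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    hits = 0
    for _ in range(n):
        f = rng.gauss(0, 1)  # shared factor affecting every hazard
        if all(a * f + b * rng.gauss(0, 1) < t for _ in range(k)):
            hits += 1
    return hits / n

pair = joint_two(0.6, 0.6, 0.5)  # 0.36 + 0.5 * 0.24 = 0.48
indep4 = 0.6 ** 4                # about 0.13 under independence
corr4 = joint_all_present(0.6, 4, rho=0.5)
print(f"two hazards: independent 0.36 vs correlated {pair:.2f}")
print(f"four hazards: independent {indep4:.3f} vs correlated {corr4:.3f}")
```

Under this particular assumption, the four-hazard joint probability lands well above the independence figure but below the perfectly correlated bound of 0.6; raising `rho` pushes it towards that bound.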

Interactive: Risk Matrix Roulette

Rate the following scenario using the risk matrix, then see how others rated the same situation.

Failure Four: The Static Snapshot Problem

A risk matrix captures a single moment in time. It asks: right now, what is the likelihood and consequence of this hazard? But psychosocial hazards are dynamic systems - they evolve, they escalate, they sometimes resolve spontaneously, and they respond to interventions in delayed and non-linear ways. A workload problem that is "moderate" in January may be "extreme" by March if two team members resign. A bullying situation that is "high risk" today may drop to "low risk" next week if the perpetrator is transferred - or it may persist as a trauma response long after the perpetrator is gone.

Traditional tools have no mechanism for representing these temporal dynamics. They cannot express the concept that a hazard's current state depends on its history, or that today's resource levels constrain tomorrow's risk trajectory. They cannot model feedback loops - the way that burnout reduces performance, which increases workload, which deepens burnout. And they cannot incorporate new evidence as it arrives: each assessment is a fresh start, disconnected from what was known before.
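A feedback loop of this kind can be illustrated with a toy difference-equation model. Every parameter below is hypothetical and uncalibrated: a team starts slightly overloaded (workload 1.1 against a capacity of 1.0), burnout grows with overload and decays with recovery, and, when the `feedback` term is switched on, burnout shifts extra work onto the team. The point is qualitative: the same starting snapshot can lead to very different trajectories depending on a loop that a matrix cannot represent.

```python
def simulate(months: int, feedback: float, base_workload: float = 1.1,
             capacity: float = 1.0, gain: float = 0.3, recovery: float = 0.1,
             burnout: float = 0.05) -> list[float]:
    """Monthly burnout levels (0-1 scale) for a hypothetical team."""
    history = []
    for _ in range(months):
        # Burned-out staff cover less, so part of their work lands on colleagues.
        workload = base_workload + feedback * burnout
        # Overload drives burnout up; recovery pulls it back down.
        burnout += gain * (workload - capacity) - recovery * burnout
        burnout = min(1.0, max(0.0, burnout))  # keep on a 0-1 scale
        history.append(burnout)
    return history

no_loop = simulate(24, feedback=0.0)
with_loop = simulate(24, feedback=0.5)
for month in (3, 12, 24):
    print(f"month {month:2d}: no loop {no_loop[month - 1]:.2f}, with loop {with_loop[month - 1]:.2f}")
```

Without the loop, burnout levels off just below 0.3; with it, burnout keeps climbing until it hits the ceiling of the scale. At month 3 the two teams look similar, which is exactly when a one-off snapshot might be taken.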

This is particularly damaging for psychosocial hazards because of the phenomenon of latency. The health consequences of psychosocial exposure often manifest months or years after the exposure itself. A period of extreme role conflict in 2023 may produce a depressive episode in 2024. A risk matrix completed in 2023 would rate the consequence as "possible future psychological harm" - an almost uselessly vague category. A matrix completed in 2024, after the depressive episode, might rate the same hazard as "serious" - but by then the opportunity for prevention has passed.

The Three Cases Revisited

Let us return to the three workplace scenarios introduced in Chapter 1 and observe how traditional tools fail each one.

The Hospital. The emergency department's risk register lists 14 separate psychosocial hazards: high workload, emotional demands, shift work, role conflict, low autonomy, workplace violence, inadequate staffing, poor change management, low supervisor support, bullying, compassion fatigue, moral distress, career stagnation, and work-life conflict. Each receives its own row in the register. Each is rated independently. The resulting register shows six amber risks, five yellow risks, and three green risks. It suggests a moderate-risk environment. But the register cannot represent the fact that these 14 hazards form an interconnected web - that inadequate staffing drives high workload, which amplifies emotional demands, which accelerates compassion fatigue, which reduces the capacity to cope with workplace violence, which increases moral distress. The actual risk is not the sum of 14 independent hazards. It is one cascading system.

The Startup. A 40-person technology startup uses an online checklist tool to assess psychosocial risk quarterly. The tool asks employees to rate 12 hazard categories on a three-point scale (Low/Medium/High). In Q1, most employees rate "job demands" as High but "job satisfaction" as also High - the classic startup pattern of intense but engaging work. The tool flags demands as a risk. In Q2, a major client is lost and layoffs are announced. Now "job insecurity" joins "job demands" at High, but "job satisfaction" drops to Low. The tool flags both hazards - but treats the Q2 assessment as though Q1 never happened. It cannot represent the trajectory: that the same "high demands" hazard has transformed from an energising challenge (buffered by resources) into an exhausting threat (amplified by insecurity). The risk has qualitatively changed, but the tool's vocabulary is too impoverished to express this.

The Mining Operation. A remote mining site uses a corporate-standard five-by-five matrix, completed annually by the site safety manager in consultation with the HR officer. For "social isolation," they rate likelihood as 4 (most workers are away from family for extended rotations) and consequence as 2 (the company provides recreational facilities and counselling services). Risk score: 8, coded yellow. But this rating was produced by two managers, not by the workers experiencing the isolation. Research on mental health stigma shows that people systematically under-report mental health conditions (Economics Letters, 2017), and stigma is widely seen as a particularly strong barrier to disclosure in male-dominated industries such as mining. The matrix has no mechanism for adjusting its confidence in the input data. It treats the managers' estimate of "consequence = 2" with the same certainty as a measured atmospheric oxygen level of 20.9%. These are not the same kind of data.

Think About It

For each of the three cases above, identify one specific interaction between hazards that the risk matrix fails to capture. What would you need the tool to do differently to represent that interaction?

Interactive: Independence Illusion Detector

For each pair of psychosocial hazards, estimate how strongly the presence of one changes the probability of the other. Then see the empirical data.

The Latent Variable Problem

Underpinning all four failures is a deeper epistemological issue. Traditional risk assessment assumes that the things being measured - likelihood and consequence - are manifest variables: directly observable quantities that can be straightforwardly assessed. For a chemical spill, this is reasonable. The concentration of a substance in air can be measured with a detector. The health consequence of a given exposure level can be read from a toxicological database.

Psychosocial hazards are not manifest variables. "Role conflict" is a latent construct - a theoretical entity inferred from patterns of observable indicators (contradictory instructions, competing demands, unclear priorities). "Burnout" is a latent construct inferred from exhaustion, cynicism, and reduced efficacy. Even "workload" - seemingly the most concrete of psychosocial hazards - is partly subjective: the same number of tasks feels overwhelming to an under-resourced worker and manageable to a well-supported one.

VanderWeele (2022) has argued that the standard measurement models used for psychosocial constructs - both reflective and formative latent variable models - are often empirically violated. Typical scale aggregation in psychosocial risk assessment rests on assumptions about underlying latent variables that are rarely tested and, when tested, are frequently wrong. The empirical implications of standard latent variable measurement models "will often be violated by data" (VanderWeele, 2022). This means that even the input to a risk matrix - the questionnaire score that is supposed to represent the "level" of a psychosocial hazard - may not mean what we think it means.

A risk matrix built on latent variables that are poorly measured, systematically biased by stigma, reflexively altered by the measurement process, dynamically evolving over time, and interacting non-linearly with each other is not a simplification of reality. It is a fiction that happens to be formatted as a table.

The Paradox of Discretisation

There is one more structural failing worth examining, because it connects directly to the alternative framework we will develop. Risk matrices convert continuous quantities into discrete categories. A likelihood that could range smoothly from 0% to 100% is forced into five bins: Rare, Unlikely, Possible, Likely, Almost Certain. A consequence that ranges across a continuous spectrum of severity is squeezed into five boxes: Insignificant, Minor, Moderate, Major, Catastrophic.

Duijm (2015) documented how this discretisation introduces systematic errors. Information is destroyed at every category boundary. In a scheme where, say, "Possible" covers 10–50% and "Likely" begins at 50%, a hazard with a true likelihood of 49% and one with a true likelihood of 51% land in different categories and receive different action priorities, despite being virtually identical. Meanwhile, a hazard at 11% and one at 49% share the same category, despite differing by a factor of more than four. The matrix has, in effect, poor resolution where fine distinctions matter and false precision where broad categories would suffice.
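The boundary effect can be reproduced with any banding scheme. The band edges below are hypothetical (Rare under 5%, Unlikely 5–10%, Possible 10–50%, Likely 50–90%, Almost Certain 90% and above), chosen so that the 49% / 51% and 11% / 49% comparisons behave as described; with other edges, the same problem simply moves to different boundaries. The name `likelihood_band` is illustrative.

```python
from bisect import bisect_right

# Hypothetical likelihood band edges; a value equal to an edge falls in the higher band.
EDGES = [0.05, 0.10, 0.50, 0.90]
LABELS = ["Rare", "Unlikely", "Possible", "Likely", "Almost Certain"]

def likelihood_band(p: float) -> str:
    """Map a continuous likelihood in [0, 1] to one of five discrete labels."""
    return LABELS[bisect_right(EDGES, p)]

for a, b in [(0.49, 0.51), (0.11, 0.49)]:
    print(f"{a:.0%} -> {likelihood_band(a)}, {b:.0%} -> {likelihood_band(b)}")
```

A two-point difference crosses a band boundary, while a gap of 38 percentage points does not.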

For psychosocial hazards, where the underlying distributions are continuous, overlapping, and often non-normal, this forced discretisation is especially destructive. Burnout is not a binary state (present/absent) or even a five-level state. It is a continuous dimension with complex relationships to its antecedents. Emotional demands do not come in five flavours. Organisational justice is not a traffic light. By forcing continuous psychosocial reality into discrete risk categories, the matrix does not simplify - it distorts.

What Would a Better Model Look Like?

We have now catalogued five structural failures of traditional risk assessment when applied to psychosocial hazards:

Poor resolution - the tool cannot reliably distinguish between hazards of different magnitudes
Subjectivity and bias - input data is shaped by stigma, framing effects, and the reflexivity of measurement
Independence assumption - hazards are assessed in isolation despite being deeply interconnected
Static snapshots - the tool cannot represent temporal dynamics, feedback loops, or evidence accumulation
Forced discretisation - continuous psychosocial phenomena are distorted by categorical compression

These are not minor limitations to be patched with better training manuals. As Cox (2008) concluded, the flaws are structural, not implementational, and he recommended "quantitative, decision-analytic methods" grounded in centuries of scientific thought as the alternative. But what, specifically, would such a method need to do?

This is the question we want you to answer - before we reveal the framework that addresses it. Based on everything you have learned in this chapter about how traditional tools fail, what properties would an ideal psychosocial risk model possess?

We suggest you pause here and genuinely attempt to list these properties before continuing. The exercise below will guide your thinking.

Interactive: Ideal Model Specification Workshop

Based on the failures we've examined, type properties that you think an ideal psychosocial risk model should have. Watch as your ideas connect to a set of hidden "mystery features" of a model you'll meet in Chapter 3.

The Specification Sheet

In our experience, students who complete this exercise reliably generate a list that includes most of the following desiderata:

The model should handle uncertainty explicitly - representing not just "best estimates" but the degree of confidence in those estimates
The model should show how hazards connect to each other and to outcomes - making interactions visible rather than hiding them
The model should work with continuous variables rather than forcing everything into discrete categories
The model should be able to reason backward from outcomes to causes - if we observe burnout, what combination of hazards most likely produced it?
The model should update with new evidence - incorporating new data without discarding what was known before
The model should incorporate prior knowledge - using what research and experience already tell us, rather than starting from scratch each time
The model should be able to quantify the effect of interventions - showing what happens to overall risk if we change one specific factor

If this list reads like a wish list for a tool that cannot possibly exist, we have good news. Every property on this list is supported by a specific class of probabilistic graphical models. You have just independently derived the specification sheet for Bayesian Networks - and in Chapter 3, we will show you exactly how they work.
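As a small preview of the "incorporate prior knowledge" and "update with new evidence" properties, here is a conjugate Beta-Binomial update for the probability that a worker is exposed to a given hazard. This is a single-parameter illustration rather than a Bayesian Network, and the prior (Beta(2, 3), a vague belief centred on 40%), the survey counts (18 of 40 workers) and the helper names are hypothetical.

```python
import math

def beta_summary(a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def update(a: float, b: float, exposed: int, total: int) -> tuple[float, float]:
    """Conjugate update: a Beta prior plus binomial counts gives a Beta posterior."""
    return a + exposed, b + (total - exposed)

prior = (2.0, 3.0)                                # hypothetical prior knowledge
posterior = update(*prior, exposed=18, total=40)  # hypothetical survey wave
for name, params in [("prior", prior), ("posterior", posterior)]:
    mean, sd = beta_summary(*params)
    print(f"{name}: mean {mean:.2f}, sd {sd:.2f}")
```

The posterior keeps the estimate and its uncertainty together: the mean moves towards the data and the standard deviation shrinks as evidence accumulates. A later survey wave would start from this posterior rather than from scratch.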

"The question is not whether risk matrices are useful - they are, for simple hazards in simple contexts. The question is whether they are honest about what they cannot do. For psychosocial hazards, the answer is unambiguously no."

The shift from traditional risk matrices to probabilistic models is not merely a technical upgrade. It is a philosophical reorientation. Traditional tools treat uncertainty as noise to be eliminated - they force us to commit to a single likelihood and a single consequence, as though confidence were free. Probabilistic models treat uncertainty as a feature to be modelled. They allow us to say: "Given what we know, there is a 60% chance that workload is in the high range, and if it is, the probability of burnout increases from 15% to 40%, but this estimate would change if we also knew the level of supervisor support." That single sentence contains more useful risk information than an entire colour-coded matrix.
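The numbers in that sentence can be combined directly. Taking 15% as the probability of burnout when workload is not high (our reading of "increases from 15% to 40%"), the law of total probability gives the overall burnout risk, and Bayes' rule runs the reasoning backwards from an observed case of burnout to the likely workload. Supervisor support is left out of this sketch, as the sentence itself warns it would change the answer.

```python
p_high = 0.60             # P(workload high)
p_burn_given_high = 0.40  # P(burnout | high workload)
p_burn_given_not = 0.15   # P(burnout | workload not high), assumed reading

# Forward: overall probability of burnout (law of total probability).
p_burn = p_high * p_burn_given_high + (1 - p_high) * p_burn_given_not
# Backward: given that burnout is observed, how likely is high workload? (Bayes' rule)
p_high_given_burn = p_high * p_burn_given_high / p_burn

print(f"P(burnout) = {p_burn:.2f}")
print(f"P(high workload | burnout) = {p_high_given_burn:.2f}")
```

So the overall risk is 30%, and observing burnout raises the probability that workload was high from 60% to 80%.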

The journey from the laminated card on the safety noticeboard to a model capable of that kind of reasoning is the journey of this course. It begins, as all good journeys do, with an honest accounting of where the old maps fail. You now have that accounting. In the next chapter, we pick up the new map.

Key Takeaways

Typical risk matrices can correctly and unambiguously compare fewer than 10% of randomly selected pairs of hazards - their poor resolution is a structural property of the format, not a training problem (Cox, 2008).
Psychosocial hazards lack the clear category boundaries and direct observability that make matrices adequate for physical hazards; consequence categories are inherently arbitrary for psychological outcomes.
Systematic under-reporting due to stigma, the reflexivity of measurement, and the absence of empirically validated cutoff values mean that matrix inputs for psychosocial hazards are fundamentally unreliable.
Psychosocial hazards are deeply interdependent - the Job Demands-Resources model demonstrates that demands and resources interact multiplicatively, not additively (Bakker & Demerouti, 2007).
Treating correlated hazards as independent can produce risk estimates that are off by 200% or more.
Traditional tools capture static snapshots of dynamic systems, cannot represent feedback loops or latency effects, and discard prior knowledge with each new assessment.
Forced discretisation of continuous psychosocial phenomena destroys information and creates false precision at category boundaries.
An adequate psychosocial risk model must handle uncertainty, represent connections between hazards, support bidirectional reasoning, incorporate prior knowledge, and update with new evidence - the specification sheet for Bayesian Networks.

Looking Ahead

In Chapter 3, Thinking in Graphs: Introduction to Bayesian Networks, we meet the modelling framework that addresses every failure identified in this chapter. You will learn to read and construct directed acyclic graphs, understand conditional probability tables, and perform your first Bayesian update - discovering that the mathematical machinery for reasoning under uncertainty is not only powerful but surprisingly intuitive. The wish list you built today becomes the feature list of the tool you will learn to use.

References

Bakker, A. B., & Demerouti, E. (2007). The Job Demands-Resources model: State of the art. Journal of Managerial Psychology, 22(3), 309–328. https://doi.org/10.1108/02683940710733115

Cox, L. A., Jr. (2008). What's wrong with risk matrices? Risk Analysis, 28(2), 497–512. https://doi.org/10.1111/j.1539-6924.2008.01030.x

Duijm, N. J. (2015). Recommendations on the use and design of risk matrices. Safety Science, 76, 21–31. https://doi.org/10.1016/j.ssci.2015.02.014

Metzler, Y. A., Groeling-Müller, G., & Bellingrath, S. (2021). How to use questionnaire results in psychosocial risk assessment: Calculating risks for health impairment in psychosocial work risk assessment. International Journal of Environmental Research and Public Health, 18(13), 7107. https://doi.org/10.3390/ijerph18137107

Salas-Molina, F., Pla-Santamaria, D., Garcia-Bernabeu, A., & Reig-Mullor, J. (2024). Beyond probability-impact matrices in project risk management: A quantitative methodology for risk prioritisation. Humanities and Social Sciences Communications, 11, 670. https://doi.org/10.1057/s41599-024-03180-5

Taibi, Y., Metzler, Y. A., Bellingrath, S., Neuhaus, C. A., & Müller, A. (2022). Applying risk matrices for assessing the risk of psychosocial hazards at work. Frontiers in Public Health, 10, 965262. https://doi.org/10.3389/fpubh.2022.965262

VanderWeele, T. J. (2022). Constructed measures and causal inference: Towards a new model of measurement for psychosocial constructs. Epidemiology, 33(1), 141–151. https://doi.org/10.1097/EDE.0000000000001434

Wüllhorst, V., Agor, K., & Winter, S. (2017). Mental health stigma: Comparing self-reports to administrative records to document systematic under-reporting. Economics Letters, 158, 53–56. https://doi.org/10.1016/j.econlet.2017.07.007