Class 6

Building the Machine: Constructing Your First Bayesian Network

Transforming causal intuition and probability language into a working computational model of psychosocial risk

Imagine you are the new Wellbeing Lead for a 40-bed acute medical ward. In your first week, three nurses resign. The remaining staff are working double shifts to cover gaps, medication errors have doubled this quarter, and patient complaints are rising. Your hospital's risk register lists each of these problems as a separate item, each scored on its own likelihood-times-consequence matrix. Yet every nurse you speak to tells the same story: these problems are not separate - they feed each other. High patient loads create fatigue, fatigue erodes concentration, understaffing forces overtime, overtime drives people to quit, and each resignation makes the staffing crisis worse.

You sketched a causal web of these relationships in Chapter 4. You learned to quantify uncertainty with conditional probability in Chapter 5. Now it is time to combine both skills. In this chapter, you will build a Bayesian network - a model that captures both the structure and the strength of these interconnected risks - and in doing so, you will hold in your hands a tool that can answer questions no risk matrix ever could.

What Is a Bayesian Network?

A Bayesian network (BN) is a compact, visual, and mathematically precise way of representing how a set of uncertain variables relate to one another. Formally, it consists of two components: a qualitative structure and a quantitative parameterisation (Koller & Friedman, 2009). The qualitative structure is a directed acyclic graph (DAG) - the kind of arrow diagram you built in Chapter 4 - where nodes represent variables and directed edges (arrows) represent causal or influential relationships. The "acyclic" constraint simply means no chain of arrows can loop back on itself; causes flow forward. The quantitative component is a set of conditional probability tables (CPTs), one for every node, specifying how strongly each node's state depends on the states of its parents.

Pearl (1988) introduced this framework as a way to make probabilistic reasoning tractable. The key insight is factorisation: instead of trying to specify the probability of every possible combination of every variable simultaneously - a table whose size grows exponentially with the number of variables - a BN decomposes the joint probability distribution into a product of smaller, local conditional distributions. Each node only needs to "know about" its direct parents, not the entire network. This is what makes the approach computationally feasible and, crucially, what makes it buildable by human experts working with limited data (Fenton & Neil, 2018).
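The factorisation can be written compactly. The expression below is the standard chain-rule form for a Bayesian network; the symbols X_i (the i-th variable) and Pa(X_i) (the set of its parents) are notation introduced here for illustration.

```latex
P(X_1, X_2, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

For n binary variables, the full joint table on the left has 2^n entries. Each factor on the right needs only one free probability per combination of that node's parent states, so a binary node with k binary parents contributes 2^k numbers regardless of how large the rest of the network is.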

In the context of psychosocial hazards, each node might represent a workplace condition (high workload, poor rostering), a mediating state (fatigue, emotional exhaustion), or an outcome (burnout, medication errors, turnover intention). Each arrow encodes the kind of causal reasoning you practised in Chapter 4. And each CPT encodes the kind of conditional probability you learned in Chapter 5. The Bayesian network is the point where those two skill sets converge.

The Three Components, Defined

Following the construction framework proposed by Schenekenberg et al. (2017), we can distil BN construction into three essential components that you will assemble in sequence.

1. Nodes: The Variables

Every Bayesian network begins with a decision about what to include. Nodes represent the variables relevant to your problem. For our nursing-ward scenario, drawing on the predictor categories identified in Dall'Ora et al.'s (2020) review of nursing burnout, we select six nodes: Patient Load (High / Normal), Staffing Level (Adequate / Inadequate), Rostering Quality (Good / Poor), Managerial Support (Present / Absent), Burnout (High / Low), and Adverse Outcomes (a composite of medication errors and turnover, High / Low). Each node has a finite set of mutually exclusive states - for simplicity we use binary states, though richer models can include ordered categories.

2. Edges: The Causal Structure

Edges are the arrows connecting parent nodes to child nodes. Drawing on Chapter 4's causal reasoning and the empirical patterns in the literature, we specify: Patient Load → Burnout, Staffing Level → Burnout, Rostering Quality → Burnout, Managerial Support → Burnout, and Burnout → Adverse Outcomes. Additionally, Staffing Level → Adverse Outcomes captures the direct pathway by which understaffing leads to errors independently of burnout. Notice that Patient Load, Staffing Level, Rostering Quality, and Managerial Support have no parents - they are root nodes whose probabilities are specified unconditionally.
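One way to make this structure concrete is to write it down as a mapping from each node to its parents and let the computer confirm that it really is acyclic. The sketch below uses only the Python standard library (graphlib, available from Python 3.9). The names PARENTS and is_acyclic are hypothetical helpers chosen for this illustration, not part of any BN package.

```python
from graphlib import TopologicalSorter, CycleError

# Parents of each node, transcribed from the edges listed above.
PARENTS = {
    "Patient Load": [],
    "Staffing Level": [],
    "Rostering Quality": [],
    "Managerial Support": [],
    "Burnout": ["Patient Load", "Staffing Level",
                "Rostering Quality", "Managerial Support"],
    "Adverse Outcomes": ["Burnout", "Staffing Level"],
}

def is_acyclic(parents):
    """Return True if the parent map describes a DAG (no directed cycles)."""
    try:
        # TopologicalSorter expects {node: predecessors}, matching PARENTS.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# Root nodes are the nodes with no parents.
root_nodes = [node for node, pas in PARENTS.items() if not pas]
# Each (parent, child) pair is one arrow in the diagram.
edges = [(pa, child) for child, pas in PARENTS.items() for pa in pas]

print("Acyclic:", is_acyclic(PARENTS))
print("Root nodes:", root_nodes)
print("Number of edges:", len(edges))
```

Storing parents rather than children is a deliberate choice: a node's CPT is indexed by its parents, so this layout is the one you will reach for again when the numbers are added.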

3. Conditional Probability Tables: The Numbers

Each node requires a CPT. For root nodes, this is simply a prior probability - for example, P(Patient Load = High) = 0.60, reflecting that the ward is overloaded more often than not. For child nodes, the CPT specifies the probability of each state given every combination of parent states. Burnout, with four binary parents, requires a table with 2⁴ = 16 rows. Each row answers a question like: "If patient load is high, staffing is inadequate, rostering is poor, and managerial support is absent, what is the probability of high burnout?" Research such as García-Herrero et al. (2013) provides empirical estimates for exactly these kinds of conditional relationships.
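To see where the rows of a CPT come from, it can help to enumerate them. The sketch below lists every combination of parent states for the Burnout node, using the node and state names defined earlier. The helper cpt_rows is a hypothetical name introduced for this illustration, and the probabilities themselves are deliberately left blank because they have not been elicited yet.

```python
from itertools import product
from math import prod

# States for each node, as defined in the Nodes section.
STATES = {
    "Patient Load": ["High", "Normal"],
    "Staffing Level": ["Adequate", "Inadequate"],
    "Rostering Quality": ["Good", "Poor"],
    "Managerial Support": ["Present", "Absent"],
    "Burnout": ["High", "Low"],
}
BURNOUT_PARENTS = ["Patient Load", "Staffing Level",
                   "Rostering Quality", "Managerial Support"]

def cpt_rows(parents, states):
    """List every combination of parent states; each one is a CPT row."""
    return list(product(*(states[p] for p in parents)))

rows = cpt_rows(BURNOUT_PARENTS, STATES)
# The row count is the product of the parents' state counts.
n_rows = prod(len(STATES[p]) for p in BURNOUT_PARENTS)

print(len(rows), n_rows)
for row in rows[:3]:
    print(dict(zip(BURNOUT_PARENTS, row)), "-> P(Burnout = High) = ?")
```

The same function works for any node: pass it that node's parent list. A root node, with an empty parent list, yields a single row, which is its prior.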

Think About It

Before reading on, consider: why does the Burnout node need 16 rows in its CPT, while Adverse Outcomes needs only 4? What determines the size of a CPT? (Hint: think about how many parents each node has and how many states each parent takes.)

Step-by-Step Construction: The Nursing Ward Model

Step 1 - Define the Structure

We begin by drawing the DAG. The four root nodes sit at the top. Arrows converge on the Burnout node in the middle. From Burnout and Staffing Level, arrows proceed to Adverse Outcomes at the bottom. This structure encodes our causal theory: workplace conditions influence burnout, and burnout (alongside staffing) drives patient-safety and retention outcomes. He et al. (2023) followed a nearly identical approach in constructing their BN for psychosocial hazards among construction workers - identifying root-level workplace conditions, mediating psychological states, and downstream health outcomes.

Step 2 - Populate the CPTs

Where do the numbers come from? Three sources are available, and in practice most models blend all three (Fenton & Neil, 2018):

Published research. Dall'Ora et al. (2020) report that nurses working shifts longer than 12 hours have burnout rates approximately 40% higher than those on standard shifts. García-Herrero et al. (2013) provide conditional probability estimates linking demand–support combinations to stress outcomes.
Organisational data. Your hospital's HR system records turnover rates, incident reports, and roster patterns. These can be used to estimate frequencies directly.
Expert elicitation. When data are sparse - common in psychosocial risk - structured interviews with experienced clinicians and managers can produce credible probability estimates. Podofillini et al. (2016) evaluate five formal methods for eliciting CPTs from limited expert judgement, finding that even simple interpolation techniques produce usable tables when guided by clear protocols.

The critical insight is that you do not need a massive dataset to build a useful model. Expert knowledge, anchored to whatever data exist, is a legitimate and often necessary source of quantification.
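The organisational-data route amounts to counting. The sketch below estimates one small CPT, P(Burnout | Staffing Level), from a list of records. The records are invented for illustration and are not real ward data. The pseudo_count argument (Laplace smoothing) is an assumption added here to keep sparse data from producing probabilities of exactly 0 or 1; it is one simple option, not a method prescribed by the sources cited above.

```python
from collections import Counter

# Hypothetical records: (staffing level, burnout state).
# These rows are invented for illustration, not real ward data.
records = (
    [("Inadequate", "High")] * 18 + [("Inadequate", "Low")] * 12 +
    [("Adequate", "High")] * 7 + [("Adequate", "Low")] * 23
)

def estimate_cpt(records, child_states=("High", "Low"), pseudo_count=1.0):
    """Estimate P(child | parent) from (parent, child) counts.

    pseudo_count adds a small prior count to every cell (Laplace
    smoothing) so sparse data never yields exactly 0 or 1.
    """
    counts = Counter(records)
    parent_states = sorted({parent for parent, _ in records})
    cpt = {}
    for parent in parent_states:
        total = sum(counts[(parent, c)] + pseudo_count for c in child_states)
        cpt[parent] = {
            c: (counts[(parent, c)] + pseudo_count) / total
            for c in child_states
        }
    return cpt

cpt = estimate_cpt(records)
for parent, dist in cpt.items():
    print(parent, {state: round(p, 3) for state, p in dist.items()})
```

With 30 records per staffing level the smoothing barely moves the estimates. With only a handful of records it would pull them noticeably towards 50/50, which is a reminder to treat small-sample frequencies with caution.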

Step 3 - Validate the Model

A model is only useful if its outputs match reality. Validation involves checking whether the network's predictions - for instance, the overall probability of high burnout given current ward conditions - align with observed rates. Schenekenberg et al. (2017) recommend sensitivity analysis (systematically varying inputs to see which nodes most influence outcomes) and face validity checks (presenting results to domain experts and asking whether the patterns are credible). If the model predicts that burnout should be rare when all four risk factors are present, something is wrong with the CPTs, and they need revision.
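These checks can be sketched in a few lines. The example below computes the overall probability of high burnout by summing over all 16 parent combinations, then runs a one-at-a-time sensitivity analysis. Only P(Patient Load = High) = 0.60 comes from the text. The other priors and the additive CPT rule in p_burnout_high are hypothetical placeholders chosen to keep the arithmetic transparent, not estimates of real burnout risk.

```python
from itertools import product

# Probability that each root node is in its "risk" state.
# Only the Patient Load value comes from the text; the rest are hypothetical.
P_RISK = {
    "Patient Load": 0.60,        # P(High), from the text
    "Staffing Level": 0.50,      # P(Inadequate), hypothetical
    "Rostering Quality": 0.40,   # P(Poor), hypothetical
    "Managerial Support": 0.30,  # P(Absent), hypothetical
}
ROOTS = list(P_RISK)

def p_burnout_high(risk_flags):
    """Hypothetical CPT row: 0.10 baseline plus 0.20 per risk factor present."""
    return 0.10 + 0.20 * sum(risk_flags)

def marginal_burnout(p_risk):
    """P(Burnout = High), summing over all 16 parent combinations."""
    total = 0.0
    for flags in product([True, False], repeat=len(ROOTS)):
        weight = 1.0
        for root, at_risk in zip(ROOTS, flags):
            weight *= p_risk[root] if at_risk else 1.0 - p_risk[root]
        total += weight * p_burnout_high(flags)
    return total

baseline = marginal_burnout(P_RISK)
print(f"Baseline P(Burnout = High): {baseline:.3f}")

# Face-validity check: burnout should be likelier with all risks present.
assert p_burnout_high((True,) * 4) > p_burnout_high((False,) * 4)

# One-at-a-time sensitivity: fix each root in its risk state and recompute.
for root in ROOTS:
    shifted = marginal_burnout({**P_RISK, root: 1.0})
    print(f"{root:<20} at risk -> {shifted:.3f} ({shifted - baseline:+.3f})")
```

In a real validation you would compare the baseline figure with the burnout rate actually observed on the ward, and show the sensitivity ranking to experienced staff to ask whether the ordering is credible.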

Putting Numbers to Causes: The CPT Workshop

The conditional probability table is where qualitative reasoning becomes quantitative power. To build intuition for how CPTs work - and why they capture effects that risk matrices miss - try the interactive exercise below. You will populate a small CPT for a three-node network and compare your estimates to expert-derived values.

The CPT Workshop: Quantifying Causal Connections

A three-node network: Job Demands (High/Low) and Job Control (Low/High) are the parents of Burnout Risk (the child). Enter your estimated probability (0–100%) that Burnout Risk is High for each combination of parent states.

Why This Overcomes the Risk Matrix

Recall from Chapter 3 the three limitations of traditional risk matrices: they assume hazards are independent, they cannot model propagation through a system, and they ignore directionality - the asymmetry between cause and effect. A Bayesian network addresses all three.

Interdependence. CPTs explicitly encode how combinations of parent states jointly determine a child's probability. As the widget above illustrates, the interaction between high demands and low control produces a disproportionately elevated burnout risk - not the simple sum that an independence assumption would predict (García-Herrero et al., 2013).
Propagation. Because nodes are connected by directed edges, evidence entered at one node updates the probabilities of the nodes connected to it. Observing that staffing is inadequate updates not only the Burnout node but also the Adverse Outcomes node, both directly and through Burnout. Pearl (1988) formalised this cascade as belief propagation.
Directionality. Arrows point from cause to effect, preserving the asymmetry of causal influence. High workload raises burnout probability, but learning that someone is burned out does not, in the model, change the actual workload - it changes our belief about what the workload might have been, a distinction the BN handles through Bayes' theorem.
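All three properties can be seen in a small calculation on the three-node workshop network (job demands and job control as parents of burnout risk). Every number below is hypothetical, chosen only to make the effects visible; the variable names are likewise assumptions made for this sketch.

```python
# Hypothetical priors for the two parent nodes.
P_DEMANDS_HIGH = 0.5
P_CONTROL_LOW = 0.5

# P(Burnout Risk = High | demands_high, control_low); illustrative values.
CPT = {
    (True, True): 0.70,    # high demands, low control
    (True, False): 0.30,   # high demands, high control
    (False, True): 0.25,   # low demands, low control
    (False, False): 0.10,  # low demands, high control
}

def joint(demands_high, control_low):
    """P(demands, control, Burnout Risk = High) via the BN factorisation."""
    p_d = P_DEMANDS_HIGH if demands_high else 1 - P_DEMANDS_HIGH
    p_c = P_CONTROL_LOW if control_low else 1 - P_CONTROL_LOW
    return p_d * p_c * CPT[(demands_high, control_low)]

# Interdependence: compare the joint row with a purely additive guess.
base = CPT[(False, False)]
additive_guess = base + (CPT[(True, False)] - base) + (CPT[(False, True)] - base)
interaction = CPT[(True, True)] - additive_guess

# Propagation (forward): overall probability of high burnout risk.
p_burnout = sum(joint(d, c) for d in (True, False) for c in (True, False))

# Directionality (backward): Bayes' theorem updates belief about the cause.
p_demands_given_burnout = sum(joint(True, c) for c in (True, False)) / p_burnout

print(f"Additive guess: {additive_guess:.2f}, CPT value: {CPT[(True, True)]:.2f}")
print(f"P(Burnout Risk = High) = {p_burnout:.4f}")
print(f"P(Demands = High | Burnout Risk = High) = {p_demands_given_burnout:.4f}")
```

With these numbers the combined row (0.70) sits well above the additive guess (0.45), and observing high burnout risk raises the belief that demands were high from the prior of 0.50 to roughly 0.74. The workload itself has not changed; only our belief about it has.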

In short, the Bayesian network replaces the flat, isolated cells of the risk matrix with a living, interconnected web of probabilistic reasoning - exactly the kind of model that psychosocial hazard mapping demands.

Think About It

Suppose your hospital's risk committee wants to know: "If we improve rostering quality, how much will adverse outcomes decrease?" Could you answer this question with a risk matrix? How would you answer it with a Bayesian network?

Key Takeaways

A Bayesian network consists of a directed acyclic graph (DAG) encoding causal structure and a set of conditional probability tables (CPTs) quantifying the strength of each relationship.
Construction follows three steps: define the structure (nodes and edges), populate the CPTs (from research, data, or expert elicitation), and validate against observed workplace patterns.
CPTs capture interaction effects between parent variables - the joint influence of combined risk factors - that risk matrices, which treat hazards independently, systematically miss.
Expert elicitation is a legitimate and well-established method for populating CPTs when large datasets are unavailable, making Bayesian networks accessible to practitioners.
Bayesian networks overcome the three core limitations of risk matrices: they model interdependence, enable probabilistic propagation, and preserve causal directionality.

Looking Ahead

We have built the machine - a Bayesian network with structure and numbers. But a machine that sits idle is useless. In Chapter 7, we will learn to run this machine: entering evidence, propagating beliefs, and answering "what if" questions. What happens to burnout probability if we improve just one factor - say, managerial support - while everything else stays the same? The answer lies in inference, and it is the moment the invisible web becomes visible.

References

Dall'Ora, C., Ball, J., Reinius, M., & Griffiths, P. (2020). Burnout in nursing: A theoretical review. Human Resources for Health, 18(1), Article 41. https://pmc.ncbi.nlm.nih.gov/articles/PMC7273381/

Fenton, N., & Neil, M. (2018). Risk assessment and decision analysis with Bayesian networks (2nd ed.). CRC Press. https://www.routledge.com/Risk-Assessment-and-Decision-Analysis-with-Bayesian-Networks/Fenton-Neil/p/book/9781032917917

García-Herrero, S., Mariscal, M. A., García-Rodríguez, J., & Ritzel, D. O. (2013). Using Bayesian networks to analyze occupational stress caused by work demands: Preventing stress through social support. Accident Analysis & Prevention, 57, 114–123. https://doi.org/10.1016/j.aap.2013.05.008

He, Q., Chan, A. P. C., & Choi, T. N. Y. (2023). A Bayesian network model for the impacts of psychosocial hazards on the mental health of site-based construction practitioners. Journal of Construction Engineering and Management, 149(5). https://doi.org/10.1061/JCEMD4.COENG-12905

Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. MIT Press. https://mitpress.mit.edu/9780262013192/probabilistic-graphical-models/

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann. https://www.sciencedirect.com/book/9780080514895/probabilistic-reasoning-in-intelligent-systems

Podofillini, L., Dang, V. N., Zio, E., Baraldi, P., & Librizzi, M. (2016). Methods for building conditional probability tables of Bayesian belief networks from limited judgement: An evaluation for human reliability application. Reliability Engineering & System Safety, 151, 93–112. https://doi.org/10.1016/j.ress.2016.01.004

Schenekenberg, N. C. M., Malucelli, A., Dias, J. S., & Cubas, M. R. (2017). Modeling Bayesian networks from a conceptual framework for occupational risk analysis. Production, 27, e20162191. https://www.scielo.br/j/prod/a/g73hxJJdC85q5T6cVwXQ35P/?lang=en