Sensitivity analysis

Sensitivity analysis quantifies how changes in an input affect the model output. In risk analysis, it can be used to identify which uncertain inputs are most important for a decision-relevant output. This helps to understand the model behaviour before running it with real data, to prioritise where to intervene to reduce risk, and to increase efforts to reduce uncertainty in the most sensitive inputs.

In mcmodule, you can perform sensitivity analysis using different methods:

Scenario analysis

Scenario analysis (or “what-if analysis”) is a non-statistical but still useful approach to assess input sensitivity. It does not produce sensitivity indices, but it will show how the output changes when you make specific changes to the inputs. This makes it a practical way to compare targeted input changes using real input data. It is especially helpful for complex models where the most important inputs depend on context (for example, when estimating disease-entry risk across countries, where certain pathways will be more important in some countries than in others). Scenario analysis can therefore help you identify the most sensitive inputs for a particular use case. This topic is covered in the working with what-if scenarios section.

Correlation analysis

mcmodule_corr() is a screening method. It ranks inputs by their correlation with a chosen output across the current model runs (an extension of mc2d::tornado() for mcmodule objects and multiple variates). It can quickly highlight influential inputs, but results must be interpreted with care. A weak correlation does not necessarily mean an input is unimportant (e.g., symmetric nonlinear effects can cancel out), and a strong correlation may reflect confounding with other correlated inputs rather than a uniquely attributable effect.

Unlike methods that require a sample design, this approach works directly on an mcmodule built from model input data. Note that mcmodule_corr() only computes correlations for uncertain inputs (stochastic nodes built with func) and, when variates_as_nsv = TRUE, also for multi-variate inputs.

First, compute correlations (Spearman by default) and inspect the table.

# Calculate output (expected number of animals not detected)
imports_mcmodule_corr <- trial_totals(
  imports_mcmodule,
  mc_names = c("no_detect"),
  trials_n = "animals_n",
  subsets_n = "farms_n",
  subsets_p = "h_prev",
  mctable = imports_mctable
)

# Calculate correlations using Spearman's method (default)
corr_results <- mcmodule_corr(imports_mcmodule_corr, output = "no_detect_set_n", method = "spearman")
#> 
#> === Correlation Analysis Summary ===
#> 
#> Analysis Parameters:
#> - Analysis type: Global output
#> - Output node: no_detect_set_n
#> - Correlation method(s): spearman
#> - Missing value handling: all.obs
#> 
#> Expression Information:
#> - Module: mcmodule
#>   - Expression: imports
#>     - Input nodes: w_prev, test_sensi, animals_n, h_prev
#>     - Variates analyzed: 6
#> 
#> Results Summary:
#> - Total correlations calculated: 24
#> - Top 4 most influential inputs (by absolute mean correlation):
#>   1. h_prev: 0.7286
#>   2. w_prev: 0.4552
#>   3. animals_n: 0.2987
#>   4. test_sensi: -0.1525
#> 
#> Input Correlation Strength Distribution:
#> - Very strong: 2 (8.3%)
#> - Strong: 4 (16.7%)
#> - Moderate: 8 (33.3%)
#> - Weak: 4 (16.7%)
#> - Very weak/None: 6 (25.0%)
#> 
#> Inputs by Correlation Strength:
#> - Very strong: h_prev
#> - Strong: w_prev, h_prev, test_sensi
#> - Moderate: animals_n, h_prev, w_prev
#> - Weak: w_prev, animals_n, test_sensi
#> - Very weak/None: test_sensi, animals_n

In this example, h_prev is the main driver of no_detect_set_n (very strong positive association). w_prev and animals_n also show positive correlation, while test_sensi is close to zero or slightly negative across variates.

# View correlation results
head(corr_results)
#>       exp exp_n variate          output      input       value   method     use
#> 1 imports     1       1 no_detect_set_n     w_prev 0.684520665 spearman all.obs
#> 2 imports     1       1 no_detect_set_n test_sensi 0.015206219 spearman all.obs
#> 3 imports     1       1 no_detect_set_n  animals_n 0.479727721 spearman all.obs
#> 4 imports     1       1 no_detect_set_n     h_prev 0.516945462 spearman all.obs
#> 5 imports     1       2 no_detect_set_n     w_prev 0.312842343 spearman all.obs
#> 6 imports     1       2 no_detect_set_n test_sensi 0.002070648 spearman all.obs
#>     module pathogen origin       strength
#> 1 mcmodule        a   nord         Strong
#> 2 mcmodule        a   nord Very weak/None
#> 3 mcmodule        a   nord       Moderate
#> 4 mcmodule        a   nord       Moderate
#> 5 mcmodule        a  south           Weak
#> 6 mcmodule        a  south Very weak/None

Then visualise the ranking with a tornado plot. It can be customised with standard ggplot2 layers.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.5.3

# Plot tornado plot of correlations
mcmodule_tornado(imports_mcmodule_corr, output = "no_detect_set_n", method = "spearman", print_summary = FALSE)

Each point in the plot represents one input variate. The coloured point marks the variate with the largest absolute correlation for that input, and that value is used to rank the inputs. The black point shows the median correlation across variates, which is often a useful summary when correlations vary across the model. The black horizontal line shows the full range of correlations across variates, so you can see whether an input is consistently influential or only influential in certain variates.

Note that farms_n and test_origin are not shown in this analysis because they are fixed in the original model data and therefore have zero correlation with the output. This is an important limitation of correlation-based methods using real input data: they can only identify influential inputs that vary in the original model runs. To explore uncertainty for fixed inputs, you need a method that evaluates a sample design.

Sensitivity analysis with sample design

Create a sample design

An input sample space is the allowable range of values it can take (or distribution definition). A sample design is a specific set of sampled input combinations used to run the model. The sample space defines where values can come from, while the sample design defines which values are actually used in model runs.

A sample_design in mcmodule can be either:

The sampling matrix must have one column per input node and one row per sample. Each row is one scenario in the mcnode uncertainty dimension. When you provide a sample_design, that uncertainty dimension is no longer sampled from the original model inputs; it is mapped to the values in your design. Currently, sample_design supports one variate.

You can pass the sampling matrix (or sensitivity object) directly to eval_module(), or set it as a global default with set_sample_design(). Using a global design ensures the same sampling matrix is available across all model steps, which is convenient when you run sensitivity analysis after building a complex model. You can also pass the design to each function call, but that is more verbose, and you must remove it later to run the model with the original data inputs.

# Create a sample design as a data frame
X <- data.frame(
  w_prev = runif(10000, 0.15, 0.6),
  h_prev = runif(10000, 0.02, 0.7),
  test_sensi = runif(10000, 0.8, 0.91),
  animals_n = sample(5:10, 10000, replace = TRUE),
  farms_n = sample(82:176, 10000, replace = TRUE),
  test_origin = runif(10000, 0, 1)
)

dim(X)
#> [1] 10000     6
head(X)
#>      w_prev    h_prev test_sensi animals_n farms_n test_origin
#> 1 0.3340396 0.6117728  0.8477135         7     129  0.44193520
#> 2 0.5473578 0.2434982  0.8176573         6     163  0.03386385
#> 3 0.5732103 0.1054768  0.8905329         6     157  0.63026555
#> 4 0.1705004 0.2622306  0.8228900        10     133  0.15496632
#> 5 0.3876475 0.6528754  0.8313463         8     117  0.66649531
#> 6 0.5515886 0.6151214  0.8600830         8     109  0.82409581

# Set global sample design
set_sample_design(X)
#> sample_design set to X

# Retrieve current global sample design
current_design <- set_sample_design()

# Evaluate the module with the global sample design
imports_mcmodule_X <- eval_module(
  exp = imports_exp
)
#> imports_exp evaluated
#> mcmodule created (expressions: imports_exp)

# Calculate output (expected number of animals not detected)
imports_mcmodule_X <- trial_totals(
  imports_mcmodule_X,
  mc_names = c("no_detect"),
  trials_n = "animals_n",
  subsets_n = "farms_n",
  subsets_p = "h_prev"
)

mcmodule_tornado(imports_mcmodule_X, output = "no_detect_set_n", method = "spearman")
#> 
#> === Correlation Analysis Summary ===
#> 
#> Analysis Parameters:
#> - Analysis type: Global output
#> - Output node: no_detect_set_n
#> - Correlation method(s): spearman
#> - Missing value handling: all.obs
#> 
#> Expression Information:
#> - Module: mcmodule
#>   - Expression: imports_exp
#>     - Input nodes: w_prev, test_origin, test_sensi, animals_n, farms_n, h_prev
#>     - Variates analyzed: 1
#> 
#> Results Summary:
#> - Total correlations calculated: 6
#> - Top 5 most influential inputs (by absolute mean correlation):
#>   1. h_prev: 0.6786
#>   2. test_origin: -0.4723
#>   3. w_prev: 0.3440
#>   4. animals_n: 0.2187
#>   5. farms_n: 0.2075
#> 
#> Input Correlation Strength Distribution:
#> - Very strong: 0 (0.0%)
#> - Strong: 1 (16.7%)
#> - Moderate: 1 (16.7%)
#> - Weak: 3 (50.0%)
#> - Very weak/None: 1 (16.7%)
#> 
#> Inputs by Correlation Strength:
#> - Strong: h_prev
#> - Moderate: test_origin
#> - Weak: w_prev, animals_n, farms_n
#> - Very weak/None: test_sensi


# Reset global sample design
reset_sample_design()
#> sample_design reset

This sample design produces a different ranking from the first tornado analysis, because we’re using the inputs sampling space rather than the original model inputs. Here, h_prev is the main positive driver of no_detect_set_n. test_origin is the next most influential input and is negatively associated with the output. w_prev, animals_n and farms_n have smaller positive effects, while test_sensi is weak or negligible within these bounds.

Define sample space in mctable

For methods that use a sample design, mctable can be used to define how inputs are sampled. In particular, mctable$sample_space stores the bounds or distribution parameters used to generate sensitivity samples. To read more about mctable see the package vignette.

imports_mctable[,c("mcnode","mc_func", "sample_space")]
#>            mcnode mc_func                        sample_space
#> 1          h_prev   runif               min = 0.02, max = 0.7
#> 2          w_prev   runif               min = 0.15, max = 0.6
#> 3      test_sensi   rpert min = 0.8, mode = 0.875, max = 0.91
#> 4         farms_n    <NA>                   min = 5, max = 10
#> 5       animals_n   rnorm                 min = 82, max = 176
#> 6 test_origin_unk    <NA>                                <NA>
#> 7     test_origin    <NA>                    min = 0, max = 1

Two helper functions support common sampling designs from mctable:

Currently, mcmodule only provides direct conversion from mctable to Sobol sampling designs. However, you can use mctable_bounds() to create many other sample designs using different approaches. The only requirement is that your sample design matrix contains columns for all the input nodes in your mcmodule model.

Morris elementary effects

Morris is another screening method. It identifies influential inputs at moderate computational cost and highlights potential nonlinearity or interactions.

Start by getting parameter bounds from mctable.

# Get bounds for Morris sampling design
b <- mctable_bounds(imports_mctable, transformation = FALSE)

Build the Morris design with sensitivity::morris(). The resulting object can be passed directly as sample_design. The arguments factors, r, and design control the the factors (inputs) sampled, the number of repetitions of the design, and the step size between sampled values, respectively. The arguments binf and bsup provide the bounds for each input, and scale = TRUE scales the inputs to [0,1] for better comparability of Morris indices across inputs with different ranges.

To have stable Morris indices in large sample designs (with many inputs), you need to run the model with a large number of repetitions, which can be computationally intensive. Always check the sensitivity indices stability in diferent runs before interpreting them, increase r until rankings stabilise.

library(sensitivity)
#> Warning: package 'sensitivity' was built under R version 4.5.2
#> Registered S3 method overwritten by 'sensitivity':
#>   method    from 
#>   print.src dplyr
#> 
#> Attaching package: 'sensitivity'
#> The following object is masked from 'package:dplyr':
#> 
#>     src

# Create Morris design
morris_sa <- sensitivity::morris(
  model = NULL,
  factors = b$factors,
  r = 2000,
  design = list(type = "oat", levels = 4, grid.jump = 2),
  binf = b$binf,
  bsup = b$bsup,
  scale = TRUE
)
#> Warning in sensitivity::morris(model = NULL, factors = b$factors, r = 2000, :
#> keeping 1999 repetitions out of 2000

# Evaluate the module with that design.
imports_mcmodule_morris <- eval_module(
  exp = imports_exp,
  sample_design = morris_sa,
  mctable = imports_mctable
)
#> imports_exp evaluated
#> mcmodule created (expressions: imports_exp)

# Calculate output (expected number of animals not detected)
imports_mcmodule_morris <- trial_totals(
  imports_mcmodule_morris,
  mc_names = c("no_detect"),
  trials_n = "animals_n",
  subsets_n = "farms_n",
  subsets_p = "h_prev",
  sample_design = morris_sa,
  mctable = imports_mctable
)

Finally, extract the output vector and estimate Morris indices. The tell() function updates the Morris object with the model output, and then you can print and plot the indices.

# Extract the output vector and estimate Morris indices.
y <- unmc(imports_mcmodule_morris$node_list$no_detect_set_n$mcnode)

# Aggregated output used for sensitivity analysis.
sensitivity::tell(morris_sa, y)

# Print Morris indices
morris_sa
#> 
#> Call:
#> sensitivity::morris(model = NULL, factors = b$factors, r = 2000,     design = list(type = "oat", levels = 4, grid.jump = 2), binf = b$binf,     bsup = b$bsup, scale = TRUE)
#> 
#> Model runs: 13993 
#>                      mu    mu.star     sigma
#> h_prev       139.808877 139.808877 124.93784
#> w_prev        89.096448  89.096448  98.97369
#> test_sensi    -7.335186   7.335186  10.79865
#> farms_n       50.032335  50.032335  61.29715
#> animals_n     55.904818  55.904818  69.23723
#> test_origin -111.438795 111.438795 110.73603

# Plot Morris indices (mu.star and sigma)
plot(morris_sa)

In the Morris method, mu ( \(\mu\) ) is the mean of the elementary effects across trajectories and mu.star ( \(\mu^*\) ) is the absolute value of mu, which summarises each input’s overall importance (larger values indicate a stronger average influence on the output, regardless of direction). h_prev has the largest mu.star, making it the most influential input overall, followed by test_origin. w_prev is the next most important, while farms_n and animals_n have more moderate effects, and test_sensi is negligible within the explored bounds (note that the uncertainty is small, 0.875-0.91). sigma ( \(\sigma\) ) is the standard deviation of the elementary effects. Large sigma values indicate that the effect of an input varies substantially across the sampled space, which typically reflects nonlinearity and/or interactions with other inputs.

Sobol indices

Sobol analysis decomposes output variance into first-order and higher-order effects, providing a global sensitivity analysis. It is more computationally intensive than Morris, but it gives a more complete picture of input importance, including interactions. Sobol variance decomposition assumes independent inputs, if inputs are dependent, interpret Sobol indices with care or consider other approaches.

First, generate Sobol sampling matrices from mctable.

mctable_sobol_matrices() creates Sobol design matrices from your mctable definitions.

library(sensobol)

N <- 10000
X <- mctable_sobol_matrices(imports_mctable, N = N, order = "second")

Evaluate the module, compute the target output as before.

imports_mcmodule_sobol <- eval_module(
  exp = imports_exp,
  data = NULL,
  sample_design = X,
  mctable = imports_mctable
)
#> imports_exp evaluated
#> mcmodule created (expressions: imports_exp)

imports_mcmodule_sobol <- trial_totals(
  mcmodule = imports_mcmodule_sobol,
  mc_names = c("no_detect"),
  trials_n = "animals_n",
  subsets_n = "farms_n",
  subsets_p = "h_prev",
  sample_design = X,
  mctable = imports_mctable
)

Then compute first-order (Si) and total-order (Ti) Sobol indices and print and plot the results.

y <- unmc(imports_mcmodule_sobol$node_list$no_detect_set_n$mcnode)

# Compute Sobol indices
sobol_sa <- sensobol::sobol_indices(Y = y, N = N, params = colnames(X), order = "second", boot = TRUE, R = 1000)

sobol_sa
#> 
#> First-order estimator: saltelli | Total-order estimator: jansen 
#> 
#> Total number of model runs: 230000 
#> 
#> Sum of first order indices: 0.7854101 
#>          original          bias    std.error        low.ci      high.ci
#>             <num>         <num>        <num>         <num>        <num>
#>  1:  3.332841e-01  2.806764e-04 1.504341e-02  0.3035188778 0.3624879705
#>  2:  1.377370e-01  1.157053e-04 1.010515e-02  0.1178155716 0.1574270136
#>  3:  3.909247e-04  8.930736e-06 5.250725e-04 -0.0006471292 0.0014111171
#>  4:  4.305921e-02 -2.409696e-04 5.966784e-03  0.0316054992 0.0549948618
#>  5:  4.684154e-02  2.136317e-04 6.030061e-03  0.0348092104 0.0584466142
#>  6:  2.240974e-01 -1.665205e-04 1.286802e-02  0.1990430331 0.2494847426
#>  7:  4.907211e-01  5.028705e-04 1.220140e-02  0.4663039437 0.5141325541
#>  8:  2.289120e-01  1.739086e-04 6.448966e-03  0.2160983631 0.2413778446
#>  9:  7.520003e-04  1.397912e-06 3.137228e-05  0.0006891139 0.0008120909
#> 10:  7.602997e-02  7.511655e-05 2.189468e-03  0.0716635768 0.0802461326
#> 11:  9.089470e-02  2.177806e-05 2.819738e-03  0.0853463316 0.0963995033
#> 12:  3.490372e-01  1.310243e-04 9.342170e-03  0.3305958773 0.3672165122
#> 13:  4.242323e-02 -3.018290e-04 7.306883e-03  0.0284038338 0.0570462888
#> 14:  2.276340e-04 -1.191114e-05 4.058860e-04 -0.0005559768 0.0010350670
#> 15:  1.158978e-02 -1.230634e-04 4.097145e-03  0.0036825872 0.0197430993
#> 16:  1.559113e-02 -6.954566e-05 4.275403e-03  0.0072810375 0.0240403091
#> 17:  6.621890e-02 -3.633257e-04 8.412995e-03  0.0500930632 0.0830713962
#> 18:  1.986587e-05  7.138970e-07 2.632880e-04 -0.0004968831 0.0005351870
#> 19:  4.869421e-03 -1.850100e-05 2.874968e-03 -0.0007469115 0.0105227557
#> 20:  5.898327e-03  5.734532e-05 2.902983e-03  0.0001512393 0.0115307250
#> 21:  3.005418e-02  3.219238e-05 6.185212e-03  0.0178991922 0.0421447759
#> 22: -5.842551e-06 -2.641477e-07 1.364565e-04 -0.0002730283 0.0002618715
#> 23:  4.855472e-05  8.737696e-06 1.659606e-04 -0.0002854598 0.0003650939
#> 24: -1.751435e-05 -1.019615e-05 3.833124e-04 -0.0007585966 0.0007439603
#> 25:  2.403328e-03  2.678811e-05 1.661811e-03 -0.0008805490 0.0056336294
#> 26:  7.210102e-03 -1.289130e-04 3.450360e-03  0.0005764332 0.0141015960
#> 27:  8.240894e-03 -3.590833e-05 3.802195e-03  0.0008246376 0.0157289665
#>          original          bias    std.error        low.ci      high.ci
#>             <num>         <num>        <num>         <num>        <num>
#>     sensitivity             parameters
#>          <char>                 <char>
#>  1:          Si                 h_prev
#>  2:          Si                 w_prev
#>  3:          Si             test_sensi
#>  4:          Si                farms_n
#>  5:          Si              animals_n
#>  6:          Si            test_origin
#>  7:          Ti                 h_prev
#>  8:          Ti                 w_prev
#>  9:          Ti             test_sensi
#> 10:          Ti                farms_n
#> 11:          Ti              animals_n
#> 12:          Ti            test_origin
#> 13:         Sij          h_prev.w_prev
#> 14:         Sij      h_prev.test_sensi
#> 15:         Sij         h_prev.farms_n
#> 16:         Sij       h_prev.animals_n
#> 17:         Sij     h_prev.test_origin
#> 18:         Sij      w_prev.test_sensi
#> 19:         Sij         w_prev.farms_n
#> 20:         Sij       w_prev.animals_n
#> 21:         Sij     w_prev.test_origin
#> 22:         Sij     test_sensi.farms_n
#> 23:         Sij   test_sensi.animals_n
#> 24:         Sij test_sensi.test_origin
#> 25:         Sij      farms_n.animals_n
#> 26:         Sij    farms_n.test_origin
#> 27:         Sij  animals_n.test_origin
#>     sensitivity             parameters
#>          <char>                 <char>

plot(sobol_sa)

plot(sobol_sa, order = "second")

First-order indices (Si) measure each input’s main effect, the share of output variance attributable to changing that input alone, averaged over uncertainty in all other inputs. Larger Si values indicate the model output is more sensitive to that factor. Here, h_prev is the dominant driver of variance, test_origin is the next most influential, and w_prev also contributes materially; farms_n and animals_n have smaller but non-zero effects, while test_sensi is negligible. Negative estimates can occur due to Monte Carlo error and should be interpreted as ~0 if the confidence interval overlaps 0. If important factors have wide CIs or many negatives appear, increase N (and/or bootstrap replicates) before interpreting small effects.

Total-order indices (Ti) measure each input’s total effect on output variance, including its main effect plus all interaction effects with other inputs; Ti is therefore always at least as large as Si for the same parameter.

Second-order indices (Sij) quantify interaction effects: how much variance is explained only by varying two inputs together beyond what their individual first-order effects (Si) explain. Values near 0 indicate little interaction; larger values indicate the combined effect of the pair is important (e.g., h_prev.test_origin suggests a meaningful interaction between h_prev and test_origin).

Other methods

You can use other sensitivity methods with mcmodule as long as you can build a table/matrix of input values to run the model with, and collect the corresponding model output(s). For example, sensitivity::fast99() can estimate first-order variance-based indices with structured sampling and can be cheaper than second-order Sobol, depending on the design and number of factors. If you want a more familiar “risk-factor style” summary, regression/ANOVA approaches can quantify how the output changes as inputs change. Tools such as Shapley values (iml::shapley()) can help interpret complex, nonlinear relationships, but they answer a slightly different question and can be computationally heavy. The best method will depend on your model, computational resources, and the specific questions you want to answer about input importance.