Survey weights are indispensable for producing unbiased population estimates under complex sampling designs. Yet, when fitting regression models, analysts often ask: are the weights truly informative for my outcome, or can I safely ignore them? Failure to include informative weights can lead to biased estimates and invalid inference, while including non-informative weights may unnecessarily inflate variance. Analysts need robust tools to assess the informativeness of survey weights in regression contexts.
The svytest package provides a suite of diagnostic tests designed to answer this question. These tests adapt classical specification tests (Hausman, 1978; Pfeffermann, 1993) and more recent developments (Pfeffermann & Sverchkov, 1999, 2003; Wu & Fuller, 2005) to the survey regression setting, and complement them with non-parametric methods to evaluate whether survey weights materially affect regression estimates. These tests are diagnostic tools for deciding whether weights should be used in a regression analysis; they are not designed to establish causal relationships between \(\vec{Y}\) and \(\textbf{X}\), and their use should be limited to assessing the necessity of survey weights in regressions.
This vignette introduces each test, explains the underlying
statistical machinery, and demonstrates their use on example data. We
conclude by outlining a simulation study design that evaluates their
performance. For demonstration purposes, we use the included
svytestCE dataset, a curated subset of the Consumer
Expenditure Survey. See ?svytestCE for details.
Note that the svytest package currently supports
linear regression models fitted via svyglm from the
survey package. Future versions may extend support to
generalized linear models and other modeling frameworks.
Bollen et al. (2016) provide a comprehensive review of the landscape of diagnostic tests for survey weight informativeness. Their central insight is that, despite the proliferation of methods, nearly all tests fall into two broad categories: Difference-in-Coefficients (DC) tests and Weight-Association (WA) tests.
Bollen and colleagues emphasize that DC and WA tests are theoretically equivalent: testing whether weighted and unweighted coefficients differ is equivalent to testing whether weights are correlated with the outcome conditional on covariates. In practice, however, the finite-sample behavior of the tests can diverge, and simulation evidence is limited. Some studies suggest that classical Hausman-type DC tests may over-reject in small samples, while certain WA formulations may be more stable.
They also highlight a number of practical considerations for applying these tests.
This classification provides a unifying framework for the diagnostics
implemented in svytest: the package offers both
DC-style tests (e.g., diff_in_coef_test) and WA-style tests
(e.g., wa_test with DD, PS1, PS2, WF variants), along with
extensions such as the estimating equations test and permutation-based
approaches.
Throughout this section, we describe each diagnostic test implemented
in the svytest package. For each test, we outline the
statistical formulation and assumptions, and provide example code
demonstrating its use. Let \(\{(y_k,
\vec{X}_k, w_k): k \in S\}\) denote the observed data for sampled
units, where \(y_k\) is the outcome,
\(\vec{X}_k\) is a vector of
covariates, and \(w_k\) is the survey
weight for the \(k\)-th unit in the
survey sample \(S\). We will create a
svyglm object from the survey package to
demonstrate the tests.
# Construct survey design
des <- svydesign(ids = ~1, weights = ~FINLWT21, data = svytestCE)
# Fit weighted regression
fit <- svyglm(TOTEXPCQ ~ ROOMSQ + BATHRMQ + BEDROOMQ + FAM_SIZE + AGE, design = des)

Given the many different options, the function
run_all_diagnostic_tests provides a convenient wrapper to
run all tests and summarize results. It produces a recommendation which
can guide analysts on whether to use weights in their regression
analysis, though there is no formal consensus on decision rules within
the literature of diagnostic tests for survey weights.
# Run all diagnostic tests
all_results <- run_all_diagnostic_tests(fit, alpha = 0.05)
print(all_results)
#> 
#> Diagnostic Tests for Informative Weights
#> # A tibble: 10 × 4
#>    test       statistic   p.value reject
#>    <chr>          <dbl>     <dbl> <lgl> 
#>  1 DD              2.56 0.0178    TRUE  
#>  2 PS1             2.40 0.0186    TRUE  
#>  3 PS1q            3.69 0.0000138 TRUE  
#>  4 PS2             4.68 0.00930   TRUE  
#>  5 PS2q            5.55 0.00389   TRUE  
#>  6 WF              1.97 0.0802    FALSE 
#>  7 HP             15.3  0.0179    TRUE  
#>  8 PS3             2.03 0.0581    FALSE 
#>  9 perm_mean      NA    0.0450    TRUE  
#> 10 perm_mahal     NA    0.128     FALSE 
#> 
#> Recommendation:
#>  At least one test rejects H0 at significance level = 0.05. Recommendation: use survey weights in regression.

The Difference-in-Coefficients (DC) tests assess whether the coefficients estimated from weighted and unweighted regression models differ significantly. The underlying logic is that if weights are informative, they will affect the estimated coefficients. The DC test implemented in svytest is based on the Hausman-Pfeffermann framework (Pfeffermann, 1993; Pfeffermann & Sverchkov, 1999). The test statistic is constructed as follows:
Let \(W = \text{diag}(w_1, \dots, w_n)\) be the diagonal weight matrix. Define the unweighted and weighted estimators as \[\hat\beta_{U} = (X^\top X)^{-1} X^\top y,\] \[\hat\beta_{W} = (X^\top W X)^{-1} X^\top W y.\] The DC test statistic is \[T = (\hat\beta_{W} - \hat\beta_{U})^\top [\text{Var}(\hat\beta_{W}) - \text{Var}(\hat\beta_{U})]^{-1} (\hat\beta_{W} - \hat\beta_{U}).\] Under the null hypothesis \(H_0: E(\hat\beta_{W} - \hat\beta_{U}) = 0\), i.e., that the weights are non-informative, \(T\) asymptotically follows a \(\chi^2_p\) distribution, where \(p\) is the number of coefficients tested.
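The statistic can be reproduced in a few lines of base R on simulated data. This is a minimal sketch assuming homoscedastic errors and a simple sandwich form for \(\text{Var}(\hat\beta_W)\), not the package's implementation:

```r
# Minimal base-R sketch of the DC statistic on simulated data,
# assuming homoscedastic errors (not the package implementation).
set.seed(1)
n <- 500
X <- cbind(1, runif(n))                  # design matrix with intercept
y <- drop(X %*% c(1, 2) + rnorm(n))
w <- exp(rnorm(n))                       # illustrative, non-informative weights

beta_U <- solve(crossprod(X), crossprod(X, y))
XtWX   <- crossprod(X, w * X)
beta_W <- solve(XtWX, crossprod(X, w * y))

# Simple pooled-variance estimates of Var(beta_U) and Var(beta_W)
s2  <- sum(w * (y - drop(X %*% beta_W))^2) / sum(w)
V_U <- s2 * solve(crossprod(X))
V_W <- s2 * solve(XtWX) %*% crossprod(X, w^2 * X) %*% solve(XtWX)

d      <- beta_W - beta_U
T_stat <- drop(t(d) %*% solve(V_W - V_U) %*% d)
p_val  <- pchisq(T_stat, df = ncol(X), lower.tail = FALSE)
```

Because the weights here are generated independently of \(y\), the test should typically fail to reject.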
Determining the variance estimator is crucial for the DC test.
By default, the test assumes homoscedastic residuals and estimates the
variance using the pooled residual variance from the weighted
model. We also provide heteroskedasticity-consistent options. The
svytest package implements this test in the
diff_in_coef_test function. The heteroskedasticity-robust
variance estimators available are “HC0”, “HC1”, “HC2”, and “HC3”
(MacKinnon & White, 1985), as implemented in the
sandwich package.
# Run DC test with equal residual variance
res_equal <- diff_in_coef_test(fit, var_equal = TRUE)
print(res_equal)
#> 
#> Hausman-Pfeffermann Difference-in-Coefficients Test
#> Chi-squared = 15.3278  df = 6  p-value = 0.0179 
#> 
#> Unweighted coefficients:
#> (Intercept)      ROOMSQ     BATHRMQ    BEDROOMQ    FAM_SIZE         AGE 
#>  -207.41727   372.95582  1599.03461    71.90571   447.79437    19.73244 
#> 
#> Weighted coefficients:
#> (Intercept)      ROOMSQ     BATHRMQ    BEDROOMQ    FAM_SIZE         AGE 
#>  -327.93795   361.18174  1596.63690    65.48708   454.68955    23.36097
# Run DC test with heteroskedasticity-robust variance (HC3)
res_robust <- diff_in_coef_test(fit, var_equal = FALSE, robust_type = "HC3")
summary(res_robust)
#> 
#> Difference-in-Coefficients Test
#> Call:
#> diff_in_coef_test(model = fit, var_equal = FALSE, robust_type = "HC3")
#> 
#> Method:
#>  Hausman-Pfeffermann Difference-in-Coefficients Test
#> 
#> Test Statistic:
#>  Chi-sq = 62288.1804  on 6 df , p-value = 0.0000

Weight‑association (WA) tests provide an alternative to difference‑in‑coefficients tests. Rather than directly comparing weighted and unweighted coefficient vectors, WA tests ask: are the survey weights associated with the outcome (or residuals) after conditioning on covariates?
Formally, the null hypothesis is \(H_0: E(y \mid X, w) = E(y \mid X)\), i.e. weights are non‑informative given the covariates.
The DD test (DuMouchel & Duncan, 1983) is one of the earliest WA diagnostics.
Fit the unweighted regression: \[\hat\beta_U = (X^\top X)^{-1} X^\top y.\]
Compute residuals: \[e = y - X\hat\beta_U.\]
Regress residuals on the weights: \[e = \gamma_0 + \gamma_1 w + u.\]
If \(\gamma_1 = 0\), weights are not associated with residuals. A significant slope indicates that weights explain variation in residuals, hence are informative. The test statistic is an \(F\)-test on \(\gamma_1\).
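The three steps can be sketched in base R on simulated data; here the weights are drawn independently of the outcome, so \(H_0\) holds:

```r
# Base-R sketch of the three DD steps on simulated data
# (weights drawn independently of the outcome, so H0 holds).
set.seed(2)
n <- 300
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
w <- exp(0.5 * rnorm(n))

e  <- resid(lm(y ~ x))     # steps 1-2: unweighted fit and its residuals
dd <- lm(e ~ w)            # step 3: regress residuals on the weights
anova(dd)                  # F-test on the weight slope gamma_1
```

A small \(p\)-value in the ANOVA table would indicate that the weights explain residual variation and are therefore informative.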
The PS1 test (Pfeffermann & Sverchkov, 1999) augments the regression with functions of the weights: \[y = X\beta + f(w)\theta + \varepsilon.\]
This is implemented as a nested model comparison:
- Reduced model: \(y = X\beta +
\varepsilon\).
- Full model: \(y = X\beta + f(w)\theta +
\varepsilon\).
- An \(F\)-test evaluates whether the
auxiliary regressors \(f(w)\) improve
fit.
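A minimal base-R sketch of this nested comparison, taking \(f(w) = w\) (the quadratic variant PS1q would also include \(w^2\)):

```r
# Base-R sketch of the PS1 nested-model comparison with f(w) = w;
# the PS1q variant would also include w^2 as an auxiliary regressor.
set.seed(3)
n <- 300
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
w <- exp(0.5 * rnorm(n))

reduced <- lm(y ~ x)
full    <- lm(y ~ x + w)
anova(reduced, full)       # F-test: do the weight terms improve fit?
```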
The PS2 test (Pfeffermann & Sverchkov, 2003) is a two‑step procedure:
Regress weights on covariates: \[w = X\alpha + \eta, \quad \hat w = X\hat\alpha.\]
Augment the outcome regression with \(\hat w\): \[y = X\beta + g(\hat w)\theta + \varepsilon.\]
This approach conditions out the part of the weights predictable from \(X\), focusing on whether the residual informativeness of weights matters for \(y\).
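A base-R sketch of the two steps. Since \(\hat w = X\hat\alpha\) is exactly collinear with \(X\), the augmentation requires a nonlinear \(g\); we take \(g(u) = u^2\) purely for illustration:

```r
# Base-R sketch of the PS2 two-step procedure. Since w_hat = X alpha_hat
# is collinear with X, a nonlinear g is required for the augmentation;
# g(u) = u^2 is used here purely for illustration.
set.seed(4)
n <- 300
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
w <- exp(0.5 * rnorm(n))

w_hat <- fitted(lm(w ~ x))       # step 1: regress weights on covariates
full  <- lm(y ~ x + I(w_hat^2))  # step 2: augment the outcome regression
anova(lm(y ~ x), full)           # F-test on the augmented term
```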
The WF test (Wu & Fuller, 2005) is a more elaborate WA test that can be seen as a bridge between the DC and WA frameworks.
Unlike the Hausman–Pfeffermann DC test, it is framed as an \(F\)-test within the WA family, but the intuition is similar: if weights are non‑informative, the weighted and unweighted estimators should not differ systematically.
The Estimating Equations (EE) test of Pfeffermann & Sverchkov (2003) provides a third diagnostic framework for assessing weight informativeness. Unlike the Difference‑in‑Coefficients and Weight‑Association tests, which compare regression fits directly, the EE test works at the level of the score equations that define regression estimators.
For linear regression with Gaussian errors and identity link, the unweighted OLS estimator \(\hat\beta_{\text{unw}}\) solves the estimating equations \[\sum_{i=1}^n u_i = 0, \qquad u_i = x_i \bigl(y_i - x_i^\top \hat\beta_{\text{unw}}\bigr).\]
To adjust for potential informativeness of weights, define \[q_i = \frac{w_i}{E_s(w_i \mid x_i)}, \qquad R_i = (1 - q_i) u_i,\]
where \(E_s(w_i \mid x_i)\) is estimated by regressing \(w\) (or \(\log w\)) on \(X\). Let \[\bar R = \frac{1}{n} \sum_{i=1}^n R_i, \qquad S = \frac{1}{n-1} \sum_{i=1}^n (R_i - \bar R)(R_i - \bar R)^\top.\]
The Hotelling-type test statistic is \[F = \frac{n-p}{p} \, \bar R^\top S^{-1} \bar R,\]
with numerator degrees of freedom \(p\) (the number of tested coefficients) and denominator degrees of freedom \(n-p\).
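The formulas above can be sketched directly in base R on simulated data (the package's estim_eq_test handles these details internally):

```r
# Base-R sketch of the estimating-equations test (linear q-method),
# following the formulas above on simulated data.
set.seed(5)
n <- 300; p <- 2
X <- cbind(1, runif(n))
y <- drop(X %*% c(1, 2) + rnorm(n))
w <- exp(0.5 * rnorm(n))

beta_unw <- solve(crossprod(X), crossprod(X, y))
u <- X * drop(y - X %*% beta_unw)      # score contributions u_i
q <- w / fitted(lm(w ~ X[, 2]))        # q_i = w_i / E_s(w_i | x_i)
R <- (1 - q) * u                       # R_i = (1 - q_i) u_i
R_bar  <- colMeans(R)
S      <- cov(R)
F_stat <- (n - p) / p * drop(t(R_bar) %*% solve(S) %*% R_bar)
p_val  <- pf(F_stat, df1 = p, df2 = n - p, lower.tail = FALSE)
```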
linear_results <- estim_eq_test(fit, q_method = "linear")
print(linear_results)
#> 
#> Pfeffermann-Sverchkov Estimating Equations Test (linear case)
#> F = 2.0305  df1 = 6  df2 = 21920  p-value = 0.0581
log_results <- estim_eq_test(fit, q_method = "log")
print(log_results)
#> 
#> Pfeffermann-Sverchkov Estimating Equations Test (log case)
#> F = 3.3232  df1 = 6  df2 = 21920  p-value = 0.0028

The permutation tests implemented in svytest provide a non-parametric approach to assessing the informativeness of survey weights. These tests do not rely on asymptotic distributions and instead use the empirical distribution of a test statistic under random permutations of the data. The null hypothesis is that, conditional on covariates \(X\), the survey weights \(w\) are non‑informative with respect to the outcome \(y\). Under \(H_0\), permuting the weights should not change the distribution of any statistic that measures the effect of weighting.
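The permutation scheme can be sketched in base R with a toy statistic, the gap between weighted and unweighted mean fitted values (the package's statistics are more refined):

```r
# Base-R sketch of the permutation scheme with a toy statistic:
# the gap between weighted and unweighted mean fitted values.
set.seed(6)
n <- 200; B <- 500
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
w <- exp(0.5 * rnorm(n))

unw_mean <- mean(fitted(lm(y ~ x)))
stat <- function(wt) {
  fw <- fitted(lm(y ~ x, weights = wt))
  abs(weighted.mean(fw, wt) - unw_mean)
}
obs   <- stat(w)
null  <- replicate(B, stat(sample(w)))   # permute weights under H0
p_val <- (1 + sum(null >= obs)) / (B + 1)
```

The "+1" correction in the \(p\)-value keeps the estimate strictly positive, a standard convention for Monte Carlo permutation tests.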
perm_mean_results <- perm_test(fit, stat = "pred_mean", B = 1000, engine = "R")
print(perm_mean_results)
#> 
#> Permutation test for weight informativeness (pred_mean)
#> Observed = 6856.7709  Null = 6888.7717  Effect = -31.4661  p-value = 0.0500
library(Rcpp)
perm_mahal_results <- perm_test(fit, stat = "coef_mahal", B = 1000, engine = "C++")
print(perm_mahal_results)
#> 
#> Permutation test for weight informativeness (coef_mahal)
#> Observed = 80517058.4472  Null = 0.0000  Effect = 45326375.8879  p-value = 0.0969

To evaluate the finite-sample performance of the diagnostic tests implemented in the package, we replicated the first simulation study of Wang et al. (2023) and extended it to include our non-parametric permutation tests.
Following Wang et al. (2023), we generated a finite population of size \(N=3000\) from the linear model \(Y_i = 1 + X_i + \varepsilon_i, \quad i=1,\ldots,N,\) where \(X_i \sim \text{Uniform}(0,1)\) and \(\varepsilon_i \sim N(0,\sigma^2)\) with \(\sigma \in \{0.1, 0.2\}\). Samples of size \(n \in \{100,200\}\) were drawn with probability proportional to weights \(W_i = \alpha Y_i + 0.3 X_i + \delta U_i,\) where \(U_i \sim N(0,1)\), \(\delta \in \{1,1.5\}\), and \(\alpha \in \{0,0.2,0.4,0.6\}\) controls the informativeness of the weights. When \(\alpha=0\), weights are non-informative.
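One replicate of this design can be sketched in base R. Shifting \(W\) to be positive is our assumption to keep the selection probabilities valid and may differ from the original study's exact scheme:

```r
# Base-R sketch of one replicate of the simulation design.
# The shift to positive W is an assumption to keep selection
# probabilities valid; the original study's scheme may differ.
set.seed(7)
N <- 3000; n <- 100; sigma <- 0.1; delta <- 1; alpha <- 0.4

X <- runif(N)
Y <- 1 + X + rnorm(N, sd = sigma)
U <- rnorm(N)
W <- alpha * Y + 0.3 * X + delta * U
W <- W - min(W) + 0.1                   # ensure positive size measures
s <- sample(N, n, prob = W)             # draw n units, prob proportional to W
samp <- data.frame(y = Y[s], x = X[s], w = 1 / W[s])
```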
For each configuration \((n,\sigma,\delta,\alpha)\), we generated \(1000\) samples and applied each of the diagnostic tests described above.
A test was deemed to reject if its \(p\)-value fell below the 0.05 significance level (distinct from the informativeness parameter \(\alpha\) above). The empirical rejection rate across replications estimates the size (when \(\alpha=0\)) and the power (when \(\alpha>0\)).
Consistent with Wang et al. (2023), the PS2 test exhibited the highest power across most configurations, followed by DD, HP, and WF. The PS3 (estimating equations) test was generally less powerful. Our added permutation tests showed competitive performance: the coefficient Mahalanobis statistic was particularly sensitive when informativeness manifested through slope differences, while the predicted mean statistic was more sensitive to shifts in overall prediction levels. Both maintained nominal size when \(\alpha = 0\).
These results highlight that permutation-based diagnostics can serve as robust, distribution-free alternatives to parametric tests, complementing the existing DC and WA families.
| n | sigma | delta | alpha | DD | PS1 | PS1q | PS2 | PS2q | HP | PS3 | pred_mean | coef_mahal | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.1 | 1.5 | 0.0 | 0.048 | 0.072 | 0.071 | 0.071 | 0.070 | 0.048 | 0.021 | 0.038 | 0.048 | 
| 100 | 0.1 | 1.5 | 0.2 | 0.076 | 0.076 | 0.080 | 0.096 | 0.098 | 0.072 | 0.047 | 0.056 | 0.062 | 
| 100 | 0.1 | 1.5 | 0.4 | 0.102 | 0.118 | 0.135 | 0.143 | 0.122 | 0.100 | 0.083 | 0.079 | 0.073 | 
| 100 | 0.1 | 1.5 | 0.6 | 0.183 | 0.174 | 0.193 | 0.228 | 0.200 | 0.180 | 0.136 | 0.141 | 0.105 | 
| 100 | 0.1 | 1.0 | 0.0 | 0.029 | 0.063 | 0.073 | 0.062 | 0.063 | 0.029 | 0.031 | 0.026 | 0.035 | 
| 100 | 0.1 | 1.0 | 0.2 | 0.066 | 0.097 | 0.114 | 0.117 | 0.097 | 0.061 | 0.049 | 0.051 | 0.052 | 
| 100 | 0.1 | 1.0 | 0.4 | 0.184 | 0.179 | 0.236 | 0.286 | 0.212 | 0.178 | 0.160 | 0.117 | 0.091 | 
| 100 | 0.1 | 1.0 | 0.6 | 0.377 | 0.316 | 0.383 | 0.458 | 0.364 | 0.375 | 0.307 | 0.208 | 0.160 | 
| 100 | 0.2 | 1.5 | 0.0 | 0.047 | 0.068 | 0.060 | 0.052 | 0.067 | 0.045 | 0.032 | 0.047 | 0.043 | 
| 100 | 0.2 | 1.5 | 0.2 | 0.090 | 0.099 | 0.111 | 0.104 | 0.099 | 0.088 | 0.063 | 0.082 | 0.073 | 
| 100 | 0.2 | 1.5 | 0.4 | 0.245 | 0.238 | 0.246 | 0.286 | 0.264 | 0.242 | 0.190 | 0.246 | 0.186 | 
| 100 | 0.2 | 1.5 | 0.6 | 0.588 | 0.502 | 0.500 | 0.577 | 0.549 | 0.580 | 0.499 | 0.543 | 0.437 | 
| 100 | 0.2 | 1.0 | 0.0 | 0.058 | 0.088 | 0.087 | 0.072 | 0.077 | 0.058 | 0.039 | 0.036 | 0.048 | 
| 100 | 0.2 | 1.0 | 0.2 | 0.150 | 0.181 | 0.204 | 0.213 | 0.201 | 0.145 | 0.127 | 0.133 | 0.111 | 
| 100 | 0.2 | 1.0 | 0.4 | 0.577 | 0.471 | 0.479 | 0.573 | 0.537 | 0.569 | 0.492 | 0.487 | 0.385 | 
| 100 | 0.2 | 1.0 | 0.6 | 0.926 | 0.839 | 0.843 | 0.905 | 0.889 | 0.924 | 0.892 | 0.840 | 0.705 | 
| 200 | 0.1 | 1.5 | 0.0 | 0.033 | 0.068 | 0.075 | 0.062 | 0.069 | 0.032 | 0.035 | 0.044 | 0.041 | 
| 200 | 0.1 | 1.5 | 0.2 | 0.075 | 0.086 | 0.082 | 0.102 | 0.082 | 0.075 | 0.071 | 0.056 | 0.060 | 
| 200 | 0.1 | 1.5 | 0.4 | 0.157 | 0.154 | 0.189 | 0.214 | 0.175 | 0.155 | 0.136 | 0.133 | 0.106 | 
| 200 | 0.1 | 1.5 | 0.6 | 0.327 | 0.282 | 0.321 | 0.409 | 0.329 | 0.323 | 0.295 | 0.278 | 0.200 | 
| 200 | 0.1 | 1.0 | 0.0 | 0.043 | 0.080 | 0.082 | 0.062 | 0.086 | 0.043 | 0.027 | 0.028 | 0.042 | 
| 200 | 0.1 | 1.0 | 0.2 | 0.119 | 0.152 | 0.193 | 0.208 | 0.162 | 0.115 | 0.102 | 0.087 | 0.085 | 
| 200 | 0.1 | 1.0 | 0.4 | 0.358 | 0.301 | 0.399 | 0.467 | 0.350 | 0.353 | 0.316 | 0.254 | 0.198 | 
| 200 | 0.1 | 1.0 | 0.6 | 0.670 | 0.574 | 0.677 | 0.735 | 0.631 | 0.669 | 0.631 | 0.490 | 0.357 | 
| 200 | 0.2 | 1.5 | 0.0 | 0.057 | 0.065 | 0.065 | 0.052 | 0.059 | 0.055 | 0.031 | 0.046 | 0.046 | 
| 200 | 0.2 | 1.5 | 0.2 | 0.149 | 0.141 | 0.145 | 0.168 | 0.149 | 0.148 | 0.125 | 0.138 | 0.114 | 
| 200 | 0.2 | 1.5 | 0.4 | 0.552 | 0.447 | 0.455 | 0.538 | 0.519 | 0.548 | 0.486 | 0.539 | 0.429 | 
| 200 | 0.2 | 1.5 | 0.6 | 0.905 | 0.824 | 0.832 | 0.870 | 0.860 | 0.901 | 0.895 | 0.876 | 0.776 | 
| 200 | 0.2 | 1.0 | 0.0 | 0.046 | 0.082 | 0.094 | 0.069 | 0.076 | 0.044 | 0.040 | 0.045 | 0.041 | 
| 200 | 0.2 | 1.0 | 0.2 | 0.309 | 0.267 | 0.293 | 0.341 | 0.313 | 0.307 | 0.257 | 0.286 | 0.204 | 
| 200 | 0.2 | 1.0 | 0.4 | 0.898 | 0.816 | 0.852 | 0.879 | 0.853 | 0.896 | 0.871 | 0.834 | 0.734 | 
| 200 | 0.2 | 1.0 | 0.6 | 0.998 | 0.989 | 0.993 | 0.992 | 0.994 | 0.998 | 0.998 | 0.994 | 0.980 | 
Bollen, K. A., Biemer, P. P., Karr, A. F., Tueller, S., & Berzofsky, M. E. (2016). Are Survey Weights Needed? A Review of Diagnostic Tests in Regression Analysis. Annual Review of Statistics and Its Application, 3, 375–392.
DuMouchel, W. H., & Duncan, G. J. (1983). Using Sample Survey Weights in Multiple Regression Analyses of Stratified Samples. Journal of the American Statistical Association, 78(383), 535–543.
Hausman, J. A. (1978). Specification Tests in Econometrics. Econometrica, 46(6), 1251–1271.
MacKinnon, J. G., & White, H. (1985). Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties. Journal of Econometrics, 29(3), 305–325.
Pfeffermann, D. (1993). The Role of Sampling Weights When Modeling Survey Data. International Statistical Review, 61(2), 317–337.
Pfeffermann, D., & Nathan, G. (1985). Regression Analysis of Data from Complex Surveys. Journal of the American Statistical Association, 80(389), 151–160.
Pfeffermann, D., & Sverchkov, M. (1999). Parametric and Semi‑parametric Estimation of Regression Models Fitted to Survey Data. Sankhyā: The Indian Journal of Statistics, Series B, 61(1), 166–186.
Pfeffermann, D., & Sverchkov, M. (2003). Fitting Generalized Linear Models under Informative Sampling. In R. L. Chambers & C. J. Skinner (Eds.), Analysis of Survey Data (pp. 175–196). Wiley.
Wang, F., Wang, H., & Yan, J. (2023). Diagnostic Tests for the Necessity of Weight in Regression With Survey Data. International Statistical Review, 91(1), 55–71.
Wu, Y., & Fuller, W. A. (2005). Preliminary Testing Procedures for Regression with Survey Samples. In Proceedings of the Joint Statistical Meetings, Survey Research Methods Section (pp. 3683–3688). American Statistical Association.