Getting Started with AutoStrataK: Automatic Stratification for Survey Sampling

library(AutoStrataK)

Introduction

AutoStrataK provides tools for automatic stratification of survey populations using clustering-based approaches. It helps researchers and survey practitioners construct internally homogeneous strata, which can improve sampling efficiency and the precision of estimates.

Example Data

Create a small synthetic dataset of eight units with two variables, income and expenditure.

data <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  expenditure = c(10, 12, 14, 18, 22, 26, 30, 38)
)

head(data)
#>   income expenditure
#> 1     15          10
#> 2     18          12
#> 3     20          14
#> 4     25          18
#> 5     30          22
#> 6     35          26

Generate Strata

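Use autostrata() to generate strata automatically. Here the data are split into four strata (n_strata = 4) based on income, and the result contains the input data with an added stratum column.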

result <- autostrata(
  data = data,
  target = income,
  n_strata = 4
)

head(result)
#>   income expenditure stratum
#> 1     15          10       4
#> 2     18          12       4
#> 3     20          14       4
#> 4     25          18       3
#> 5     30          22       3
#> 6     35          26       2
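
To give a sense of what a clustering-based stratification involves, the sketch below groups the income values with hierarchical clustering from base R (hclust() and cutree()) and uses the cluster labels as strata. This is an illustration only: it is not necessarily the algorithm autostrata() uses, the name df_sketch is hypothetical, and the stratum labels it produces need not match the ones shown above.

```r
# Same synthetic data as in the example above.
df_sketch <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  expenditure = c(10, 12, 14, 18, 22, 26, 30, 38)
)

# Pairwise distances on the stratification variable only.
d <- dist(df_sketch$income)

# Agglomerative clustering (complete linkage), then cut the tree into 4 groups.
tree <- hclust(d, method = "complete")
df_sketch$stratum <- cutree(tree, k = 4)

# Number of units per stratum.
table(df_sketch$stratum)
```

Hierarchical clustering is used here because it is deterministic; a k-means based approach would also fit the description but depends on random starting points.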

Evaluate Stratification

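Assess the quality of the generated strata. In this example within_variance and homogeneity are returned as NA. A likely cause is that stratum 1 contains a single unit (see the counts in the Summary section), and the sample variance of a single observation is undefined.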

evaluate_strata(
  result,
  target = "income"
)
#> $overall_variance
#> [1] 144.6964
#> 
#> $within_variance
#> [1] NA
#> 
#> $homogeneity
#> [1] NA
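
The NA values above can be reproduced outside the package. The sketch below computes per-stratum variances of income with base R, assuming the stratum assignment implied by the printed output (rows 7 and 8 are not shown by head(), so their labels 2 and 1 are inferred from the stratum counts). The pooled within-stratum variance at the end is one common definition and is not necessarily what evaluate_strata() computes.

```r
income  <- c(15, 18, 20, 25, 30, 35, 40, 50)
# Assumed assignment: rows 1-6 as printed, rows 7-8 inferred.
stratum <- c(4, 4, 4, 3, 3, 2, 2, 1)

# Overall sample variance of the target variable.
overall_var <- var(income)

# Sample variance within each stratum; a stratum with one unit gives NA.
within_var <- tapply(income, stratum, var)

# Averaging without handling the NA propagates it.
mean_within <- mean(within_var)

# Pooled within-stratum variance: within sum of squares over (n - H),
# where H is the number of strata. Singleton strata contribute zero.
stratum_means <- ave(income, stratum)
pooled_within <- sum((income - stratum_means)^2) /
  (length(income) - length(unique(stratum)))

overall_var     # about 144.6964
within_var
mean_within     # NA
pooled_within   # about 9.4167
```

If the single-unit stratum is indeed the cause, requesting fewer strata or using a larger dataset should yield non-missing values.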

Summary

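Use summary() to print the number of units in each stratum. In the output, the top row gives the stratum label and the bottom row the count.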

summary(result)
#> 
#> AutoStrataR Results
#> -------------------
#> 
#> 1 2 3 4 
#> 1 2 2 3

Visualization

Visualize the resulting strata.

plot(result)
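
If a custom figure is needed, a similar view can be drawn with base R graphics. The sketch below plots income against expenditure and colours each point by stratum. It assumes the stratum assignment implied by the output above (rows 7 and 8 inferred from the stratum counts), uses the hypothetical name df_plot, and is not meant to reproduce what plot(result) draws.

```r
df_plot <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  expenditure = c(10, 12, 14, 18, 22, 26, 30, 38),
  stratum = c(4, 4, 4, 3, 3, 2, 2, 1)  # assumed assignment
)

# Scatter plot with one colour per stratum (colours taken from the default palette).
plot(
  df_plot$income, df_plot$expenditure,
  col = df_plot$stratum, pch = 19,
  xlab = "income", ylab = "expenditure",
  main = "Strata by income"
)

# Legend mapping colours to stratum labels.
labels <- sort(unique(df_plot$stratum))
legend("topleft", legend = labels, col = labels, pch = 19, title = "stratum")
```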

Conclusion

AutoStrataK provides a simple workflow for automatic stratification, evaluation, summarization, and visualization of survey populations.