Title: Sampling Algorithms and Spatially Balanced Sampling
Version: 0.1.1
Description: Fast tools for unequal probability sampling in multi-dimensional spaces, implemented in Rust for high performance. The package offers a wide range of methods, including Sampford (Sampford, 1967, <doi:10.1093/biomet/54.3-4.499>) and correlated Poisson sampling (Bondesson and Thorburn, 2008, <doi:10.1111/j.1467-9469.2008.00596.x>), pivotal sampling (Deville and Tillé, 1998, <doi:10.1093/biomet/91.4.893>), and balanced sampling such as the cube method (Deville and Tillé, 2004, <doi:10.1093/biomet/91.4.893>) to ensure auxiliary totals are respected. Spatially balanced approaches, including the local pivotal method (Grafström et al., 2012, <doi:10.1111/j.1541-0420.2011.01699.x>), spatially correlated Poisson sampling (Grafström, 2012, <doi:10.1016/j.jspi.2011.07.003>), and locally correlated Poisson sampling (Prentius, 2024, <doi:10.1002/env.2832>), provide efficient designs when the target variable is linked to auxiliary information.
URL: https://www.envisim.se/, https://github.com/envisim/rust-samplr/
BugReports: https://github.com/envisim/rust-samplr/issues
License: AGPL-3
Encoding: UTF-8
Language: en-GB
Depends: R (≥ 4.2)
RoxygenNote: 7.3.2
SystemRequirements: Cargo (Rust's package manager), rustc >= 1.75.0
NeedsCompilation: yes
Packaged: 2025-09-09 07:31:24 UTC; wpmg
Author: Wilmer Prentius ORCID iD [aut, cre], Anton Grafström ORCID iD [ctb], Authors of the dependent Rust crates [aut] (see inst/AUTHORS file)
Maintainer: Wilmer Prentius <wilmer.prentius@slu.se>
Repository: CRAN
Date/Publication: 2025-09-11 09:20:02 UTC

rsamplr: Sampling Algorithms and Spatially Balanced Sampling

Description

Select probability samples in multi-dimensional spaces with any prescribed inclusion probabilities using fast rust implementations.

You can access the project website at https://envisim.se.

Author(s)

Wilmer Prentius wilmer.prentius@gmail.com, Anton Grafström.

References

Benedetti, R., Piersimoni, F., & Postiglione, P. (2017). Spatially balanced sampling: a review and a reappraisal. International Statistical Review, 85(3), 439-454.

Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3), 209-226.

Maneewongvatana, S., & Mount, D. M. (1999). It’s okay to be skinny, if your friends are fat. In Center for geometric computing 4th annual workshop on computational geometry (Vol. 2, pp. 1-8).

Särndal, C. E., Swensson, B., & Wretman, J. (2003). Model assisted survey sampling. Springer Science & Business Media.

Tillé, Y. (2006). Sampling algorithms. New York, NY: Springer New York.

See Also

Useful links:


Sampling defaults

Description

Sampling defaults

Usage

.sampling_defaults(eps = 1e-10, max_iter = 1000L, bucket_size = 50L)

Arguments

eps

A small value used when comparing floats.

max_iter

The maximum number of iterations used in iterative algorithms.

bucket_size

The maximum size of the k-d-tree nodes. A higher value gives a slower k-d-tree, but is faster to create and takes up less memory.


Balanced sampling

Description

Selects balanced samples with prescribed inclusion probabilities from finite populations.

Usage

cube(probabilities, balance_mat, ...)

cube_stratified(probabilities, balance_mat, strata, ...)

Arguments

probabilities

A vector of inclusion probabilities.

balance_mat

A matrix of balancing covariates.

...

Arguments passed on to .sampling_defaults

eps

A small value used when comparing floats.

strata

An integer vector with stratum numbers for each unit.

Details

For the cube method, a fixed sized sample is obtained if the first column of balance_mat is the inclusion probabilities. For cube_stratified, the inclusion probabilities are inserted automatically.

Value

A vector of sample indices.

Functions

References

Deville, J. C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.

Chauvet, G. and Tillé, Y. (2006). A fast algorithm for balanced sampling. Computational Statistics, 21(1), 53-62.

Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35, 115-119.

Examples

set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
xb = matrix(c(prob, runif(N * 2)), ncol = 3);
strata = c(rep(1L, 100), rep(2L, 200), rep(3L, 300), rep(4L, 400));

s = cube(prob, xb);
plot(xb[, 2], xb[, 3], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = cube_stratified(prob, xb[, -1], strata);
plot(xb[, 2], xb[, 3], pch = ifelse(sample_to_indicator(s, N), 19, 1));


# Respects inclusion probabilities
set.seed(12345);
prob = c(0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.55, 0.65, 0.7, 0.9);
N = length(prob);
xb = matrix(c(prob, runif(N * 2)), ncol = 3);

ep = rep(0L, N);
r = 10000L;

for (i in seq_len(r)) {
  s = cube(prob, xb);
  ep[s] = ep[s] + 1L;
}

print(ep / r - prob);



Doubly balanced sampling

Description

Selects doubly balanced samples with prescribed inclusion probabilities from finite populations.

Usage

local_cube(probabilities, spread_mat, balance_mat, ...)

local_cube_stratified(probabilities, spread_mat, balance_mat, strata, ...)

Arguments

probabilities

A vector of inclusion probabilities.

spread_mat

A matrix of spreading covariates.

balance_mat

A matrix of balancing covariates.

...

Arguments passed on to .sampling_defaults

eps

A small value used when comparing floats.

bucket_size

The maximum size of the k-d-tree nodes. A higher value gives a slower k-d-tree, but is faster to create and takes up less memory.

strata

An integer vector with stratum numbers for each unit.

Details

For the local_cube method, a fixed sized sample is obtained if the first column of balance_mat is the inclusion probabilities. For local_cube_stratified, the inclusion probabilities are inserted automatically.

Value

A vector of sample indices.

Functions

References

Deville, J. C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.

Chauvet, G. and Tillé, Y. (2006). A fast algorithm for balanced sampling. Computational Statistics, 21(1), 53-62.

Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35, 115-119.

Grafström, A. and Tillé, Y. (2013). Doubly balanced spatial sampling with spreading and restitution of auxiliary totals. Environmetrics, 24(2), 120-131

Examples

set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
xb = matrix(c(prob, runif(N * 2)), ncol = 3);
xs = matrix(runif(N * 2), ncol = 2);
strata = c(rep(1L, 100), rep(2L, 200), rep(3L, 300), rep(4L, 400));

s = local_cube(prob, xs, xb);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = local_cube_stratified(prob, xs, xb[, -1], strata);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));


# Respects inclusion probabilities
set.seed(12345);
prob = c(0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.55, 0.65, 0.7, 0.9);
N = length(prob);
xb = matrix(c(prob, runif(N * 2)), ncol = 3);
xs = matrix(runif(N * 2), ncol = 2);

ep = rep(0L, N);
r = 10000L;

for (i in seq_len(r)) {
  s = local_cube(prob, xs, xb);
  ep[s] = ep[s] + 1L;
}

print(ep / r - prob);



Spatial balance measure

Description

Calculates the spatial balance of a sample.

Usage

spatial_balance_local(sample, probabilities, spread_mat)

spatial_balance_voronoi(sample, probabilities, spread_mat)

balance_deviation(sample, probabilities, spread_mat)

Arguments

sample

A vector of sample indices.

probabilities

A vector of inclusion probabilities.

spread_mat

A matrix of spreading covariates.

Value

the measure, or in case of balance_deviation, the vector of deviations.

Functions

References

Stevens Jr, D. L., & Olsen, A. R. (2004). Spatially balanced sampling of natural resources. Journal of the American statistical Association, 99(465), 262-278.

Grafström, A., Lundström, N.L.P. & Schelin, L. (2012). Spatially balanced sampling through the Pivotal method. Biometrics 68(2), 514-520.

Prentius, W., & Grafström, A. (2024). How to find the best sampling design: A new measure of spatial balance. Environmetrics, 35(7), e2878.

Examples

set.seed(12345);
N = 500;
n = 70;
prob = rep(n / N, N);
xs = matrix(runif(N * 2), ncol = 2);

s = lpm_2(prob, xs);
spatial_balance_voronoi(s, prob, xs);
spatial_balance_local(s, prob, xs);
balance_deviation(s, prob, xs);


# Compare SRS
r = 1000L;
sb_v = matrix(0.0, r, 2L);
sb_l = matrix(0.0, r, 2L);
bal = matrix(0.0, r, 2L * ncol(xs));

for (i in seq_len(r)) {
  s1 = lpm_2(prob, xs);
  s2 = sample(N, n);
  sb_v[i, ] = c(
    spatial_balance_voronoi(s1, prob, xs),
    spatial_balance_voronoi(s2, prob, xs)
  );
  sb_l[i, ] = c(
    spatial_balance_local(s1, prob, xs),
    spatial_balance_local(s2, prob, xs)
  );
  bal[i, ] = c(
    balance_deviation(s1, prob, xs),
    balance_deviation(s2, prob, xs)
  );
}

# Spatial balance measure (voronoi), LPM vs SRS
print(colMeans(sb_v));
# Spatial balance measure (local), LPM vs SRS
print(colMeans(sb_l));
# Abs. balance deviation, LPM vs SRS
print(colMeans(abs(bal)));



Spatially balanced sampling

Description

Selects spatially balanced samples with prescribed inclusion probabilities from finite populations.

Usage

lpm_1(probabilities, spread_mat, ...)

lpm_1s(probabilities, spread_mat, ...)

lpm_2(probabilities, spread_mat, ...)

scps(probabilities, spread_mat, ...)

lcps(probabilities, spread_mat, ...)

lpm_2_hierarchical(probabilities, spread_mat, sizes, ...)

Arguments

probabilities

A vector of inclusion probabilities.

spread_mat

A matrix of spreading covariates.

...

Arguments passed on to .sampling_defaults

eps

A small value used when comparing floats.

bucket_size

The maximum size of the k-d-tree nodes. A higher value gives a slower k-d-tree, but is faster to create and takes up less memory.

sizes

A vector of integers containing the sizes of the subsamples.

Details

lpm_2_hierarchical selects an initial sample using the LPM2 algorithm, and then splits this sample into subsamples of given sizes, using successive, hierarchical selection with LPM2. When using lpm_2_hierarchical, the inclusion probabilities must sum to an integer, and the sizes vector (the subsamples) must sum to the same integer.

Value

A vector of sample indices, or in the case of hierarchical sampling, a matrix where the first column contains sample indices and the second column contains subsample indices (groups).

Functions

References

Deville, J.-C., & Tillé, Y. (1998). Unequal probability sampling without replacement through a splitting method. Biometrika 85, 89-101.

Grafström, A. (2012). Spatially correlated Poisson sampling. Journal of Statistical Planning and Inference, 142(1), 139-147.

Grafström, A., Lundström, N.L.P. & Schelin, L. (2012). Spatially balanced sampling through the Pivotal method. Biometrics 68(2), 514-520.

Lisic, J. J., & Cruze, N. B. (2016, June). Local pivotal methods for large surveys. In Proceedings of the Fifth International Conference on Establishment Surveys.

Prentius, W. (2024). Locally correlated Poisson sampling. Environmetrics, 35(2), e2832.

Examples

set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
xs = matrix(runif(N * 2), ncol = 2);
sizes = c(10L, 20L, 30L, 40L);

s = lpm_1(prob, xs);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = lpm_1s(prob, xs);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = lpm_2(prob, xs);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = scps(prob, xs);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = lpm_2_hierarchical(prob, xs, sizes);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));


s = lcps(prob, xs); # May have a long execution time
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

# Respects inclusion probabilities
set.seed(12345);
prob = c(0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.55, 0.65, 0.7, 0.9);
N = length(prob);
xs = matrix(c(prob, runif(N * 2)), ncol = 3);

ep = rep(0L, N);
r = 10000L;

for (i in seq_len(r)) {
  s = lpm_2(prob, xs);
  ep[s] = ep[s] + 1L;
}

print(ep / r - prob);



Unequal probability sampling

Description

Selects samples with prescribed inclusion probabilities from finite populations.

Usage

rpm(probabilities, ...)

spm(probabilities, ...)

cps(probabilities, ...)

poisson(probabilities, ...)

conditional_poisson(probabilities, sample_size, ...)

systematic(probabilities, ...)

systematic_random_order(probabilities, ...)

brewer(probabilities, ...)

pareto(probabilities, ...)

sampford(probabilities, ...)

Arguments

probabilities

A vector of inclusion probabilities.

...

Arguments passed on to .sampling_defaults

eps

A small value used when comparing floats.

max_iter

The maximum number of iterations used in iterative algorithms.

sample_size

The wanted sample size

Details

sampford and conditional_poisson may return an error if a solution is not found within max_iter.

Value

A vector of sample indices.

Functions

References

Bondesson, L., & Thorburn, D. (2008). A list sequential sampling method suitable for real‐time sampling. Scandinavian Journal of Statistics, 35(3), 466-483.

Brewer, K. E. (1975). A Simple Procedure for Sampling pi-ps wor. Australian Journal of Statistics, 17(3), 166-172.

Chauvet, G. (2012). On a characterization of ordered pivotal sampling. Bernoulli, 18(4), 1320-1340.

Deville, J.-C., & Tillé, Y. (1998). Unequal probability sampling without replacement through a splitting method. Biometrika 85, 89-101.

Grafström, A. (2009). Non-rejective implementations of the Sampford sampling design. Journal of Statistical Planning and Inference, 139(6), 2111-2114.

Rosén, B. (1997). On sampling with probability proportional to size. Journal of statistical planning and inference, 62(2), 159-191.

Sampford, M. R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54(3-4), 499-513.

Examples

set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
xs = matrix(runif(N * 2), ncol = 2);

s = rpm(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = spm(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = cps(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = poisson(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = brewer(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = pareto(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = systematic(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

s = systematic_random_order(prob);
plot(xs[, 1], xs[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));

# Conditional poisson and sampford are not guaranteed to find a solution
prob2 = rep(0.5, 10L);
s = conditional_poisson(prob2, 5L, max_iter = 10000L);
plot(xs[1:10, 1], xs[1:10, 2], pch = ifelse(sample_to_indicator(s, 10L), 19, 1));

s = sampford(prob2, max_iter = 10000L);
plot(xs[1:10, 1], xs[1:10, 2], pch = ifelse(sample_to_indicator(s, 10L), 19, 1));


# Respects inclusion probabilities
set.seed(12345);
prob = c(0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.55, 0.65, 0.7, 0.9);
N = length(prob);

ep = rep(0L, N);
r = 10000L;

for (i in seq_len(r)) {
  s = poisson(prob);
  ep[s] = ep[s] + 1L;
}

print(ep / r - prob);



Variance estimator for spatially balanced samples

Description

Variance estimator of HT estimator of population total.

Usage

local_mean_variance(values, probabilities, spread_mat, neighbours = 4L)

Arguments

values

A vector of values of the variable of interest.

probabilities

A vector of inclusion probabilities.

spread_mat

A matrix of spreading covariates.

neighbours

The number of neighbours to construct the means around.

Value

A vector of sample indices.

References

Grafström, A., & Schelin, L. (2014). How to select representative samples. Scandinavian Journal of Statistics, 41(2), 277-290.

Examples

set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
xs = matrix(runif(N * 2), ncol = 2);
y = runif(N);

s = lpm_2(prob, xs);
local_mean_variance(y[s], prob[s], xs[s, ], 4);


# Compare SRS, empirical
r = 1000L;
v = matrix(0.0, r, 3L);

for (i in seq_len(r)) {
  s = lpm_2(prob, xs);
  v[i, 1] = local_mean_variance(y[s], prob[s], xs[s, ], 4);
  v[i, 2] = N^2 * sd(y[s]) / n;
  v[i, 3] = sum(y[s] / prob[s]);
}

# Local mean variance, SRS variance, MSE
print(c(mean(v[, 1]), mean(v[, 2]), mean((v[, 3] - sum(y))^2)));



Inclusion probabilities proportional-to-size

Description

Computes the first-order inclusion probabilities from a vector of positive numbers, for an inclusion probabilities proportional-to-size design.

Usage

pips_from_vector(values, sample_size)

Arguments

values

A vector of positive numbers

sample_size

The wanted sample size

Value

A vector of inclusion probabilities proportional-to-size.

Examples

set.seed(12345);
N = 1000;
n = 100;
x = matrix(runif(N * 2), ncol = 2);
prob = pips_from_vector(x[, 1], n);
s = lpm_2(prob, x);
plot(x[, 1], x[, 2], pch = ifelse(sample_to_indicator(s, N), 19, 1));


Transform a sample vector into an inclusion indicator vector

Description

Transform a sample vector into an inclusion indicator vector

Usage

sample_to_indicator(sample, population_size)

Arguments

sample

A vector of sample indices.

population_size

The total size of the population.

Value

An inclusion indicator vector, i.e. a population_size-sized vector of 0/1.

Examples

s = c(1, 2, 10);
si = sample_to_indicator(s, 10);