Help for package COR

Title:

The COR for Optimal Subset Selection in Distributed Estimation

Date:

2024-12-10

Version:

0.2.0

Description:

An algorithm for optimal subset selection based on Covariance matrices, Observation matrices and Response vectors (COR), used to select the optimal subsets in distributed estimation. The philosophy of the package is described in Guo G. (2024) <doi:10.1007/s11222-024-10471-z>.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.2

Imports:

stats

NeedsCompilation:

Packaged:

2024-12-16 08:52:32 UTC; ASUS

Author:

Guangbao Guo [aut, cre], Haoyue Song [aut], Lixing Zhu [aut]

Maintainer:

Guangbao Guo <ggb11111111@163.com>

Depends:

R (≥ 3.5.0)

Repository:

CRAN

Date/Publication:

2024-12-16 10:20:02 UTC

Calculate the optimal subset lengths on the COR

Description

Calculate the optimal subset lengths on the COR

Usage

COR(K = K, nk = nk, alpha = alpha, X = X, y = y)

Arguments

K

is the number of subsets

nk

is the length of subsets

alpha

is the significance level

X

is the observation matrix

y

is the response vector
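To make the roles of K and nk concrete, the sketch below draws K index sets of length nk from the n rows of X and extracts the matching rows of y. Sampling without replacement within each subset is an assumption made for illustration; how the package forms its subsets internally is not stated in this manual, and the helper name draw_subsets is hypothetical.

```r
# Hypothetical helper: draw K row-index sets, each of length nk.
# Assumption: rows are sampled without replacement within a subset;
# the COR package may form its subsets differently.
draw_subsets <- function(n, K, nk) {
  stopifnot(nk <= n)
  lapply(seq_len(K), function(k) sample.int(n, nk))
}

set.seed(1)
p <- 6; n <- 1000; K <- 2; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
y <- X %*% beta + rnorm(n)

idx <- draw_subsets(n, K, nk)
X1 <- X[idx[[1]], , drop = FALSE]  # nk x p observation block for subset 1
y1 <- y[idx[[1]], , drop = FALSE]  # matching nk x 1 responses
```

With these settings each subset holds 200 of the 1000 rows, so X1 is a 200 x 6 matrix.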

Value

A list containing:

seqL

The index of the subset with the minimum L value.

seqN

The index of the subset with the minimum N value.

lWMN

The optimal subset lengths on the COR.

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Examples

p=6;n=1000;K=2;nk=200;alpha=0.05;sigma=1
e=rnorm(n,0,sigma); beta=c(sort(c(runif(p,0,1))));
data=c(rnorm(n*p,5,10));X=matrix(data, ncol=p);
y=X%*%beta+e;
COR(K=K,nk=nk,alpha=alpha,X=X,y=y)

Calculate the LIC estimator for linear regression

Description

This function estimates the coefficients of a linear regression model from a design matrix 'X' and a response vector 'Y'. It uses the A-optimal and D-optimal design criteria to choose optimal subsets of observations.
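As background on the two criteria named above: A-optimality prefers the subset with the smallest trace of the inverse information matrix, and D-optimality prefers the subset with the largest determinant of the information matrix. The base R sketch below scores K random subsets under both criteria. It is a generic illustration, not the code inside LICbeta; the random subsets and all variable names are assumptions.

```r
set.seed(2)
p <- 6; n <- 1000; K <- 5; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
Y <- X %*% sort(runif(p)) + rnorm(n)

# Assumption: subsets are random row samples of size nk.
idx <- lapply(seq_len(K), function(k) sample.int(n, nk))

# A-optimality: a smaller trace of (X_k' X_k)^{-1} is better.
a_crit <- sapply(idx, function(i) sum(diag(solve(crossprod(X[i, ])))))

# D-optimality: a larger log-determinant of X_k' X_k is better.
d_crit <- sapply(idx, function(i) {
  as.numeric(determinant(crossprod(X[i, ]), logarithm = TRUE)$modulus)
})

best_A <- which.min(a_crit)
best_D <- which.max(d_crit)

# Least squares fit on the A-optimal subset.
iA <- idx[[best_A]]
beta_A <- solve(crossprod(X[iA, ]), crossprod(X[iA, ], Y[iA]))
```

The log-determinant is used instead of the raw determinant to avoid overflow when nk is large.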

Usage

LICbeta(X, Y, alpha, K, nk)

Arguments

X

The observation matrix (n x p)

Y

The response vector (n x 1)

alpha

The significance level for computing confidence intervals

K

The number of subsets

nk

The number of observations per subset

Value

A list containing:

E5

The LIC estimator for linear regression.

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Calculate the LIC estimator based on the A-optimal and D-optimal criteria

Description

Calculate the LIC estimator based on the A-optimal and D-optimal criteria

Usage

LICnew(X, Y, alpha, K, nk)

Arguments

X

A matrix of observations (design matrix) with size n x p

Y

A vector of responses with length n

alpha

The significance level for confidence intervals

K

The number of subsets to consider

nk

The size of each subset

Value

A list containing:

E5

The LIC estimator based on the A-optimal and D-optimal criteria.

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Examples

p = 6; n = 1000; K = 2; nk = 200; alpha = 0.05; sigma = 1
e = rnorm(n, 0, sigma); beta = c(sort(c(runif(p, 0, 1))));
data = c(rnorm(n * p, 5, 10)); X = matrix(data, ncol = p);
Y = X %*% beta + e;
LICnew(X = X, Y = Y, alpha = alpha, K = K, nk = nk)

Calculate MSE values for different beta estimation methods

Description

Calculate MSE values for different beta estimation methods

Usage

MSEbeta(X, Y, alpha, K, nk)

Arguments

X

The design matrix (observations).

Y

The response vector.

alpha

The significance level.

K

The number of subsets.

nk

The length of subsets (number of observations in each subset).

Value

A list containing:

MSECOR

The MSE of the COR beta estimator.

MSEAopt

The MSE of the A-optimal beta estimator.

MSEDopt

The MSE of the D-optimal beta estimator.

MSElic

The MSE of the LIC beta estimator.
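In a simulation where the true coefficient vector is known, an MSE of this kind can be computed as in the sketch below. The helper mse_beta is hypothetical, and averaging the squared error over the p coefficients is an assumption; the package may scale the error differently.

```r
# Hypothetical helper: squared-error distance between an estimate and the truth.
# Assumption: the squared error is averaged over the p coefficients.
mse_beta <- function(beta_hat, beta_true) {
  mean((as.numeric(beta_hat) - beta_true)^2)
}

set.seed(3)
p <- 6; n <- 1000
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p))
Y <- X %*% beta + rnorm(n)

# Least squares on the full data and on one random subset of 200 rows.
beta_full <- solve(crossprod(X), crossprod(X, Y))
sub <- sample.int(n, 200)
beta_sub <- solve(crossprod(X[sub, ]), crossprod(X[sub, ], Y[sub]))

mse_full <- mse_beta(beta_full, beta)
mse_sub  <- mse_beta(beta_sub, beta)
```

Comparing mse_full with mse_sub shows how much accuracy a single subset gives up relative to the full sample.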

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Calculate the MSE values of the COR criterion in simulation

Description

Calculate the MSE values of the COR criterion in simulation

Usage

MSEcom(K = K, nk = nk, alpha = alpha, X = X, y = y)

Arguments

K

is the number of subsets

nk

is the length of subsets

alpha

is the significance level

X

is the observation matrix

y

is the response vector

Value

A list containing:

MSEx

The Mean Squared Error between the true beta and the estimate betax based on the COR.

MSEA

The Mean Squared Error between the true beta and the estimate betaA based on the least squares estimate for subset A.

MSEc

The Mean Squared Error between the true beta and the estimate betac based on the COR-selected subset.

MSEm

The Mean Squared Error between the true beta and the median estimator betamm across all subsets.

MSEa

The Mean Squared Error between the true beta and the mean estimator betaa across all subsets.
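The mean and median estimators across subsets listed above can be pictured with the following base R sketch: fit least squares on each subset, then combine the K estimates coordinate by coordinate. Disjoint blocks of consecutive rows and all variable names are assumptions made for illustration, not the internals of MSEcom.

```r
set.seed(4)
p <- 6; n <- 1000; K <- 5; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p))
y <- X %*% beta + rnorm(n)

# Assumption: K disjoint blocks of nk consecutive rows (requires K * nk <= n).
blocks <- split(seq_len(K * nk), rep(seq_len(K), each = nk))

# p x K matrix: column k holds the least squares estimate from block k.
B <- sapply(blocks, function(i) {
  solve(crossprod(X[i, ]), crossprod(X[i, ], y[i]))
})

beta_mean   <- rowMeans(B)          # coordinate-wise mean across subsets
beta_median <- apply(B, 1, median)  # coordinate-wise median across subsets

mse_mean   <- mean((beta_mean - beta)^2)
mse_median <- mean((beta_median - beta)^2)
```

The median is less sensitive than the mean to a single subset with an unusually poor estimate.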

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Examples

p=6;n=1000;K=2;nk=500;alpha=0.05;sigma=1
e=rnorm(n,0,sigma); beta=c(sort(c(runif(p,0,1))));
data=c(rnorm(n*p,5,10));X=matrix(data, ncol=p);
y=X%*%beta+e;
MSEcom(K=K,nk=nk,alpha=alpha,X=X,y=y)

Calculate the MSE values of the COR criterion for redundant data in simulation

Description

Calculate the MSE values of the COR criterion for redundant data in simulation

Usage

MSEver(K = K, nk = nk, alpha = alpha, X = X, y = y)

Arguments

K

is the number of subsets

nk

is the length of subsets

alpha

is the significance level

X

is the observation matrix

y

is the response vector

Value

A list containing:

minE

The minimum value of the error variance estimator.

Mcor

The MSE of the COR estimator.

Mx

The MSE of the estimator based on the subset with the maximum M.

MA

The MSE of the estimator based on the subset with the minimum W.

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Examples

p=6;n=1000;K=2;nk=200;alpha=0.05;sigma=1
e=rnorm(n,0,sigma); beta=c(sort(c(runif(p,0,1))));
data=c(rnorm(n*p,5,10));X=matrix(data, ncol=p);
y=X%*%beta+e;
MSEver(K=K,nk=nk,alpha=alpha,X=X,y=y)

Calculate the estimators of beta on the A-opt and D-opt

Description

Calculate the estimators of beta on the A-opt and D-opt

Usage

beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)

Arguments

K

is the number of subsets

nk

is the length of subsets

alpha

is the significance level

X

is the observation matrix

y

is the response vector

Value

A list containing:

betaA

The estimator of beta on the A-opt.

betaD

The estimator of beta on the D-opt.

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Examples

p=6;n=1000;K=2;nk=200;alpha=0.05;sigma=1
e=rnorm(n,0,sigma); beta=c(sort(c(runif(p,0,1))));
data=c(rnorm(n*p,5,10));X=matrix(data, ncol=p);
y=X%*%beta+e;
beta_AD(K=K,nk=nk,alpha=alpha,X=X,y=y)

Calculate the estimators of beta on the LEV-opt

Description

Calculate the estimators of beta on the LEV-opt

Usage

beta_LW(X, Y, K, nk)

Arguments

X

is the observation matrix

Y

is the response vector

K

is the number of subsets

nk

is the length of subsets

Value

A list containing:

betalev

The estimator of beta on the LEV-opt subset.

betam

The mean of the beta estimators across all K subsets.

AMSE

The Average Mean Squared Error (AMSE) for the estimator.

WMSE

The Weighted Mean Squared Error (WMSE) for the estimator.

MSElevb

The Mean Squared Error (MSE) of the LEV-opt estimator compared to the true beta.

MSEb

The Mean Squared Error (MSE) of the mean estimator (betam) compared to the true beta.

MSEyleva

The Mean Squared Error (MSE) of the LEV-opt estimator on the subset with the maximum hat value (Xleva).

MSEyleviy

The Mean Squared Error (MSE) of the LEV-opt estimator on the subset with the minimum hat value (Xlevi).

MSEW

The Mean Squared Error (MSE) of the weighted estimator (Wbeta) compared to the true beta.

MSEw

The Mean Squared Error (MSE) of the weighted estimator (wbeta) compared to the true beta.
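The hat values mentioned above are the diagonal entries of the hat matrix X (X'X)^{-1} X', also called leverages. The sketch below computes them in base R and fits least squares on the subset with the largest total leverage. Scoring a subset by its summed leverage is an assumption made for illustration, since this manual does not spell out how beta_LW ranks subsets; the helper name leverage is hypothetical.

```r
# Leverages: diagonal of X (X'X)^{-1} X', computed without the n x n matrix.
leverage <- function(X) rowSums((X %*% solve(crossprod(X))) * X)

set.seed(5)
p <- 6; n <- 1000; K <- 5; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
Y <- X %*% sort(runif(p)) + rnorm(n)
h <- leverage(X)

# Assumption: random subsets, each scored by its total leverage.
idx <- lapply(seq_len(K), function(k) sample.int(n, nk))
score <- sapply(idx, function(i) sum(h[i]))

# Least squares on the subset with the largest score.
i_max <- idx[[which.max(score)]]
beta_lev <- solve(crossprod(X[i_max, ]), crossprod(X[i_max, ], Y[i_max]))
```

As a sanity check, the leverages of a full-rank n x p matrix sum to p.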

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Calculate the estimator of beta on the COR

Description

Calculate the estimator of beta on the COR

Usage

beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)

Arguments

K

is the number of subsets

nk

is the length of subsets

alpha

is the significance level

X

is the observation matrix

y

is the response vector

Value

A list containing:

betaC

The estimator of beta on the COR.

References

Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z

Examples

p=6;n=1000;K=2;nk=200;alpha=0.05;sigma=1
e=rnorm(n,0,sigma); beta=c(sort(c(runif(p,0,1))));
data=c(rnorm(n*p,5,10));X=matrix(data, ncol=p);
y=X%*%beta+e;
beta_cor(K=K,nk=nk,alpha=alpha,X=X,y=y)

The communities and crime data set

Description

A data set on communities and crime

Usage

data("communities")

Format

A data frame with 1994 observations on the following 128 variables.

V1: a numeric vector
V2: a numeric vector
V3: a numeric vector
V4: a character vector
V5: a numeric vector
V6: a numeric vector
V7: a numeric vector
V8: a numeric vector
V9: a numeric vector
V10: a numeric vector
V11: a numeric vector
V12: a numeric vector
V13: a numeric vector
V14: a numeric vector
V15: a numeric vector
V16: a numeric vector
V17: a numeric vector
V18: a numeric vector
V19: a numeric vector
V20: a numeric vector
V21: a numeric vector
V22: a numeric vector
V23: a numeric vector
V24: a numeric vector
V25: a numeric vector
V26: a numeric vector
V27: a numeric vector
V28: a numeric vector
V29: a numeric vector
V30: a numeric vector
V31: a numeric vector
V32: a numeric vector
V33: a numeric vector
V34: a numeric vector
V35: a numeric vector
V36: a numeric vector
V37: a numeric vector
V38: a numeric vector
V39: a numeric vector
V40: a numeric vector
V41: a numeric vector
V42: a numeric vector
V43: a numeric vector
V44: a numeric vector
V45: a numeric vector
V46: a numeric vector
V47: a numeric vector
V48: a numeric vector
V49: a numeric vector
V50: a numeric vector
V51: a numeric vector
V52: a numeric vector
V53: a numeric vector
V54: a numeric vector
V55: a numeric vector
V56: a numeric vector
V57: a numeric vector
V58: a numeric vector
V59: a numeric vector
V60: a numeric vector
V61: a numeric vector
V62: a numeric vector
V63: a numeric vector
V64: a numeric vector
V65: a numeric vector
V66: a numeric vector
V67: a numeric vector
V68: a numeric vector
V69: a numeric vector
V70: a numeric vector
V71: a numeric vector
V72: a numeric vector
V73: a numeric vector
V74: a numeric vector
V75: a numeric vector
V76: a numeric vector
V77: a numeric vector
V78: a numeric vector
V79: a numeric vector
V80: a numeric vector
V81: a numeric vector
V82: a numeric vector
V83: a numeric vector
V84: a numeric vector
V85: a numeric vector
V86: a numeric vector
V87: a numeric vector
V88: a numeric vector
V89: a numeric vector
V90: a numeric vector
V91: a numeric vector
V92: a numeric vector
V93: a numeric vector
V94: a numeric vector
V95: a numeric vector
V96: a numeric vector
V97: a numeric vector
V98: a numeric vector
V99: a numeric vector
V100: a numeric vector
V101: a numeric vector
V102: a numeric vector
V103: a numeric vector
V104: a numeric vector
V105: a numeric vector
V106: a numeric vector
V107: a numeric vector
V108: a numeric vector
V109: a numeric vector
V110: a numeric vector
V111: a numeric vector
V112: a numeric vector
V113: a numeric vector
V114: a numeric vector
V115: a numeric vector
V116: a numeric vector
V117: a numeric vector
V118: a numeric vector
V119: a numeric vector
V120: a numeric vector
V121: a numeric vector
V122: a numeric vector
V123: a numeric vector
V124: a numeric vector
V125: a numeric vector
V126: a numeric vector
V127: a numeric vector
V128: a numeric vector

Source

UCI repository

References

Redmond, M. A. and A. Baveja: A Data-Driven Software Tool for Enabling Cooperative Information Sharing Among Police Departments. European Journal of Operational Research 141 (2002) 660-678.

Examples

data(communities)
str(communities)

The chemical sensor data set

Description

A data set of chemical sensor measurements

Usage

data("ethylene_CO")

Format

A data frame with 4001 observations on the following 19 variables.

V1: a character vector
V2: a character vector
V3: a character vector
V4: a character vector
V5: a character vector
V6: a character vector
V7: a character vector
V8: a character vector
V9: a character vector
V10: a character vector
V11: a character vector
V12: a character vector
V13: a character vector
V14: a character vector
V15: a character vector
V16: a character vector
V17: a character vector
V18: a character vector
V19: a character vector

Details

We selected the first 4001 rows of the original data set, which has about 1048576 observations on 19 variables.
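Because every column is stored as a character vector, the columns need to be converted to numeric before they can be passed as an observation matrix and response vector. The sketch below shows one way to do this on a small toy data frame that stands in for ethylene_CO; the values are made up for illustration, and the choice of response column is an assumption.

```r
# Toy stand-in for ethylene_CO: every column stored as character.
# The values below are invented for illustration only.
toy <- data.frame(V1 = c("0.00", "0.01", "0.02"),
                  V2 = c("0", "0", "10"),
                  V3 = c("-50.85", "-49.40", "-40.04"),
                  stringsAsFactors = FALSE)

# Convert each column to numeric; strings that are not numbers
# would become NA with a warning.
num <- as.data.frame(lapply(toy, as.numeric))

# Assumption: the last column is used as the response purely for illustration.
resp <- ncol(num)
y <- as.matrix(num[, resp, drop = FALSE])
X <- as.matrix(num[, -resp, drop = FALSE])
```

Checking anyNA(num) after the conversion is a quick way to spot entries that failed to parse.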

Source

UCI Repository

References

Wang, H. Y., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829-844.

Examples

data(ethylene_CO)
str(ethylene_CO)