Type: Package
Title: Generalized Measure of Correlation (GMC)
Version: 0.1.2
Description: Provides tools to compute the Generalized Measure of Correlation (GMC), a dependence measure accounting for nonlinearity and asymmetry in the relationship between variables. Based on the method proposed by Zheng, Shi, and Zhang (2012) <doi:10.1080/01621459.2012.710509>.
License: GPL (>= 3)
Encoding: UTF-8
RoxygenNote: 7.3.2
Suggests: testthat (>= 3.0.0), knitr, rmarkdown
Config/testthat/edition: 3
Imports: ks, stats
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-10-31 01:40:15 UTC; ding'x'j
Author: Xuejing Ding [aut, cre], Zhengjun Zhang [aut]
Maintainer: Xuejing Ding <dingxuejing24@mails.ucas.ac.cn>
Repository: CRAN
Date/Publication: 2025-10-31 12:10:02 UTC

Generalized Measure of Correlation: GMC(X | Y)

Description

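Estimates GMC(X | Y), the proportion of the variance of X explained by Y, that is Var(E[X | Y]) / Var(X), as defined by Zheng, Shi, and Zhang (2012).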

Usage

GMC_X_given_Y(X, Y, kernel = dnorm)

Arguments

X

Numeric vector; in GMC(X | Y), the variable being explained

Y

Numeric vector; in GMC(X | Y), the conditioning variable

kernel

Kernel function (default = dnorm)

Value

GMC(X|Y) estimate

Examples

# Generate sample data with nonlinear relationship
set.seed(123)
n <- 1000
X <- rnorm(n)
Y <- X^2 + rnorm(n, sd = 0.5)

# Calculate GMC(X|Y)
gmc_result <- GMC_X_given_Y(X, Y)
print(gmc_result)
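
The simulation design above is a useful illustration of asymmetry, because both conditional means are known by construction. The following is a minimal sketch in base R, not the package's estimator: it plugs the oracle conditional means into the definition Var(E[. | .]) / Var(.) from Zheng, Shi, and Zhang (2012). The variable names are illustrative assumptions.

```r
set.seed(123)
n <- 1000
X <- rnorm(n)
Y <- X^2 + rnorm(n, sd = 0.5)

# Oracle conditional means for this design (known by construction,
# not estimated from the data)
EY_given_X <- X^2         # E[Y | X] = X^2
EX_given_Y <- rep(0, n)   # E[X | Y] = 0, because X is symmetric around zero

# Plug the oracle conditional means into the GMC definition
gmc_Y_given_X <- var(EY_given_X) / var(Y)
gmc_X_given_Y <- var(EX_given_Y) / var(X)

# Population value of GMC(Y | X): Var(X^2) = 2 for standard normal X
theory_Y_given_X <- 2 / (2 + 0.5^2)

print(c(gmc_Y_given_X = gmc_Y_given_X,
        theory_Y_given_X = theory_Y_given_X,
        gmc_X_given_Y = gmc_X_given_Y))
```

In this design Y is largely explained by X, while X is not explained by Y at all, so a kernel-based estimate of GMC(X | Y) would be expected to be close to zero.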

Generalized Measure of Correlation: GMC(Y | X)

Description

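Estimates GMC(Y | X), the proportion of the variance of Y explained by X, that is Var(E[Y | X]) / Var(Y), as defined by Zheng, Shi, and Zhang (2012).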

Usage

GMC_Y_given_X(X, Y, kernel = dnorm)

Arguments

X

Numeric vector; the predictor (conditioning) variable

Y

Numeric vector; the response variable being explained

kernel

Kernel function (default = dnorm)

Value

GMC(Y|X) estimate

Examples

# Generate sample data with linear relationship
set.seed(123)
n <- 1000
X <- rnorm(n)
Y <- 2 * X + rnorm(n, sd = 0.5)

# Calculate GMC(Y|X)
gmc_result <- GMC_Y_given_X(X, Y)
print(gmc_result)
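
For a linear relationship with Gaussian noise, as in the example above, the population GMC in either direction coincides with the squared Pearson correlation, which gives a simple reference value to compare an estimate against. The sketch below uses only base R and does not call the package; the variable names are illustrative assumptions.

```r
set.seed(123)
n <- 1000
X <- rnorm(n)
Y <- 2 * X + rnorm(n, sd = 0.5)

# Sample squared Pearson correlation
rho_sq_sample <- cor(X, Y)^2

# Population value: Var(2 * X) / Var(Y) = 4 / (4 + 0.25)
rho_sq_theory <- 2^2 / (2^2 + 0.5^2)

print(c(sample = rho_sq_sample, theory = rho_sq_theory))
```

A kernel-based GMC estimate on the same data would be expected to land near this value, up to smoothing bias.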

Feature selection using GMC ranking

Description

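Computes a GMC score between each column of X and the response Y and, optionally, sorts the variables by that score so that they can be screened for feature selection.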

Usage

GMC_feature_ranking(X, Y, kernel = dnorm, sort = TRUE)

Arguments

X

A matrix or data.frame of predictors

Y

A numeric response vector

kernel

Kernel function (default = dnorm)

sort

Logical; if TRUE (the default), the variables are sorted by GMC score

Value

A data.frame with variable names and GMC scores

Examples

# Generate sample data with multiple predictors
set.seed(123)
n <- 500
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)
Y <- 2 * X1 + X2^2 + rnorm(n, sd = 0.5)
X <- cbind(X1, X2, X3)

# Rank features by GMC
ranking <- GMC_feature_ranking(X, Y)
print(ranking)
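
To make the ranking idea concrete, here is a minimal base R sketch of per-column scoring followed by sorting. It is not the package's implementation: `gmc_sketch` is a hypothetical helper, the `bw.nrd0` bandwidth is an assumption, and the output column names `variable` and `GMC` are chosen for illustration only.

```r
# Hypothetical helper (not from the package): plug-in estimate of
# Var(E[y | x]) / Var(y) using Nadaraya-Watson fits at the sample points
gmc_sketch <- function(x, y, kernel = dnorm) {
  h <- bw.nrd0(x)                           # assumption: rule-of-thumb bandwidth
  W <- kernel(outer(x, x, "-") / h)         # n x n matrix of kernel weights
  m_hat <- as.vector(W %*% y) / rowSums(W)  # fitted conditional means
  var(m_hat) / var(y)
}

set.seed(123)
n <- 500
X <- cbind(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
Y <- 2 * X[, "X1"] + X[, "X2"]^2 + rnorm(n, sd = 0.5)

# Score each column separately, then sort in decreasing order
scores <- apply(X, 2, gmc_sketch, y = Y)
ranking <- data.frame(variable = names(scores), GMC = unname(scores))
ranking <- ranking[order(ranking$GMC, decreasing = TRUE), ]
print(ranking)
```

In this design the population scores are 4 / 6.25 for X1, 2 / 6.25 for X2, and 0 for X3, so X3 should appear last in the ranking.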

Estimate E[(E[Y|X])^2] using kernel regression

Description

Estimates E[(E[Y|X])^2], the expectation of the squared conditional mean of Y given X, using Nadaraya-Watson kernel regression (Gaussian kernel by default) and numerical integration over a grid of X values.
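
The estimation idea can be sketched in a few lines of base R: compute a Nadaraya-Watson estimate of E[Y | X = x] and a kernel density estimate of X on a grid, then integrate their product numerically. This is an illustration under stated assumptions, not the package's code: `nw_EY_X_squared` is a hypothetical name, the bandwidth comes from `bw.nrd0` (the package may select it differently), and a smaller default grid is used to keep the weight matrix small.

```r
# Hypothetical sketch: estimate E[(E[Y|X])^2] as the integral of
# m(x)^2 * f(x) over a grid, where m is a Nadaraya-Watson fit and
# f is a kernel density estimate of X
nw_EY_X_squared <- function(X, Y, grid_length = 200, kernel = dnorm) {
  h <- bw.nrd0(X)                             # assumption: rule-of-thumb bandwidth
  x_grid <- seq(min(X), max(X), length.out = grid_length)
  # Kernel weights: rows index grid points, columns index observations
  W <- kernel(outer(x_grid, X, "-") / h)
  fx_grid <- rowSums(W) / (length(X) * h)     # density estimate of X on the grid
  EY_grid <- as.vector(W %*% Y) / rowSums(W)  # Nadaraya-Watson estimate of E[Y | X = x]
  dx <- x_grid[2] - x_grid[1]
  estimate <- sum(EY_grid^2 * fx_grid) * dx   # Riemann sum for the integral
  list(estimate = estimate, bandwidth = h,
       mean_Y = mean(Y), var_Y = var(Y),
       EY_grid = EY_grid, fx_grid = fx_grid, x_grid = x_grid)
}

set.seed(123)
X <- rnorm(1000)
Y <- 2 * X + rnorm(1000, sd = 0.5)
res <- nw_EY_X_squared(X, Y)
# The population value for this design is 4 * E[X^2] = 4; the smoothed
# estimate will differ from it somewhat
print(res$estimate)
```

The returned list mirrors the component names documented under Value, so the sketch can be read side by side with that section.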

Usage

estimate_EY_X_squared(X, Y, grid_length = 10000, kernel = dnorm)

Arguments

X

A numeric vector of predictors.

Y

A numeric vector of responses.

grid_length

Number of grid points for numerical integration (default = 10000).

kernel

Kernel function (default is dnorm).

Value

A list containing:

estimate

Estimated value of E[(E[Y|X])^2]

bandwidth

Selected kernel bandwidth

mean_Y

Mean of Y

var_Y

Variance of Y

EY_grid

Estimated values of E[Y|X] at the grid points

fx_grid

Estimated marginal density of X at the grid points

x_grid

Grid points used in estimation
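
The components `estimate`, `mean_Y`, and `var_Y` are sufficient to form a GMC value, because Var(E[Y|X]) = E[(E[Y|X])^2] - (E[Y])^2. The sketch below shows that arithmetic; `gmc_from_components` is a hypothetical helper, and the source does not state that the package combines the components in exactly this way. The input list holds illustrative population values for Y = 2 * X + noise with standard normal X and noise standard deviation 0.5, not output from the package.

```r
# Hypothetical helper: GMC(Y | X) = (E[(E[Y|X])^2] - (E[Y])^2) / Var(Y)
gmc_from_components <- function(res) {
  (res$estimate - res$mean_Y^2) / res$var_Y
}

# Illustrative population values (assumed, not produced by the package):
# E[(E[Y|X])^2] = 4, E[Y] = 0, Var(Y) = 4 + 0.25
res <- list(estimate = 4, mean_Y = 0, var_Y = 4.25)
gmc_value <- gmc_from_components(res)
print(gmc_value)
```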

References

Zheng, S., Shi, N.Z., & Zhang, Z. (2012). Generalized Measures of Correlation for Asymmetry, Nonlinearity, and Beyond. Journal of the American Statistical Association, 107(499), 1239-1252. doi:10.1080/01621459.2012.710509