Type: | Package |
Title: | K Quantiles Medoids (KQM) Clustering |
Version: | 1.1.1 |
Date: | 2025-10-13 |
Depends: | methods, MASS, gtools, cluster |
Description: | K Quantiles Medoids (KQM) clustering applies quantiles to divide the data of each dimension into K mean intervals. Combining the quantiles of all dimensions of the data, by fully permuting the quantiles on each dimension, is the strategy used to determine a pool of candidate initial cluster centers. To find the best initial cluster centers from this pool, two methods, based on a quantile strategy and a PAM strategy respectively, are proposed. During the clustering process, the medoids of the clusters are used to update the cluster centers in each iteration. A comparison between KQM and the method of randomly selecting initial cluster centers shows that KQM almost always obtains clustering results with a smaller total sum of squared distances. |
Collate: | AllClasses.R KQM.R C.f.R Dist.R |
License: | GPL-2 |
NeedsCompilation: | no |
Packaged: | 2025-10-12 17:35:40 UTC; Yarong |
Author: | Yarong Yang [aut, cre], Nader Ebrahimi [ctb], Yoram Rubin [ctb], Jacob Zhang [ctb] |
Maintainer: | Yarong Yang <Yi.YA_yaya@hotmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-10-16 18:30:19 UTC |
K Quantiles Medoids (KQM) Clustering.
Description
K Quantiles Medoids (KQM) clustering applies quantiles to divide the data of each dimension into K mean intervals. Combining the quantiles of all dimensions of the data, by fully permuting the quantiles on each dimension, is the strategy used to determine a pool of candidate initial cluster centers. To find the best initial cluster centers from this pool, two methods, based on a quantile strategy and a PAM strategy respectively, are proposed. During the clustering process, the medoids of the clusters are used to update the cluster centers in each iteration. A comparison between KQM and the method of randomly selecting initial cluster centers shows that KQM almost always obtains clustering results with a smaller total sum of squared distances.
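The candidate pool described above can be illustrated with a short, self-contained sketch. It is not the package's internal code: the helper name candidate_pool is hypothetical, and the choice of the interval-midpoint probabilities ((1:K)-0.5)/K is an illustrative assumption. The per-dimension quantiles are combined by full permutation (a Cartesian product), giving K^d candidates for d-dimensional data.
# A minimal sketch (not the package's internal code) of building a pool of
# candidate initial centers from per-dimension quantiles
candidate_pool <- function(data, K) {
  probs <- (seq_len(K) - 0.5) / K               # illustrative choice of probabilities
  Q <- apply(data, 2, quantile, probs = probs)  # K x d matrix of per-dimension quantiles
  # Fully permute the quantiles across dimensions (Cartesian product): K^d candidates
  as.matrix(expand.grid(split(Q, col(Q))))
}
set.seed(1)
X <- matrix(rnorm(60), ncol = 2)
pool <- candidate_pool(X, K = 3)                # 3^2 = 9 candidate centers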
Details
Package: | KQM |
Type: | Package |
Version: | 1.1.1 |
Date: | 2025-10-13 |
License: | GPL-2 |
Author(s)
Yarong Yang, Nader Ebrahimi, Yoram Rubin, and Jacob Zhang
References
Yarong Yang, Nader Ebrahimi, Yoram Rubin, and Jacob Zhang (2025). KQM: K Quantiles Medoids (KQM) Clustering. Technical report, in preparation.
Examples
# Generate 10 bivariate normal samples
require(MASS)
mu1 <- c(0, 0)
sigma1 <- matrix(c(1, 0.5, 0.5, 1), nrow=2)
SP1 <- mvrnorm(n=10, mu=mu1, Sigma=sigma1)
# Generate another 10 bivariate normal samples
mu2<-c(1,1)
sigma2<-matrix(c(1,0,0,1),nrow=2)
SP2<-mvrnorm(n=10,mu=mu2,Sigma=sigma2)
# Generate 10 more new bivariate normal samples
mu3<-c(2,2)
sigma3<-matrix(c(1,0.5,0.5,1),nrow=2)
SP3<-mvrnorm(n=10,mu=mu3,Sigma=sigma3)
# Combine the three groups of bivariate normal samples
data<-rbind(SP1,SP2,SP3)
# Conduct KQM analysis with K=3 by the "YY" method, with the quantile strategy
# selected to determine the initial cluster centers
Res<-KQM(data,3,method="YY",iteration=1000,type=1,style=1)
names(Res@Classes[[1]])<-rep("red",length(Res@Classes[[1]]))
names(Res@Classes[[2]])<-rep("blue",length(Res@Classes[[2]]))
names(Res@Classes[[3]])<-rep("green",length(Res@Classes[[3]]))
Cols<-names(sort(c(Res@Classes[[1]],Res@Classes[[2]],Res@Classes[[3]])))
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
# Compare the clustering results with the original samples
oldpar<-par(mfrow=c(1,2))
plot(data[,1],data[,2],type="p",pch=19,col=rep(c("sky blue","orange","purple"),rep(10,3)),
lwd=2,xlab="",ylab="",main="Original Data")
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
par(oldpar)  # restore the previous graphical parameters
# Conduct KQM analysis with K=3, randomly picking 3 samples as the initial
# cluster centers
Res2<-KQM(data,3,method="initial",initial=data[sample(1:nrow(data),3),],iteration=1000,type=1)
names(Res2@Classes[[1]])<-rep("red",length(Res2@Classes[[1]]))
names(Res2@Classes[[2]])<-rep("blue",length(Res2@Classes[[2]]))
names(Res2@Classes[[3]])<-rep("green",length(Res2@Classes[[3]]))
Cols2<-names(sort(c(Res2@Classes[[1]],Res2@Classes[[2]],Res2@Classes[[3]])))
plot(data[,1],data[,2],type="p",pch=19,col=Cols2,lwd=2,xlab=paste("Total SSE = ",
round(Res2@SSE[length(Res2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'initial' method")
points(Res2@Centers,pch=5,col=c("red","blue","green"))
# Compare the clustering results by the above "YY" method and the "initial" method
oldpar<-par(mfrow=c(1,2))
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
plot(data[,1],data[,2],type="p",pch=19,col=Cols2,lwd=2,xlab=paste("Total SSE = ",
round(Res2@SSE[length(Res2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'initial' method")
points(Res2@Centers,pch=5,col=c("red","blue","green"))
par(oldpar)  # restore the previous graphical parameters
# Conduct KQM analysis with K=3 by the "YY" method, with the PAM strategy
# selected to determine the initial cluster centers
Res.2<-KQM(data,3,method="YY",iteration=1000,type=1,style=2)
names(Res.2@Classes[[1]])<-rep("red",length(Res.2@Classes[[1]]))
names(Res.2@Classes[[2]])<-rep("blue",length(Res.2@Classes[[2]]))
names(Res.2@Classes[[3]])<-rep("green",length(Res.2@Classes[[3]]))
Cols.2<-names(sort(c(Res.2@Classes[[1]],Res.2@Classes[[2]],Res.2@Classes[[3]])))
plot(data[,1],data[,2],type="p",pch=19,col=Cols.2,lwd=2,xlab=paste("Total SSE = ",
round(Res.2@SSE[length(Res.2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 2")
points(Res.2@Centers,pch=5,col=c("red","blue","green"))
# Compare the clustering results by the "YY" method with style 1 and style 2 respectively
oldpar<-par(mfrow=c(1,2))
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, YY method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
plot(data[,1],data[,2],type="p",pch=19,col=Cols.2,lwd=2,xlab=paste("Total SSE = ",
round(Res.2@SSE[length(Res.2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 2")
points(Res.2@Centers,pch=5,col=c("red","blue","green"))
par(oldpar)  # restore the previous graphical parameters
Finding the center of a cluster.
Description
A function that finds the center of a cluster.
Usage
C.f(DM, Cl.ind)
Arguments
DM |
Numeric. The distance matrix, whose entry in the i-th row and j-th column is the distance between the i-th and the j-th observations. |
Cl.ind |
Integer. The index vector of a cluster. |
Value
A vector.
Author(s)
Yarong Yang
Examples
### Generate a symmetric distance matrix for 10 observations
dm1<-matrix(rnorm(100,0,1),10,10)
dm2<-t(dm1)
dm<-dm1+dm2                  # make the matrix symmetric
for(i in 1:10) dm[i,i]<-0    # zero the diagonal
DM<-abs(dm)                  # distances must be nonnegative
### Randomly pick 5 of the 10 observations to form a cluster
Cl.ind<-sample(1:10,5)
Res<-C.f(DM,Cl.ind)
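For intuition, and assuming that the cluster center here is the medoid, i.e. the member of the cluster with the smallest total distance to the other members (consistent with the package description), an equivalent computation could look like the sketch below; it is not necessarily how C.f is implemented.
within <- DM[Cl.ind, Cl.ind, drop = FALSE]        # pairwise distances inside the cluster
medoid.ix <- Cl.ind[which.min(rowSums(within))]   # member with the smallest total distance
medoid.ix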
Finding the distance between two observations.
Description
A function that computes the distance between two observations.
Usage
Dist(x,y,type)
Arguments
x |
Numeric. A vector denoting an observation. |
y |
Numeric. A vector denoting an observation. |
type |
Integer. The type of distance between observations. 1 for Euclidean distance. 2 for Manhattan distance. 3 for maximum deviation among dimensions. |
Value
A numeric value: the distance between x and y.
Examples
# Ten 3-dimensional observations, one per row
x<-rnorm(10,0,1)
y<-rnorm(10,1,1)
z<-rnorm(10,2,1)
data<-cbind(x,y,z)
# Euclidean distance between the first two observations
Res<-Dist(data[1,],data[2,],type=1)
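The three values of the type argument correspond to standard distances. The sketch below shows what each presumably computes (illustrative definitions, not the package's internal code); with type=1 the result agrees with the Euclidean distance Res computed above.
d.euclid    <- function(x, y) sqrt(sum((x - y)^2))   # type = 1, Euclidean
d.manhattan <- function(x, y) sum(abs(x - y))        # type = 2, Manhattan
d.maxdev    <- function(x, y) max(abs(x - y))        # type = 3, maximum deviation
d.euclid(data[1,], data[2,])                         # matches Res above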
K Quantiles Medoids (KQM) Clustering.
Description
K Quantiles Medoids (KQM) clustering applies quantiles to divide the data of each dimension into K mean intervals. Combining the quantiles of all dimensions of the data, by fully permuting the quantiles on each dimension, is the strategy used to determine a pool of candidate initial cluster centers. To find the best initial cluster centers from this pool, two methods, based on a quantile strategy and a PAM strategy respectively, are proposed. During the clustering process, the medoids of the clusters are used to update the cluster centers in each iteration. A comparison between KQM and the method of randomly selecting initial cluster centers shows that KQM almost always obtains clustering results with a smaller total sum of squared distances.
Usage
KQM(data, K, method, initial, iteration, type, style)
Arguments
data |
Numeric. An observation matrix with each row being an observation. |
K |
Integer. The number of clusters expected. |
method |
Character. "YY" or "initial". No initial cluster centers are required for "YY" method. "initial" method can work for any initial cluster centers. |
initial |
Numeric. Either a matrix of selected initial centers, with each row being an observation, or 1 to use the first K rows of the data matrix as the initial centers. |
iteration |
Integer. The maximum number of iterations allowed for the clustering process. |
type |
Integer. The type of distance between observations. 1 for Euclidean distance. 2 for Manhattan distance. 3 for maximum deviation among dimensions. |
style |
Integer. 1 or 2: 1 for the quantile strategy and 2 for the PAM strategy when selecting the initial cluster centers. |
Value
An object of class KQMobj.
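A minimal sketch of the two calling patterns, based on the argument descriptions above; D, fit1 and fit2 are placeholder names, and the full worked examples follow below.
set.seed(1)
D <- matrix(rnorm(60), ncol = 2)    # placeholder observation matrix
# "YY" method: no initial centers needed; style selects the initialization strategy
fit1 <- KQM(D, 3, method = "YY", iteration = 100, type = 1, style = 2)
# "initial" method: supply a center matrix, or initial = 1 to use the first K rows of D
fit2 <- KQM(D, 3, method = "initial", initial = 1, iteration = 100, type = 2)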
Author(s)
Yarong Yang
References
Yarong Yang, Nader Ebrahimi, Yoram Rubin, and Jacob Zhang (2025). KQM: K Quantiles Medoids (KQM) Clustering. Technical report, in preparation.
Examples
# Generate 10 bivariate normal samples
require(MASS)
mu1 <- c(0, 0)
sigma1 <- matrix(c(1, 0.5, 0.5, 1), nrow=2)
SP1 <- mvrnorm(n=10, mu=mu1, Sigma=sigma1)
# Generate another 10 bivariate normal samples
mu2<-c(1,1)
sigma2<-matrix(c(1,0,0,1),nrow=2)
SP2<-mvrnorm(n=10,mu=mu2,Sigma=sigma2)
# Generate 10 more new bivariate normal samples
mu3<-c(2,2)
sigma3<-matrix(c(1,0.5,0.5,1),nrow=2)
SP3<-mvrnorm(n=10,mu=mu3,Sigma=sigma3)
# Combine the three groups of bivariate normal samples
data<-rbind(SP1,SP2,SP3)
# Conduct KQM analysis with K=3 by the "YY" method, with the quantile strategy
# selected to determine the initial cluster centers
Res<-KQM(data,3,method="YY",iteration=1000,type=1,style=1)
names(Res@Classes[[1]])<-rep("red",length(Res@Classes[[1]]))
names(Res@Classes[[2]])<-rep("blue",length(Res@Classes[[2]]))
names(Res@Classes[[3]])<-rep("green",length(Res@Classes[[3]]))
Cols<-names(sort(c(Res@Classes[[1]],Res@Classes[[2]],Res@Classes[[3]])))
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
# Compare the clustering results with the original samples
oldpar<-par(mfrow=c(1,2))
plot(data[,1],data[,2],type="p",pch=19,col=rep(c("sky blue","orange","purple"),rep(10,3)),
lwd=2,xlab="",ylab="",main="Original Data")
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
par(oldpar)  # restore the previous graphical parameters
# Conduct KQM analysis with K=3, randomly picking 3 samples as the initial
# cluster centers
Res2<-KQM(data,3,method="initial",initial=data[sample(1:nrow(data),3),],iteration=1000,type=1)
names(Res2@Classes[[1]])<-rep("red",length(Res2@Classes[[1]]))
names(Res2@Classes[[2]])<-rep("blue",length(Res2@Classes[[2]]))
names(Res2@Classes[[3]])<-rep("green",length(Res2@Classes[[3]]))
Cols2<-names(sort(c(Res2@Classes[[1]],Res2@Classes[[2]],Res2@Classes[[3]])))
plot(data[,1],data[,2],type="p",pch=19,col=Cols2,lwd=2,xlab=paste("Total SSE = ",
round(Res2@SSE[length(Res2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'initial' method")
points(Res2@Centers,pch=5,col=c("red","blue","green"))
# Compare the clustering results by the above "YY" method and the "initial" method
oldpar<-par(mfrow=c(1,2))
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
plot(data[,1],data[,2],type="p",pch=19,col=Cols2,lwd=2,xlab=paste("Total SSE = ",
round(Res2@SSE[length(Res2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'initial' method")
points(Res2@Centers,pch=5,col=c("red","blue","green"))
par(oldpar)  # restore the previous graphical parameters
# Conduct KQM analysis with K=3 by the "YY" method, with the PAM strategy
# selected to determine the initial cluster centers
Res.2<-KQM(data,3,method="YY",iteration=1000,type=1,style=2)
names(Res.2@Classes[[1]])<-rep("red",length(Res.2@Classes[[1]]))
names(Res.2@Classes[[2]])<-rep("blue",length(Res.2@Classes[[2]]))
names(Res.2@Classes[[3]])<-rep("green",length(Res.2@Classes[[3]]))
Cols.2<-names(sort(c(Res.2@Classes[[1]],Res.2@Classes[[2]],Res.2@Classes[[3]])))
plot(data[,1],data[,2],type="p",pch=19,col=Cols.2,lwd=2,xlab=paste("Total SSE = ",
round(Res.2@SSE[length(Res.2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 2")
points(Res.2@Centers,pch=5,col=c("red","blue","green"))
# Compare the clustering results by the "YY" method with style 1 and style 2 respectively
oldpar<-par(mfrow=c(1,2))
plot(data[,1],data[,2],type="p",pch=19,col=Cols,lwd=2,xlab=paste("Total SSE = ",
round(Res@SSE[length(Res@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, YY method, style 1")
points(Res@Centers,pch=5,col=c("red","blue","green"))
plot(data[,1],data[,2],type="p",pch=19,col=Cols.2,lwd=2,xlab=paste("Total SSE = ",
round(Res.2@SSE[length(Res.2@SSE)],2),sep=""),ylab="",
main="KQM Clustering Results, 'YY' method, style 2")
points(Res.2@Centers,pch=5,col=c("red","blue","green"))
par(oldpar)  # restore the previous graphical parameters
Class to contain the results from function KQM.
Description
The function KQM returns an object of class KQMobj that contains the number of clusters, the cluster centers, the original positions of the observations in each cluster, the observations in each cluster, and the sums of squared distances within each cluster and over all the clusters.
Objects from the Class
new("KQMobj",K=new("numeric"),DM=new("matrix"),Centers.ix=new("vector"),Centers=new("matrix"),Classes=new("list"),Clusters=new("list"),SSE=new("numeric"))
Slots
K
: An integer giving the number of clusters.
DM
: A numeric matrix whose entry in the i-th row and j-th column is the distance between the i-th and the j-th observations.
Centers.ix
: An integer vector giving the original positions of the cluster centers.
Centers
: A numeric matrix with each row being the center of a cluster.
Classes
: An integer list giving the original positions of the observations in each cluster.
Clusters
: A numeric list giving the observations in each cluster.
SSE
: A numeric vector composed of the SSE of each cluster and the total SSE over all the clusters.
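For reference, the slots of a fitted object are accessed with the @ operator, as in the sketch below; Res denotes a value returned by KQM, as in the examples of the KQM help page.
Res@K                      # number of clusters
Res@Centers                # one cluster center per row
Res@Classes[[1]]           # original positions of the observations in cluster 1
Res@SSE[length(Res@SSE)]   # last element: the total SSE over all the clusters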
Author(s)
Yarong Yang
References
Yarong Yang, Nader Ebrahimi, Yoram Rubin, and Jacob Zhang (2025). K Quantiles Medoids (KQM) Clustering. Technical report, in preparation.
Examples
showClass("KQMobj")