| Type: | Package | 
| Title: | Single-Cell Meta-Path Based Omic Embedding | 
| Version: | 0.1.3 | 
| Author: | Yuntong Hou  | 
| Description: | Provide a workflow to jointly embed chromatin accessibility peaks and expressed genes into a shared low-dimensional space using paired single-cell ATAC-seq (scATAC-seq) and single-cell RNA-seq (scRNA-seq) data. It integrates regulatory relationships among peak-peak interactions (via 'Cicero'), peak-gene interactions (via Lasso, random forest, and XGBoost), and gene-gene interactions (via principal component regression). With the input of paired scATAC-seq and scRNA-seq data matrices, it assigns a low-dimensional feature vector to each gene and peak. Additionally, it supports the reconstruction of gene-gene network with low-dimensional projections (via epsilon-NN) and then the comparison of the networks of two conditions through manifold alignment implemented in 'scTenifoldNet'. See <doi:10.1093/bioinformatics/btaf483> for more details. | 
| URL: | https://github.com/Houyt23/scPOEM | 
| BugReports: | https://github.com/Houyt23/scPOEM/issues | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Encoding: | UTF-8 | 
| Imports: | methods, utils, stats, foreach (≥ 1.5.2), doParallel (≥ 1.0.17), tictoc (≥ 1.2.1), Matrix (≥ 1.6-3), glmnet (≥ 4.1-8), xgboost (≥ 1.7.10), reticulate, stringr, magrittr, scTenifoldNet, VGAM (≥ 1.1-13), Biobase (≥ 2.66.0), BiocGenerics (≥ 0.52.0), monocle (≥ 2.34.0), cicero (≥ 1.24.0) | 
| Depends: | R (≥ 4.1.0) | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-09-25 06:08:15 UTC; qingzhi | 
| Maintainer: | Yuntong Hou <houyt223@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-09-25 07:10:02 UTC | 
Construct Gene-Gene Network
Description
Construct the gene-gene network via principle component regression.
Usage
GGN(
  Y,
  dirpath = tempdir(),
  count_device = 1,
  nComp = 5,
  rebuild_GGN = TRUE,
  save_file = TRUE,
  python_env = "scPOEM_env"
)
Arguments
Y | 
 The scRNA-seq data, sparse matrix.  | 
dirpath | 
 The folder path to read or write file.  | 
count_device | 
 The number of cpus used to train the Lasso model.  | 
nComp | 
 The number of PCs used for regression  | 
rebuild_GGN | 
 Logical. Whether to rebuild the gene-gene network (GGN) from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
python_env | 
 Name or path of the Python environment to be used.  | 
Value
The GGN network.
Examples
library(scPOEM)
dirpath <- "./example_data"
# Download single mode example data
data(example_data_single)
# Construct GGN net.
gg_net <- GGN(example_data_single$Y,
              file.path(dirpath, "single"),
              save_file=FALSE)
Peak-Gene Network via Lasso
Description
Construct the peak-gene network via Lasso.
Usage
PGN_Lasso(
  X,
  Y,
  gene_data,
  neibor_peak,
  dirpath = tempdir(),
  count_device = 1,
  rebuild_PGN_Lasso = TRUE,
  save_file = TRUE
)
Arguments
X | 
 The scATAC-seq data, sparse matrix.  | 
Y | 
 The scRNA-seq data, sparse matrix.  | 
gene_data | 
 The information for genes, must have a col names "gene_name".  | 
neibor_peak | 
 The peak IDs within a certain range of each gene, must have cols c("gene_name", "start_use", "end_use"). The id numbers in "start_use" and "end_use" are start from 0.  | 
dirpath | 
 The folder path to read or write file.  | 
count_device | 
 The number of cpus used to train the Lasso model.  | 
rebuild_PGN_Lasso | 
 Logical. Whether to rebuild the peak-gene network via Lasso from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
Value
The PGN_Lasso network.
Examples
library(scPOEM)
dirpath <- "./example_data"
# Download single mode example data
data(example_data_single)
# Construct PGN net via Lasso.
net_Lasso <- PGN_Lasso(example_data_single$X,
                       example_data_single$Y,
                       example_data_single$gene_data,
                       example_data_single$neibor_peak,
                       file.path(dirpath, "single"),
                       save_file=FALSE)
Peak-Gene Network via Random Forest
Description
Construct the peak-gene network via random forest.
Usage
PGN_RF(
  X,
  Y,
  gene_data,
  neibor_peak,
  dirpath = tempdir(),
  count_device = 1,
  rebuild_PGN_RF = TRUE,
  save_file = TRUE,
  seed = NULL,
  python_env = "scPOEM_env"
)
Arguments
X | 
 The scATAC-seq data, sparse matrix.  | 
Y | 
 The scRNA-seq data, sparse matrix.  | 
gene_data | 
 The information for genes, must have a col names "gene_name".  | 
neibor_peak | 
 The peak IDs within a certain range of each gene, must have cols c("gene_name", "start_use", "end_use"). The id numbers in "start_use" and "end_use" are start from 0.  | 
dirpath | 
 The folder path to read or write file.  | 
count_device | 
 The number of cpus used to train the Lasso model.  | 
rebuild_PGN_RF | 
 Logical. Whether to rebuild the peak-gene network via random forest from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
seed | 
 An integer specifying the random seed to ensure reproducible results.  | 
python_env | 
 Name or path of the Python environment to be used.  | 
Value
The PGN_RF network.
Examples
library(scPOEM)
dirpath <- "./example_data"
# Download single mode example data
data(example_data_single)
# Construct PGN net via random forest (RF).
net_RF <- PGN_RF(example_data_single$X,
                 example_data_single$Y,
                 example_data_single$gene_data,
                 example_data_single$neibor_peak,
                 file.path(dirpath, "single"),
                 save_file=FALSE)
Peak-Gene Network via XGBoost
Description
Construct the peak-gene network via XGBoost.
Usage
PGN_XGBoost(
  X,
  Y,
  gene_data,
  neibor_peak,
  dirpath = tempdir(),
  count_device = 1,
  rebuild_PGN_XGB = TRUE,
  save_file = TRUE
)
Arguments
X | 
 The scATAC-seq data, sparse matrix.  | 
Y | 
 The scRNA-seq data, sparse matrix.  | 
gene_data | 
 The information for genes, must have a col names "gene_name".  | 
neibor_peak | 
 The peak IDs within a certain range of each gene, must have cols c("gene_name", "start_use", "end_use"). The id numbers in "start_use" and "end_use" are start from 0.  | 
dirpath | 
 The folder path to read or write file.  | 
count_device | 
 The number of cpus used to train the Lasso model.  | 
rebuild_PGN_XGB | 
 Logical. Whether to rebuild the peak-gene network via XGBoost from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
Value
The PGN_XGBoost network.
Examples
library(scPOEM)
dirpath <- "./example_data"
# Download single mode example data
data(example_data_single)
# Construct PGN net via XGBoost.
net_XGB <- PGN_XGBoost(example_data_single$X,
                       example_data_single$Y,
                       example_data_single$gene_data,
                       example_data_single$neibor_peak,
                       file.path(dirpath, "single"),
                       save_file=FALSE)
Construct Peak-Peak Network
Description
Construct peak-peak network.
Usage
PPN(
  X,
  peak_data,
  cell_data,
  genome,
  dirpath = tempdir(),
  rebuild_PPN = TRUE,
  save_file = TRUE,
  seed = NULL
)
Arguments
X | 
 The scATAC-seq data, sparse matrix.  | 
peak_data | 
 The information for peaks, must have a col names "peak_name".  | 
cell_data | 
 The information for cells, must have a col names "cell_name".  | 
genome | 
 The genome length for the species.  | 
dirpath | 
 The folder path to read or write file.  | 
rebuild_PPN | 
 Logical. Whether to rebuild the peak-peak network (PPN) from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
seed | 
 An integer specifying the random seed to ensure reproducible results.  | 
Value
The PPN network.
Examples
library(scPOEM)
library(monocle)
dirpath <- "./example_data"
# Download single mode example data
data(example_data_single)
# Construct PPN net.
pp_net <- PPN(example_data_single$X,
              example_data_single$peak_data,
              example_data_single$cell_data,
              example_data_single$genome,
              file.path(dirpath, "single"),
              save_file=FALSE)
Gene Network Reconstruction and Alignment
Description
Reconstruct gene networks via epsilon-NN and compare conditions using manifold alignment implemented in scTenifoldNet.
Usage
align_embedding(
  gene_data1,
  gene_node1,
  E1,
  gene_data2,
  gene_node2,
  E2,
  dirpath = tempdir(),
  save_file = TRUE,
  d = 100
)
Arguments
gene_data1 | 
 The information for genes in state1, must have a col names "gene_name".  | 
gene_node1 | 
 Gene ids that are associated with other peaks or genes in state1.  | 
E1 | 
 Embedding representations of peaks and genes in state1.  | 
gene_data2 | 
 The information for genes in state2, must have a col names "gene_name".  | 
gene_node2 | 
 Gene ids that are associated with other peaks or genes in state2.  | 
E2 | 
 Embedding representations of peaks and genes in state2.  | 
dirpath | 
 The folder path to read or write file  | 
save_file | 
 Logical, whether to save the output to a file.  | 
d | 
 The dimension of latent space.  | 
Value
A list containing the following elements:
E_g2Low-dimensional embedding representations of genes under the two conditions.
common_genesGenes shared between both conditions and used in the analysis.
diffRegulationA list of differential regulatory information for each gene.
Examples
library(scPOEM)
library(monocle)
dirpath <- "./example_data"
# Download compare mode example data
data(example_data_compare)
data_S1 <- example_data_compare$S1
data_S2 <- example_data_compare$S2
gg_net1 <- GGN(data_S1$Y, file.path(dirpath, "compare/S1"), save_file=FALSE)
pp_net1 <- PPN(data_S1$X, data_S1$peak_data, data_S1$cell_data,
               data_S1$genome, file.path(dirpath, "compare/S1"), save_file=FALSE)
net_Lasso1 <- PGN_Lasso(data_S1$X, data_S1$Y,
                        data_S1$gene_data, data_S1$neibor_peak,
                        file.path(dirpath, "compare/S1"), save_file=FALSE)
net_RF1 <- PGN_RF(data_S1$X, data_S1$Y, data_S1$gene_data,
                  data_S1$neibor_peak, file.path(dirpath, "compare/S1"), save_file=FALSE)
net_XGB1 <- PGN_XGBoost(data_S1$X, data_S1$Y,
                        data_S1$gene_data, data_S1$neibor_peak,
                        file.path(dirpath, "compare/S1"), save_file=FALSE)
pg_net_list1 <- list(net_Lasso1, net_RF1, net_XGB1)
E_result_S1 <- pg_embedding(gg_net1, pp_net1, pg_net_list1,
               file.path(dirpath, "compare/S1"), save_file=FALSE)
gg_net2 <- GGN(data_S2$Y, file.path(dirpath, "compare/S2"), save_file=FALSE)
pp_net2 <- PPN(data_S2$X, data_S2$peak_data,
               data_S2$cell_data, data_S2$genome,
               file.path(dirpath, "compare/S2"), save_file=FALSE)
net_Lasso2 <- PGN_Lasso(data_S2$X, data_S2$Y,
                        data_S2$gene_data, data_S2$neibor_peak,
                        file.path(dirpath, "compare/S2"), save_file=FALSE)
net_RF2 <- PGN_RF(data_S2$X, data_S2$Y, data_S2$gene_data,
                  data_S2$neibor_peak, file.path(dirpath, "compare/S2"), save_file=FALSE)
net_XGB2 <- PGN_XGBoost(data_S2$X, data_S2$Y,
                        data_S2$gene_data, data_S2$neibor_peak,
                        file.path(dirpath, "compare/S2"), save_file=FALSE)
pg_net_list2 <- list(net_Lasso2, net_RF2, net_XGB2)
E_result_S2 <- pg_embedding(gg_net2, pp_net2, pg_net_list2,
               file.path(dirpath, "compare/S2"), save_file=FALSE)
compare_result <- align_embedding(data_S1$gene_data,
                                  E_result_S1$gene_node,
                                  E_result_S1$E,
                                  data_S2$gene_data,
                                  E_result_S2$gene_node,
                                  E_result_S2$E,
                                  file.path(dirpath, "compare/compare"),
                                  save_file=FALSE)
Network Reconstruction via epsilon-NN
Description
Reconstruction of gene-gene network via low-dimentional projections (via epsilon-NN).
Usage
eNN(E_g)
Arguments
E_g | 
 Embedding representations of genes.  | 
Value
The epsilon-NN network.
Example Input Data for Compare Mode Analysis
Description
A list containing example single-cell multi-omics data used in "compare" mode of the scPOEM package.
Usage
data(example_data_compare)
Format
A named list of length 2. Each element is itself a named list with the following components:
XThe scATAC-seq data, sparse matrix.
YThe scRNA-seq data, sparse matrix.
peak_dataA data.frame containing peak information.
gene_dataA data.frame containing gene information (must contain column "gene_name").
cell_dataA data.frame containing cell metadata.
neibor_peakThe peak IDs within a certain range of each gene, must have cols c("gene_name", "start_use", "end_use"). The id numbers in "start_use" and "end_use" are start from 0.
genomeThe genome length for the species.
Examples
data(example_data_compare)
Example Input Data for Single Mode Analysis
Description
A list containing example single-cell multi-omics data used in "single" mode of the scPOEM package.
Usage
data(example_data_single)
Format
A named list with 7 elements:
XThe scATAC-seq data, sparse matrix.
YThe scRNA-seq data, sparse matrix.
peak_dataA data.frame containing peak information.
gene_dataA data.frame containing gene information (must contain column "gene_name").
cell_dataA data.frame containing cell metadata.
neibor_peakThe peak IDs within a certain range of each gene, must have cols c("gene_name", "start_use", "end_use"). The id numbers in "start_use" and "end_use" are start from 0.
genomeThe genome length for the species.
Examples
data(example_data_single)
Co-embeddings of Peaks and Genes.
Description
Learn the low-dimensional representations for peaks and genes with a meta-path based method.
Usage
pg_embedding(
  gg_net,
  pp_net,
  pg_net_list,
  dirpath = tempdir(),
  relearn_pg_embedding = TRUE,
  save_file = TRUE,
  d = 100,
  numwalks = 5,
  walklength = 3,
  epochs = 100,
  neg_sample = 5,
  batch_size = 32,
  weighted = TRUE,
  exclude_pos = FALSE,
  seed = NULL,
  python_env = "scPOEM_env"
)
Arguments
gg_net | 
 The gene-gene network.  | 
pp_net | 
 The peak-peak network.  | 
pg_net_list | 
 A list of peak-gene networks, constructed via different methods.  | 
dirpath | 
 The folder path to read or write file.  | 
relearn_pg_embedding | 
 Logical. Whether to relearn the low-dimensional representations for peaks and genes from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
d | 
 Dimension of the latent space. Default is 100.  | 
numwalks | 
 Number of random walks per node. Default is 5.  | 
walklength | 
 Length of walk depth. Default is 3.  | 
epochs | 
 Number of training epochs. Default is 100.  | 
neg_sample | 
 Number of negative samples per positive sample. Default is 5.  | 
batch_size | 
 Batch size for training. Default is 32.  | 
weighted | 
 Whether the sampling network is weighted. Default is TRUE.  | 
exclude_pos | 
 Whether to exclude positive samples from negative sampling. Default is FALSE.  | 
seed | 
 An integer specifying the random seed to ensure reproducible results.  | 
python_env | 
 Name or path of the Python environment to be used.  | 
Value
A list containing the following:
ELow-dimensional representations of peaks and genes
peak_nodePeak ids that are associated with other peaks or genes.
gene_nodeGene ids that are associated with other peaks or genes.
Examples
library(scPOEM)
library(monocle)
dirpath <- "./example_data"
# Download single mode example data
data(example_data_single)
gg_net <- GGN(example_data_single$Y,
              file.path(dirpath, "single"),
              save_file=FALSE)
pp_net <- PPN(example_data_single$X, example_data_single$peak_data,
              example_data_single$cell_data, example_data_single$genome,
              file.path(dirpath, "single"), save_file=FALSE)
net_Lasso <- PGN_Lasso(example_data_single$X, example_data_single$Y,
                       example_data_single$gene_data, example_data_single$neibor_peak,
                       file.path(dirpath, "single"), save_file=FALSE)
net_RF <- PGN_RF(example_data_single$X, example_data_single$Y,
                 example_data_single$gene_data, example_data_single$neibor_peak,
                 file.path(dirpath, "single"), save_file=FALSE)
net_XGB <- PGN_XGBoost(example_data_single$X, example_data_single$Y,
                       example_data_single$gene_data, example_data_single$neibor_peak,
                       file.path(dirpath, "single"), save_file=FALSE)
E_result <- pg_embedding(gg_net, pp_net, list(net_Lasso, net_RF, net_XGB),
                         file.path(dirpath, "single"), save_file=FALSE)
Main Function.
Description
This function takes paired single-cell ATAC-seq (scATAC-seq) and RNA-seq (scRNA-seq) data to embed peaks and genes into a shared low-dimensional space. It integrates regulatory relationships from peak-peak interactions (via Cicero), peak-gene interactions (via Lasso, random forest, and XGBoost), and gene-gene interactions (via principal component regression). Additionally, it supports gene-gene network reconstruction using epsilon-NN projections and compares networks across conditions through manifold alignment (scTenifoldNet).
Usage
scPOEM(
  mode = c("single", "compare"),
  input_data,
  dirpath = tempdir(),
  count_device = 1,
  nComp = 5,
  seed = NULL,
  numwalks = 5,
  walklength = 3,
  epochs = 100,
  neg_sample = 5,
  batch_size = 32,
  weighted = TRUE,
  exclude_pos = FALSE,
  d = 100,
  rebuild_GGN = TRUE,
  rebuild_PPN = TRUE,
  rebuild_PGN_Lasso = TRUE,
  rebuild_PGN_RF = TRUE,
  rebuild_PGN_XGB = TRUE,
  relearn_pg_embedding = TRUE,
  save_file = TRUE,
  pg_method = c("Lasso", "RF", "XGBoost"),
  python_env = "scPOEM_env"
)
Arguments
mode | 
 The mode indicating whether to analyze data from a single condition or to compare two conditions.  | 
input_data | 
 A list of input data. If  
 If   | 
dirpath | 
 The folder path to read or write file.  | 
count_device | 
 The number of cpus used to train models.  | 
nComp | 
 The number of PCs used for regression in constructing GGN.  | 
seed | 
 An integer specifying the random seed to ensure reproducible results.  | 
numwalks | 
 Number of random walks per node. Default is 5.  | 
walklength | 
 Length of walk depth. Default is 3.  | 
epochs | 
 Number of training epochs. Default is 100.  | 
neg_sample | 
 Number of negative samples per positive sample. Default is 5.  | 
batch_size | 
 Batch size for training. Default is 32.  | 
weighted | 
 Whether the sampling network is weighted. Default is TRUE.  | 
exclude_pos | 
 Whether to exclude positive samples from negative sampling. Default is FALSE.  | 
d | 
 The dimension of latent space. Default is 100.  | 
rebuild_GGN | 
 Logical. Whether to rebuild the gene-gene network from scratch. If FALSE, the function will attempt to read from   | 
rebuild_PPN | 
 Logical. Whether to rebuild the peak-peak network from scratch. If FALSE, the function will attempt to read from   | 
rebuild_PGN_Lasso | 
 Logical. Whether to rebuild the peak-gene network via Lasso from scratch. If FALSE, the function will attempt to read from   | 
rebuild_PGN_RF | 
 Logical. Whether to rebuild the peak-gene network via random forest from scratch. If FALSE, the function will attempt to read from   | 
rebuild_PGN_XGB | 
 Logical. Whether to rebuild the peak-gene network via XGBoost from scratch. If FALSE, the function will attempt to read from   | 
relearn_pg_embedding | 
 Logical. Whether to relearn the low-dimensional representations for peaks and genes from scratch. If FALSE, the function will attempt to read from   | 
save_file | 
 Logical, whether to save the output to a file.  | 
pg_method | 
 The vector of methods used to construct peak-gene net. Default is c("Lasso", "RF", "XGBoost").  | 
python_env | 
 Name or path of the Python environment to be used.  | 
Value
The scPOEM result.
- Single Mode
 Returns a list containing the following elements:
ELow-dimensional representations of peaks and genes.
peak_nodePeak IDs that are associated with other peaks or genes.
gene_nodeGene IDs that are associated with other peaks or genes.
- Compare Mode
 Returns a list containing the following elements:
state1 nameThe single-mode result for the first condition.
state2 nameThe single-mode result for the second condition.
compareA summary list containing:
E_g2Low-dimensional embedding representations of genes under the two conditions.
common_genesGenes shared between both conditions and used in the analysis.
diffRegulationA list of differential regulatory information for each gene.
Examples
library(scPOEM)
library(monocle)
dirpath <- "./example_data"
# An example for analysing a single dataset.
# Download and read data.
data(example_data_single)
single_result <- scPOEM(mode = "single",
                        input_data=example_data_single,
                        dirpath=file.path(dirpath, "single"),
                        save_file=FALSE)
# An example for analysing and comparing datasets from two conditions.
# Download compare mode example data
data(example_data_compare)
compare_result <- scPOEM(mode = "compare",
                         input_data=example_data_compare,
                         dirpath=file.path(dirpath, "compare"),
                         save_file=FALSE)