Open-Access Computational Biology Datasets
Efficiently access a curated library of open-access computational biology datasets. Tables support predicate pushdown (row filters) and projection (column selections) to the cloud storage backend, so a query reads only the rows and columns it needs. This enables quick, iterative access to otherwise massive, unwieldy tables.
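Other lazy dplyr backends use the same two ideas. The example below is a generic illustration only. It uses dbplyr with an in-memory SQLite database, not bedrockbio's cloud storage backend, which is not shown here and may work differently. It builds a lazy query on a hypothetical toy table, prints the SQL that carries the row filter and column list to the backend, and then collects only the matching subset. It assumes the DBI, RSQLite, and dbplyr packages are installed.

```r
library(dplyr)
library(dbplyr)  # database backend for dplyr verbs

# Hypothetical toy table standing in for a large remote table
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "pqtls", data.frame(
  chromosome = c("1", "1", "2"),
  position = c(100L, 200L, 300L),
  beta = c(0.10, -0.20, 0.05),
  neg_log_10_p_value = c(8.1, 2.0, 9.5)
))

# Nothing is read yet: tbl() returns a lazy reference to the table
lazy <- tbl(con, "pqtls") |>
  filter(neg_log_10_p_value > 7.3) |>  # predicate, sent to the backend as a WHERE clause
  select(chromosome, position, beta)     # projection, sent to the backend as the column list

show_query(lazy)  # print the SQL the backend will run

# The query runs only here, and only matching rows and columns come back
result <- collect(lazy)
DBI::dbDisconnect(con)
```

Until `collect()` is called, the filter and selection only change the query. They do not pull data into R.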
bedrockbio consists of three user-facing functions:

- `list_tables()`: returns a character vector of available table identifiers
- `describe_table("<name>")`: returns metadata, citation, and column definitions for a table
- `load_table("<name>", ...)`: takes a table name and required partition filters, and returns a lazily evaluated data frame

dplyr verbs (`filter()`, `select()`) can be used on the data frame returned by `load_table()` to push down additional row filters and column selections to the storage backend.
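The following sketch combines both verbs. It reuses the `ukb_ppp.pqtls` partition filters from the usage example later in this README and adds a row filter on `neg_log_10_p_value`. The cutoff of 7.3 is an illustrative assumption, not a package default. It roughly matches the conventional genome-wide significance level, p < 5e-8.

```r
library(bedrockbio)
library(dplyr)

hits <- load_table(
  "ukb_ppp.pqtls",
  ancestry = "EUR",
  protein_id = "A0FGR8",
  panel = "Inflammation"
) |>
  # Additional row filter, pushed down to the storage backend
  # (7.3 is an assumed cutoff, about -log10(5e-8))
  filter(neg_log_10_p_value > 7.3) |>
  # Column selection, also pushed down
  select(chromosome, position, beta, neg_log_10_p_value) |>
  # Execute the query and bring the filtered subset into memory
  collect()
```

Both verbs run before `collect()`, so only rows that pass the filter need to be transferred, and only the four selected columns.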
Install from CRAN:
```r
install.packages("bedrockbio")
```

Or install the current development version from GitHub:
```r
# install.packages("pak")
pak::pak("bedrock-bio/bedrock-bio-client/r")
```

Load the package (and dplyr for downstream data frame manipulation):
```r
library(bedrockbio)
library(dplyr)
```

List available tables:
```r
list_tables()
```

Describe a table to see its metadata, citation, and columns:
```r
describe_table("ukb_ppp.pqtls")
```

Lazily load a table with required partition filters, select columns, and collect the relevant subset into an in-memory data frame:
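```r
df <- load_table(
  "ukb_ppp.pqtls",
  # Required partition filters
  ancestry = "EUR",
  protein_id = "A0FGR8",
  panel = "Inflammation"
) |>
  # Columns to fetch (pushed down to the storage backend)
  select(
    chromosome,
    position,
    effect_allele,
    other_allele,
    beta,
    neg_log_10_p_value
  ) |>
  # Run the query and pull the subset into memory
  collect()
```

To request the addition of a new table to the library, open an issue.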