bedrockbio

Open-Access Computational Biology Datasets

Description

Efficiently access a curated library of open-access computational biology datasets. Tables support predicate and projection pushdown to the cloud storage backend, so only the rows and columns you request are read. This enables quick, iterative access to tables that would otherwise be massive and unwieldy.

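bedrockbio consists of three user-facing functions:

- list_tables(): list the tables available in the library.
- describe_table(): show a table's metadata, citation, and columns.
- load_table(): lazily load a table, applying the required partition filters.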

dplyr verbs (filter, select) can be applied to the lazy table returned by load_table() to push additional row filters and column selections down to the storage backend. Call collect() to bring the result into memory as a data frame.

Installation

Install from CRAN:

install.packages("bedrockbio")

Or install the current development version from GitHub:

# install.packages("pak")
pak::pak("bedrock-bio/bedrock-bio-client/r")

Examples

Load the package, along with dplyr, which provides the select() and collect() verbs used below:

library(bedrockbio)
library(dplyr)

List available tables:

list_tables()

Describe a table to see its metadata, citation, and columns:

describe_table("ukb_ppp.pqtls")

Lazily load a table with required partition filters, select columns, and collect the relevant subset into an in-memory data frame:

df <- load_table(
  "ukb_ppp.pqtls",
  ancestry = "EUR",
  protein_id = "A0FGR8",
  panel = "Inflammation"
) |>
  select(
    chromosome,
    position,
    effect_allele,
    other_allele,
    beta,
    neg_log_10_p_value
  ) |>
  collect()

Dataset Requests

To request the addition of a new table to the library, open an issue.