Package 'rsynthbio'


Type: Package
Title: Synthesize Bio API Wrapper
Version: 4.2.0
Description: Access Synthesize Bio models through their API <https://app.synthesize.bio/>. This wrapper provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. It enables researchers to easily access AI-generated transcriptomic data for various modalities, including bulk RNA-seq, single-cell RNA-seq, and microarray data.
URL: https://github.com/synthesizebio/rsynthbio
BugReports: https://github.com/synthesizebio/rsynthbio/issues
Imports: getPass, keyring, jsonlite, httr
Suggests: rmarkdown, knitr, testthat (≥ 3.0.2), mockery, arrow
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.3
VignetteBuilder: knitr
License: MIT + file LICENSE
NeedsCompilation: no
Packaged: 2026-06-11 21:28:00 UTC; alex
Author: Synthesize Bio [aut, cre]
Maintainer: Synthesize Bio <candace@synthesize.bio>
Repository: CRAN
Date/Publication: 2026-06-12 06:30:02 UTC

API Base URL

Description

Base URL for the Synthesize Bio API

Usage

API_BASE_URL

Format

An object of class character of length 1.


Default Poll Interval

Description

Default polling interval (seconds) for async model queries

Usage

DEFAULT_POLL_INTERVAL_SECONDS

Format

An object of class numeric of length 1.


Default Poll Timeout

Description

Default maximum timeout (seconds) for async model queries

Usage

DEFAULT_POLL_TIMEOUT_SECONDS

Format

An object of class numeric of length 1.


Default Timeout

Description

Default timeout (seconds) for outbound HTTP requests

Usage

DEFAULT_TIMEOUT

Format

An object of class numeric of length 1.


Self-Hosted Timeout

Description

Timeout (seconds) for synchronous self-hosted container predictions. These run on the partner's GPU box and can take minutes for large sample counts, so they use a longer timeout than hosted control-plane calls.

Usage

SELF_HOSTED_TIMEOUT

Format

An object of class numeric of length 1.


Clear Synthesize Bio API Token

Description

Clears the Synthesize Bio API token from the environment for the current R session. This is useful for security purposes when you've finished working with the API or when switching between different accounts.

Usage

clear_synthesize_token(remove_from_keyring = FALSE)

Arguments

remove_from_keyring

Logical, whether to also remove the token from the system keyring if it's stored there. Defaults to FALSE.

Value

Invisibly returns TRUE.

Examples

## Not run: 
# Clear token from current session only
clear_synthesize_token()

# Clear token from both session and keyring
clear_synthesize_token(remove_from_keyring = TRUE)

## End(Not run)

Get Example Query for Model

Description

Retrieves an example query structure for a specific model. This provides a template that can be modified for your specific needs.

Usage

get_example_query(model_id, api_base_url = NULL, self_hosted = NULL)

Arguments

model_id

Character string specifying the model ID (e.g., "gem-1-bulk", "gem-1-sc").

api_base_url

The base URL for the API server. When NULL (default), it is resolved from the 'SYNTHESIZE_API_BASE_URL' environment variable, falling back to the production default (API_BASE_URL). Point this at a self-hosted model container to fetch its example query.

self_hosted

Logical; when TRUE, the request targets a self-hosted container and does not require an API key (one is only sent if set). When NULL (default), it is resolved from the 'SYNTHESIZE_SELF_HOSTED' environment variable (truthy for 1/true/yes/on), defaulting to FALSE.
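
The environment-variable fallback described above can be pictured with a small base R sketch. The helper name env_flag is hypothetical and not part of rsynthbio, and the case-insensitive, whitespace-trimming comparison is an assumption; the manual only states that 1/true/yes/on count as truthy.

```r
# Hypothetical helper (not part of rsynthbio): read an environment variable
# as a logical flag, treating 1/true/yes/on as TRUE.
env_flag <- function(name, default = FALSE) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    return(default)
  }
  # Assumption: matching ignores case and surrounding whitespace
  tolower(trimws(value)) %in% c("1", "true", "yes", "on")
}

Sys.setenv(SYNTHESIZE_SELF_HOSTED = "yes")
print(env_flag("SYNTHESIZE_SELF_HOSTED"))

Sys.unsetenv("SYNTHESIZE_SELF_HOSTED")
print(env_flag("SYNTHESIZE_SELF_HOSTED"))
```

An explicit self_hosted = TRUE or self_hosted = FALSE argument would presumably take precedence over the variable, since the lookup only applies when the argument is NULL.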

Value

A list representing a valid query structure for the specified model.

Examples

## Not run: 
# Get example query for bulk RNA-seq model
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Get example query for single-cell model
query_sc <- get_example_query(model_id = "gem-1-sc")$example_query

# Modify the query structure
query$inputs[[1]]$num_samples <- 10

# Fetch from a self-hosted container (no API key required)
query <- get_example_query(
  model_id = "gem-1-bulk",
  api_base_url = "https://gem-1-bulk.internal.partner.example",
  self_hosted = TRUE
)$example_query

## End(Not run)

Check if Synthesize Bio API Token is Set

Description

Checks whether a Synthesize Bio API token is currently set in the environment. Useful for conditional code that requires an API token.

Usage

has_synthesize_token()

Value

Logical, TRUE if token is set, FALSE otherwise.
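
As a rough illustration of what such a check amounts to, the sketch below tests the SYNTHESIZE_API_KEY environment variable directly, which is where set_synthesize_token() is documented to store the token. The helper name token_is_set is hypothetical, and the package's own implementation may differ.

```r
# Hypothetical stand-in for has_synthesize_token(): TRUE when the
# SYNTHESIZE_API_KEY environment variable holds a non-empty value.
token_is_set <- function() {
  nzchar(Sys.getenv("SYNTHESIZE_API_KEY"))
}

# Remember any existing value so the demonstration leaves the session unchanged
previous <- Sys.getenv("SYNTHESIZE_API_KEY", unset = NA_character_)

Sys.setenv(SYNTHESIZE_API_KEY = "placeholder-token")
print(token_is_set())

Sys.unsetenv("SYNTHESIZE_API_KEY")
print(token_is_set())

# Restore the original value, if there was one
if (!is.na(previous)) Sys.setenv(SYNTHESIZE_API_KEY = previous)
```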

Examples

## Not run: 
# Check if token is set
if (!has_synthesize_token()) {
  # Prompt for token if not set
  set_synthesize_token()
}

## End(Not run)

List Available Models

Description

Returns a list of all models available in the Synthesize Bio API. Each model has a unique ID that can be used with predict_query() and get_example_query().

Usage

list_models(api_base_url = NULL, self_hosted = NULL)

Arguments

api_base_url

The base URL for the API server. When NULL (default), it is resolved from the 'SYNTHESIZE_API_BASE_URL' environment variable, falling back to the production default (API_BASE_URL). Point this at a self-hosted model container to list its models.

self_hosted

Logical; when TRUE, the request targets a self-hosted container and does not require an API key (one is only sent if set). When NULL (default), it is resolved from the 'SYNTHESIZE_SELF_HOSTED' environment variable (truthy for 1/true/yes/on), defaulting to FALSE.

Value

A list or data frame containing available models with their IDs and metadata.

Examples

## Not run: 
# Get all available models
models <- list_models()
print(models)

# List models from a self-hosted container (no API key required)
models <- list_models(
  api_base_url = "https://gem-1-bulk.internal.partner.example",
  self_hosted = TRUE
)

## End(Not run)

Load Synthesize Bio API Token from Keyring

Description

Loads the previously stored Synthesize Bio API token from the system keyring and sets it in the environment for the current session.

Usage

load_synthesize_token_from_keyring()

Value

Invisibly returns TRUE if successful, FALSE if token not found in keyring.

Examples

## Not run: 
# Load token from keyring
load_synthesize_token_from_keyring()

## End(Not run)

Predict Gene Expression

Description

Sends a query to the Synthesize Bio API for prediction, retrieves the generated gene expression samples, and processes the response into data frames.

Usage

predict_query(
  query,
  model_id,
  api_base_url = NULL,
  poll_interval_seconds = DEFAULT_POLL_INTERVAL_SECONDS,
  poll_timeout_seconds = DEFAULT_POLL_TIMEOUT_SECONDS,
  return_download_url = FALSE,
  raw_response = FALSE,
  self_hosted = NULL,
  ...
)

Arguments

query

A list representing the query data to send to the API. Use 'get_example_query()' to generate an example. The query supports additional optional fields:

  • 'total_count' (integer): Library size used when converting predicted log CPM back to raw counts. Higher values scale counts up proportionally.

  • 'deterministic_latents' (logical): If TRUE, the model uses the mean of each latent distribution instead of sampling, producing deterministic outputs for the same inputs. Useful for reproducibility.

  • 'seed' (integer): Random seed for reproducibility.
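
A minimal sketch of setting these optional fields on a query list follows. The starting list is a stand-in for the result of get_example_query(...)$example_query, and its 'inputs' content is a placeholder rather than a real model schema; the field values are arbitrary illustrations.

```r
# Placeholder query; in practice this would come from
# get_example_query(model_id = ...)$example_query
query <- list(inputs = list(list(num_samples = 5L)))

# Add the optional top-level fields in one step
query <- utils::modifyList(
  query,
  list(
    total_count = 5000000L,        # library size for the count conversion
    deterministic_latents = TRUE,  # use latent means instead of sampling
    seed = 42L                     # random seed for reproducibility
  )
)

str(query)
```

Assigning the fields one at a time (for example, query$seed <- 42L) is equivalent; modifyList() simply keeps the optional settings together.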

model_id

Character string specifying the model ID (e.g., "gem-1-bulk", "gem-1-sc"). Use 'list_models()' to see available models.

api_base_url

The base URL for the API server. When NULL (default), it is resolved in order from the per-model environment variable 'SYNTHESIZE_API_BASE_URL__<MODEL>' (e.g. 'SYNTHESIZE_API_BASE_URL__GEM_1_BULK'), then the global 'SYNTHESIZE_API_BASE_URL', then the production default (API_BASE_URL). The per-model variable lets you point each self-hosted model at its own container once and omit 'api_base_url' on every call.
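
The lookup order can be sketched in base R as below. This is an illustration, not the package's code: resolve_base_url is a hypothetical name, the rule that turns "gem-1-bulk" into the suffix "GEM_1_BULK" is inferred from the single example given above, and default_url is a placeholder standing in for API_BASE_URL.

```r
# Hypothetical sketch of the documented resolution order
resolve_base_url <- function(model_id, api_base_url = NULL,
                             default_url = "https://api.example.invalid") {
  # 1. An explicit argument wins
  if (!is.null(api_base_url)) {
    return(api_base_url)
  }
  # Assumption: non-alphanumeric characters become "_" and letters are upper-cased
  suffix <- toupper(gsub("[^A-Za-z0-9]", "_", model_id))
  candidates <- c(
    Sys.getenv(paste0("SYNTHESIZE_API_BASE_URL__", suffix)),  # 2. per-model
    Sys.getenv("SYNTHESIZE_API_BASE_URL"),                    # 3. global
    default_url                                               # 4. default
  )
  # Return the first non-empty candidate
  candidates[nzchar(candidates)][1]
}

Sys.setenv(SYNTHESIZE_API_BASE_URL__GEM_1_BULK = "https://gem-1-bulk.internal.partner.example")
print(resolve_base_url("gem-1-bulk"))
Sys.unsetenv("SYNTHESIZE_API_BASE_URL__GEM_1_BULK")
```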

poll_interval_seconds

Seconds between polling attempts of the status endpoint. Default is DEFAULT_POLL_INTERVAL_SECONDS (2).

poll_timeout_seconds

Maximum total seconds to wait before timing out. Default is DEFAULT_POLL_TIMEOUT_SECONDS (900 = 15 minutes).
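
How the two polling parameters interact can be shown with a generic loop. This is only a sketch: poll_until_ready and check_status are hypothetical names, and the status strings "ready" and "failed" are assumptions, not the API's actual responses.

```r
# Hypothetical polling helper; check_status stands in for a status request
poll_until_ready <- function(check_status,
                             poll_interval_seconds = 2,
                             poll_timeout_seconds = 900) {
  started <- Sys.time()
  repeat {
    status <- check_status()
    if (identical(status, "ready")) {
      return(invisible(TRUE))
    }
    if (identical(status, "failed")) {
      stop("The job reported a failure.")
    }
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    # Give up if another wait would exceed the overall timeout
    if (elapsed + poll_interval_seconds > poll_timeout_seconds) {
      stop("Timed out waiting for the job to finish.")
    }
    Sys.sleep(poll_interval_seconds)
  }
}

# Simulated status source: "running" twice, then "ready"
attempts <- 0
fake_status <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) "running" else "ready"
}

poll_until_ready(fake_status, poll_interval_seconds = 0.1, poll_timeout_seconds = 5)
print(attempts)
```

A shorter interval detects completion sooner at the cost of more status requests; the timeout bounds the total wait regardless of the interval.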

return_download_url

Logical, if TRUE, returns a list containing the signed download URL instead of parsing into data frames. Default is FALSE.

raw_response

Logical, if TRUE, returns the raw (unformatted) response from the API without applying any output transformers. For the production path this is the parsed JSON; for 'self_hosted = TRUE' it is the parsed Arrow 'Table' together with its schema metadata. Default is FALSE.

self_hosted

Logical, if TRUE, sends a single synchronous request to a self-hosted model container that returns predictions as an Apache Arrow IPC stream (no polling, no download URL). Requires the optional 'arrow' package and an 'api_base_url' pointing at the container. Unlike the production path, no API key is required (one is only sent if configured). When NULL (default), it is resolved from the 'SYNTHESIZE_SELF_HOSTED' environment variable (truthy for 1/true/yes/on), defaulting to FALSE.

...

Additional parameters to include in the query body. These are passed directly to the API and validated server-side.

Value

A list. For the production path, if 'return_download_url' is 'FALSE' (default) the list contains 'metadata' and 'expression' data frames; if 'TRUE' it contains 'download_url' and empty data frames. For 'self_hosted = TRUE', the list contains the transformed data frames ('metadata', 'expression', and 'latents'; plus 'classifier_probs' for metadata-prediction models) with 'model_version' and 'request_type' attached as attributes.

Examples

## Not run: 
# To start using rsynthbio, you first need an account with synthesize.bio.
# Create one here: https://app.synthesize.bio/

# Set your API token (prompts securely for input)
set_synthesize_token()

# Get available models
models <- list_models()

# Create a query for a specific model
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Request raw counts
result <- predict_query(query, model_id = "gem-1-bulk")

# Access the results
metadata <- result$metadata
expression <- result$expression

# Explore the most highly expressed genes in the first sample
# (unlist() turns the one-row data frame into a named vector before sorting)
head(sort(unlist(expression[1, ]), decreasing = TRUE))

# Use deterministic latents for reproducible results
query$deterministic_latents <- TRUE
result_det <- predict_query(query, model_id = "gem-1-bulk")

# Specify a custom total count (library size)
query$total_count <- 5000000
result_custom <- predict_query(query, model_id = "gem-1-bulk")

# Self-hosted container returning a synchronous Apache Arrow IPC stream
result_sh <- predict_query(
  query,
  model_id = "gem-1-bulk",
  api_base_url = "https://gem-1-bulk.internal.partner.example",
  self_hosted = TRUE
)

## End(Not run)

Set Synthesize Bio API Token

Description

Securely prompts for and stores the Synthesize Bio API token in the environment. This function uses getPass to securely handle the token input without displaying it in the console. The token is stored in the SYNTHESIZE_API_KEY environment variable for the current R session.

Usage

set_synthesize_token(use_keyring = FALSE, token = NULL)

Arguments

use_keyring

Logical, whether to also store the token securely in the system keyring for future sessions. Defaults to FALSE.

token

Character, optional. If provided, uses this token instead of prompting. This parameter should only be used in non-interactive scripts.

Value

Invisibly returns TRUE if successful.

Examples

# Interactive prompt for token
## Not run: 
set_synthesize_token()

# Provide token directly (less secure, not recommended for interactive use)
set_synthesize_token(token = "your-token-here")

# Store in system keyring for future sessions
set_synthesize_token(use_keyring = TRUE)

## End(Not run)