The Protein Data Bank (PDB) is the primary global archive for experimentally determined three-dimensional structures of biological macromolecules. The RCSB PDB exposes these data through programmatic interfaces that support search, metadata retrieval, coordinate download, and access to assembly- and entity-level annotations. For structural bioinformatics, these APIs make it possible to move from a biological question to a reproducible computational workflow without manual browsing of the PDB website.
rPDBapi provides an R interface to these services. It
combines search helpers, operator-based query construction, metadata
retrieval, GraphQL-based data fetching, and structure download utilities
in a form that fits naturally into R-based data analysis pipelines.
This vignette is written for users who want to retrieve and analyze PDB data directly in R. The examples focus on protein kinase structures because kinases are biologically important, structurally diverse, and common targets in drug-discovery workflows.
install.packages("rPDBapi")
# Development version
remotes::install_github("selcukorkmaz/rPDBapi")
The package can be installed from CRAN or from the development
repository. The development version is useful when you want the newest
API mappings, tests, and documentation updates. In this vignette,
installation commands are left executable so the chunk follows the same
eval = TRUE policy as the rest of the document.
suppressPackageStartupMessages(library(rPDBapi))
suppressPackageStartupMessages(library(dplyr))
This chunk loads the package and dplyr, which we will
use for simple tabulation and ranking. Most structural bioinformatics
workflows combine API access with data manipulation, so it is useful to
establish that pattern from the beginning.
Programmatic PDB access is valuable when you need to:
In practice, this means that a question such as “Which high-resolution protein kinase structures are available, what organisms do they come from, and what do their assemblies look like?” can be answered in one analysis script instead of through manual web browsing.
At a high level, the package supports seven related tasks:
The package also hardens return contracts and error classes. That
matters when rPDBapi is embedded in larger pipelines,
because downstream code can now make stronger assumptions about object
classes, identifier formats, and failure modes.
rPDBapi exports functions that fall into ten practical groups:
- query_search(), perform_search()
- DefaultOperator(), ExactMatchOperator(), InOperator(), ContainsWordsOperator(), ContainsPhraseOperator(), ComparisonOperator(), RangeOperator(), ExistsOperator(), SequenceOperator(), SeqMotifOperator(), StructureOperator(), ChemicalOperator()
- QueryNode(), QueryGroup(), RequestOptions(), infer_search_service(), ScoredResult()
- infer_id_type(), parse_rcsb_id(), build_entry_id(), build_assembly_id(), build_entity_id(), build_instance_id()
- data_fetcher(), fetch_data(), generate_json_query(), get_info(), find_results(), find_papers(), describe_chemical(), get_fasta_from_rcsb_entry()
- list_rcsb_fields(), search_rcsb_fields(), validate_properties(), add_property()
- data_fetcher_batch(), cache_info(), clear_rpdbapi_cache()
- get_pdb_file(), get_pdb_api_url()
- as_rpdb_entry(), as_rpdb_assembly(), as_rpdb_polymer_entity(), as_rpdb_chemical_component(), as_rpdb_structure(), summarize_entries(), summarize_assemblies(), extract_taxonomy_table(), extract_ligand_table(), extract_calpha_coordinates(), join_structure_sequence()
- send_api_request(), handle_api_errors(), parse_response(), search_graphql(), return_data_as_dataframe()
The main workflow only needs a subset of these functions, but the full package is designed as a layered interface. High-level helpers are convenient for routine work, while low-level helpers make it possible to debug requests, build custom workflows, or extend the package into larger pipelines.
Before starting with code, it helps to distinguish a few PDB concepts:
- ENTRY: a PDB deposition, such as 4HHB
- ASSEMBLY: a biological assembly within an entry, such as 4HHB-1
- POLYMER_ENTITY: a unique macromolecular entity within an entry, such as a protein chain definition
- ATTRIBUTE: a searchable or retrievable field, such as exptl.method or rcsb_entry_info.molecular_weight
rPDBapi mirrors these levels. Search functions return identifiers at the appropriate level, and metadata functions use those identifiers to fetch the corresponding records.
kinase_full_text <- DefaultOperator("protein kinase")
high_resolution <- RangeOperator(
  attribute = "rcsb_entry_info.resolution_combined",
  from_value = 0,
  to_value = 2.5
)
xray_method <- ExactMatchOperator(
  attribute = "exptl.method",
  value = "X-RAY DIFFRACTION"
)
kinase_query <- QueryGroup(
  queries = list(kinase_full_text, xray_method, high_resolution),
  logical_operator = "AND"
)
kinase_query
#> $type
#> [1] "group"
#>
#> $logical_operator
#> [1] "and"
#>
#> $nodes
#> $nodes[[1]]
#> $nodes[[1]]$type
#> [1] "terminal"
#>
#> $nodes[[1]]$service
#> [1] "full_text"
#>
#> $nodes[[1]]$parameters
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "DefaultOperator" "list"
#>
#>
#> $nodes[[2]]
#> $nodes[[2]]$type
#> [1] "terminal"
#>
#> $nodes[[2]]$service
#> [1] "text"
#>
#> $nodes[[2]]$parameters
#> $attribute
#> [1] "exptl.method"
#>
#> $value
#> [1] "X-RAY DIFFRACTION"
#>
#> $operator
#> [1] "exact_match"
#>
#> attr(,"class")
#> [1] "ExactMatchOperator" "list"
#>
#>
#> $nodes[[3]]
#> $nodes[[3]]$type
#> [1] "terminal"
#>
#> $nodes[[3]]$service
#> [1] "text"
#>
#> $nodes[[3]]$parameters
#> $operator
#> [1] "range"
#>
#> $attribute
#> [1] "rcsb_entry_info.resolution_combined"
#>
#> $negation
#> [1] FALSE
#>
#> $value
#> $value$from
#> [1] 0
#>
#> $value$to
#> [1] 2.5
#>
#>
#> attr(,"class")
#> [1] "RangeOperator" "list"
This code builds a structured query object without contacting the API. The query says: search for records related to “protein kinase”, require X-ray diffraction as the experimental method, and restrict the results to structures with a reported resolution of 2.5 angstroms or better. This is a useful pattern because it separates biological intent from the mechanics of the HTTP request.
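Because the printed object looks like an ordinary nested list, it can be inspected before any request is sent. The sketch below does not call rPDBapi: it rebuilds a list with the same shape as the printed output by hand, and `describe_node()` is a hypothetical helper that prints one readable line per terminal node. It assumes that query objects really are plain nested lists, as the output above suggests.

```r
# Hand-built stand-in that mirrors the printed structure of kinase_query.
# Assumption: query objects are plain nested lists, as the output suggests.
query <- list(
  type = "group",
  logical_operator = "and",
  nodes = list(
    list(type = "terminal", service = "full_text",
         parameters = list(value = "protein kinase")),
    list(type = "terminal", service = "text",
         parameters = list(attribute = "exptl.method",
                           operator = "exact_match",
                           value = "X-RAY DIFFRACTION")),
    list(type = "terminal", service = "text",
         parameters = list(attribute = "rcsb_entry_info.resolution_combined",
                           operator = "range",
                           value = list(from = 0, to = 2.5)))
  )
)

# Hypothetical helper: one readable line per terminal node.
describe_node <- function(node) {
  p <- node$parameters
  attribute <- if (is.null(p$attribute)) "<full text>" else p$attribute
  operator <- if (is.null(p$operator)) "matches" else p$operator
  value <- paste(unlist(p$value), collapse = " to ")
  paste(attribute, operator, value)
}

node_summary <- vapply(query$nodes, describe_node, character(1))
cat(node_summary, sep = "\n")
```

A summary like this can be logged next to the results so that the biological intent of a search stays readable without re-reading the nested object.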
search_controls <- RequestOptions(
  result_start_index = 0,
  num_results = 10,
  sort_by = "score",
  desc = TRUE
)
search_controls
#> $paginate
#> $paginate$start
#> [1] 0
#>
#> $paginate$rows
#> [1] 10
#>
#>
#> $sort
#> $sort[[1]]
#> $sort[[1]]$sort_by
#> [1] "score"
#>
#> $sort[[1]]$direction
#> [1] "desc"
RequestOptions() defines how many hits to return and how
to sort them. In other words, the query object describes what you want,
and the request options describe how you want it delivered. That
distinction matters when you are iterating over result pages or creating
reproducible subsets for downstream analysis.
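To make the paging idea concrete, the following base R sketch computes the start index and page size for every page needed to cover a result set. `page_plan()` is a hypothetical helper, not part of rPDBapi, and the total of 25 hits is an arbitrary illustration. In a real workflow each pair would presumably be passed to `RequestOptions(result_start_index = ..., num_results = ...)`, the parameters shown above.

```r
# Hypothetical helper: compute the paginate block for every page needed to
# cover `total` hits at `rows` hits per page (0-based start index, as above).
page_plan <- function(total, rows) {
  stopifnot(total > 0, rows > 0)
  starts <- seq(from = 0, to = total - 1, by = rows)
  lapply(starts, function(s) list(start = s, rows = min(rows, total - s)))
}

pages <- page_plan(total = 25, rows = 10)
page_starts <- vapply(pages, function(p) p$start, numeric(1))
page_rows <- vapply(pages, function(p) p$rows, numeric(1))

# One row per page: where it starts and how many hits it requests.
data.frame(start = page_starts, rows = page_rows)
```

Computing the plan up front keeps the page boundaries fixed, so a later rerun requests the same slices in the same order.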
example_ids <- c("4HHB", "4HHB-1", "4HHB_1", "4HHB.A", "ATP")
dplyr::tibble(
  id = example_ids,
  inferred_type = infer_id_type(example_ids)
)
#> # A tibble: 5 × 2
#> id inferred_type
#> <chr> <chr>
#> 1 4HHB ENTRY
#> 2 4HHB-1 ASSEMBLY
#> 3 4HHB_1 ENTITY
#> 4 4HHB.A INSTANCE
#> 5 ATP CHEMICAL_COMPONENT
parse_rcsb_id("4HHB-1")
#> $raw_id
#> [1] "4HHB-1"
#>
#> $normalized_id
#> [1] "4HHB-1"
#>
#> $id_type
#> [1] "ASSEMBLY"
#>
#> $entry_id
#> [1] "4HHB"
#>
#> $assembly_id
#> [1] "1"
#>
#> $entity_id
#> NULL
#>
#> $instance_id
#> NULL
#>
#> $separator
#> [1] "-"
build_entry_id(" 4HHB ")
#> [1] "4HHB"
build_assembly_id("4HHB", 1)
#> [1] "4HHB-1"
build_entity_id("4HHB", 1)
#> [1] "4HHB_1"
build_instance_id("4HHB", "A")
#> [1] "4HHB.A"
These helpers make identifier handling explicit. They are useful when a workflow moves across entry-, assembly-, entity-, and chain-level retrieval, because the required identifier syntax changes with the biological level. In practice, the helpers reduce ad hoc string handling and make it easier to write validation checks before a request is sent.
kinase_hits <- query_search("protein kinase")
head(kinase_hits, 10)
#> [1] "6GWV" "1QH4" "4IDV" "1QL6" "6CQE" "2V4Y" "7ZP0" "2QKR" "6Z1T" "5CEN"
class(kinase_hits)
#> [1] "rPDBapi_query_ids" "character"
attr(kinase_hits, "return_type")
#> [1] "entry"
query_search() is the fastest way to ask a general
question of the archive. Here it performs a full-text search and returns
entry identifiers. The returned object is not just a plain character
vector: it carries the class rPDBapi_query_ids, which makes
the contract explicit and helps downstream code reason about what kind
of object was returned. The output above also shows the
return_type attribute, which confirms that entry IDs were
requested.
kinase_entry_ids <- perform_search(
  search_operator = kinase_query,
  return_type = "ENTRY",
  request_options = search_controls,
  verbosity = FALSE
)
kinase_entry_ids
#> [1] "4V7N" "3JU5" "2AKO" "4O75" "1J91" "1Z0S" "6RKE" "4Z9M" "1E19" "2A1F"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
class(kinase_entry_ids)
#> [1] "rPDBapi_search_ids" "character"
perform_search() executes the operator-based query
assembled earlier. This is the function to use when you need precise
control over attributes, logical combinations, return types, or
pagination. In structural bioinformatics, this kind of targeted search
is often more useful than full-text search alone, because it lets you
combine biological meaning with experimental constraints. As shown
above, identifier results are tagged with class
rPDBapi_search_ids.
entry_properties <- list(
  rcsb_id = list(),
  struct = c("title"),
  struct_keywords = c("pdbx_keywords"),
  exptl = c("method"),
  rcsb_entry_info = c("molecular_weight", "resolution_combined"),
  rcsb_accession_info = c("initial_release_date")
)
entry_properties
#> $rcsb_id
#> list()
#>
#> $struct
#> [1] "title"
#>
#> $struct_keywords
#> [1] "pdbx_keywords"
#>
#> $exptl
#> [1] "method"
#>
#> $rcsb_entry_info
#> [1] "molecular_weight" "resolution_combined"
#>
#> $rcsb_accession_info
#> [1] "initial_release_date"
This property list defines the fields we want from the GraphQL endpoint. It captures both structural metadata and biologically meaningful annotations: structure title, keywords, experimental method, molecular weight, resolution, and release date. By stating these fields explicitly, the workflow remains transparent and easy to reproduce.
head(list_rcsb_fields("ENTRY"), 10)
#> data_type field subfield
#> 1 ENTRY rcsb_id <NA>
#> 2 ENTRY struct title
#> 3 ENTRY struct_keywords pdbx_keywords
#> 4 ENTRY exptl method
#> 5 ENTRY cell length_a
#> 6 ENTRY cell length_b
#> 7 ENTRY cell length_c
#> 8 ENTRY cell volume
#> 9 ENTRY cell angle_beta
#> 10 ENTRY citation title
search_rcsb_fields("resolution", data_type = "ENTRY")
#> data_type field subfield
#> 13 ENTRY rcsb_entry_info resolution_combined
validate_properties(
  properties = entry_properties,
  data_type = "ENTRY",
  strict = TRUE
)
validate_properties(
  properties = list(
    rcsb_entry_info = c("resolution_combined", "unknown_subfield")
  ),
  data_type = "ENTRY",
  strict = FALSE
)
#> $unknown_fields
#> character(0)
#>
#> $unknown_subfields
#> $unknown_subfields$rcsb_entry_info
#> [1] "unknown_subfield"
The schema-aware helpers are useful when building property lists
iteratively. list_rcsb_fields() exposes the package’s
built-in field registry, search_rcsb_fields() narrows it to
a topic of interest, and validate_properties() checks that
a property list matches the expected data-type-specific structure. In
strict mode, validation fails early; in non-strict mode, it returns
diagnostics that can be incorporated into an interactive workflow or a
package test.
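The non-strict diagnostics can be pictured as a set difference between the requested properties and a field registry. The sketch below is an illustrative re-implementation in base R, not the package's own code: `check_properties()` is a hypothetical name, and the three-row registry is copied from the `list_rcsb_fields()` and `search_rcsb_fields()` output above rather than from the full registry.

```r
# Illustrative registry with three rows copied from the field listings
# above; the real registry is much larger.
registry <- data.frame(
  field = c("struct", "exptl", "rcsb_entry_info"),
  subfield = c("title", "method", "resolution_combined"),
  stringsAsFactors = FALSE
)

# Hypothetical re-implementation of a non-strict check, for illustration only.
check_properties <- function(properties, registry) {
  unknown_fields <- setdiff(names(properties), registry$field)
  known <- intersect(names(properties), registry$field)
  unknown_subfields <- lapply(known, function(f) {
    setdiff(properties[[f]], registry$subfield[registry$field == f])
  })
  names(unknown_subfields) <- known
  # Keep only fields that actually have unknown subfields.
  unknown_subfields <- Filter(length, unknown_subfields)
  list(unknown_fields = unknown_fields, unknown_subfields = unknown_subfields)
}

diagnostics <- check_properties(
  list(rcsb_entry_info = c("resolution_combined", "unknown_subfield"),
       not_a_field = c("x")),
  registry
)
str(diagnostics)
```

A strict variant would simply call `stop()` when either element of the diagnostics is non-empty, which matches the fail-early behavior described above.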
# Enable strict validation and keep the previous setting for later.
old_opt <- options(rPDBapi.strict_property_validation = TRUE)
generate_json_query(
  ids = c("4HHB"),
  data_type = "ENTRY",
  properties = list(rcsb_entry_info = c("resolution_combined"))
)
#> [1] "{entries(entry_ids: [\"4HHB\"]){rcsb_entry_info {resolution_combined}}}"
# Restore the previous setting explicitly; on.exit() only takes effect
# inside a function body.
options(old_opt)
This option-gated strict mode is useful when you want a pipeline to reject unknown fields immediately. Because the option is opt-in, the package preserves backward compatibility for existing code while still supporting stricter validation in controlled workflows.
kinase_metadata <- data_fetcher(
  id = kinase_entry_ids[1:5],
  data_type = "ENTRY",
  properties = entry_properties,
  return_as_dataframe = TRUE
)
kinase_metadata
#> # A tibble: 5 × 7
#> rcsb_id title pdbx_keywords method molecular_weight resolution_combined
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N Glycocyamin… TRANSFERASE X-RAY… 1610.6 2.3
#> 2 3JU5 Crystal Str… TRANSFERASE X-RAY… 171.14 1.75
#> 3 2AKO Crystal str… TRANSFERASE X-RAY… 114.17 2.2
#> 4 4O75 Crystal str… TRANSCRIPTIO… X-RAY… 15.8 1.55
#> 5 1J91 Crystal str… TRANSFERASE X-RAY… 79.45 2.22
#> # ℹ 1 more variable: initial_release_date <chr>
data_fetcher() is the main high-level retrieval function
for metadata. It takes identifiers, the data level of interest, and a
property list, then returns either a validated nested response or a
flattened data frame. For many analysis tasks, returning a data frame is
the most convenient choice because it fits directly into standard R
workflows for filtering, joining, and plotting.
kinase_json_query <- generate_json_query(
  ids = kinase_entry_ids[1:3],
  data_type = "ENTRY",
  properties = entry_properties
)
cat(kinase_json_query)
#> {entries(entry_ids: ["4V7N", "3JU5", "2AKO"]){rcsb_id , struct {title}, struct_keywords {pdbx_keywords}, exptl {method}, rcsb_entry_info {molecular_weight, resolution_combined}, rcsb_accession_info {initial_release_date}}}
This chunk exposes the GraphQL query string that rPDBapi
sends to the RCSB data API. Seeing the generated query is helpful when
you are debugging a field name, comparing package output with the
official schema, or teaching others how the package maps R objects to
API requests.
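To show how a property list maps onto that query string, here is a minimal base R sketch. `build_entry_query()` is a hypothetical function written for illustration; it is not how rPDBapi builds queries internally, and it handles only the entry-level shape printed above. Its spacing around fields without subfields also differs slightly from the package output.

```r
# Hypothetical sketch: turn an entry-level property list into a GraphQL
# query string shaped like the one printed above.
build_entry_query <- function(ids, properties) {
  id_part <- paste0('"', ids, '"', collapse = ", ")
  field_part <- vapply(names(properties), function(nm) {
    subfields <- properties[[nm]]
    if (length(subfields) == 0) {
      nm  # scalar field such as rcsb_id
    } else {
      paste0(nm, " {", paste(subfields, collapse = ", "), "}")
    }
  }, character(1))
  paste0("{entries(entry_ids: [", id_part, "]){",
         paste(field_part, collapse = ", "), "}}")
}

query_string <- build_entry_query(
  ids = "4HHB",
  properties = list(rcsb_entry_info = "resolution_combined")
)
cat(query_string, "\n")
```

Each top-level name in the list becomes a GraphQL object and each character vector becomes its selection set, which is why a misspelled subfield surfaces as a schema error rather than as an empty column.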
kinase_raw <- fetch_data(
  json_query = kinase_json_query,
  data_type = "ENTRY",
  ids = kinase_entry_ids[1:3]
)
str(kinase_raw, max.level = 2)
#> List of 1
#> $ data:List of 1
#> ..$ entries:List of 3
#> - attr(*, "class")= chr [1:2] "rPDBapi_fetch_response" "list"
#> - attr(*, "ids")= chr [1:3] "4V7N" "3JU5" "2AKO"
#> - attr(*, "data_type")= chr "ENTRY"
fetch_data() returns a validated raw payload and tags it
with the class rPDBapi_fetch_response. This is useful when
you want to inspect nested JSON content before flattening it, preserve
hierarchy for custom parsing, or verify that a field is present before
building a larger workflow around it. The printed structure confirms a
list-like response with explicit contract tagging.
kinase_tidy <- return_data_as_dataframe(
  response = kinase_raw,
  data_type = "ENTRY",
  ids = kinase_entry_ids[1:3]
)
kinase_tidy
#> # A tibble: 3 × 7
#> rcsb_id title pdbx_keywords method molecular_weight resolution_combined
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N Glycocyamin… TRANSFERASE X-RAY… 1610.6 2.3
#> 2 3JU5 Crystal Str… TRANSFERASE X-RAY… 171.14 1.75
#> 3 2AKO Crystal str… TRANSFERASE X-RAY… 114.17 2.2
#> # ℹ 1 more variable: initial_release_date <chr>
return_data_as_dataframe() converts the nested response
into a rectangular R data structure. This transformation is central to
reproducible bioinformatics: once the results are tidy, they can be
analyzed with dplyr, joined to other annotations,
summarized statistically, or passed to visualization packages.
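One practical detail is visible in the printed tables: numeric fields such as resolution arrive as character columns. The base R sketch below uses values copied from the printed kinase_metadata table (it does not contact the API) and shows a conversion step followed by a simple filter and ranking. The 2.0 angstrom cutoff is an arbitrary choice for illustration.

```r
# Values copied from the printed kinase_metadata table; numeric fields
# arrive as character columns, as the <chr> headers show.
kinase_tbl <- data.frame(
  rcsb_id = c("4V7N", "3JU5", "2AKO", "4O75", "1J91"),
  molecular_weight = c("1610.6", "171.14", "114.17", "15.8", "79.45"),
  resolution_combined = c("2.3", "1.75", "2.2", "1.55", "2.22"),
  stringsAsFactors = FALSE
)

# Convert before comparing: character strings compare lexically, not numerically.
kinase_tbl$molecular_weight <- as.numeric(kinase_tbl$molecular_weight)
kinase_tbl$resolution_combined <- as.numeric(kinase_tbl$resolution_combined)

# Keep structures at 2.0 angstroms or better, best resolution first.
sharp <- kinase_tbl[kinase_tbl$resolution_combined <= 2.0, ]
sharp <- sharp[order(sharp$resolution_combined), ]
sharp$rcsb_id
```

The same conversion applies to any column used for sorting, thresholds, or summary statistics.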
High-throughput structural workflows rarely stop at one or two identifiers. In screening, comparative analysis, or annotation projects, it is common to fetch dozens or hundreds of records with the same property specification.
cache_dir <- file.path(tempdir(), "rpdbapi-vignette-cache")
kinase_batch <- data_fetcher_batch(
  id = kinase_entry_ids[1:5],
  data_type = "ENTRY",
  properties = entry_properties,
  return_as_dataframe = TRUE,
  batch_size = 2,
  retry_attempts = 2,
  retry_backoff = 0,
  cache = TRUE,
  cache_dir = cache_dir,
  progress = FALSE,
  verbosity = FALSE
)
kinase_batch
#> # A tibble: 5 × 7
#> rcsb_id title pdbx_keywords method molecular_weight resolution_combined
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N Glycocyamin… TRANSFERASE X-RAY… 1610.6 2.3
#> 2 3JU5 Crystal Str… TRANSFERASE X-RAY… 171.14 1.75
#> 3 2AKO Crystal str… TRANSFERASE X-RAY… 114.17 2.2
#> 4 4O75 Crystal str… TRANSCRIPTIO… X-RAY… 15.8 1.55
#> 5 1J91 Crystal str… TRANSFERASE X-RAY… 79.45 2.22
#> # ℹ 1 more variable: initial_release_date <chr>
attr(kinase_batch, "provenance")
#> $fetched_at
#> [1] "2026-03-07 16:36:55.093741"
#>
#> $mode
#> [1] "batch"
#>
#> $data_type
#> [1] "ENTRY"
#>
#> $requested_ids
#> [1] 5
#>
#> $batch_size
#> [1] 2
#>
#> $num_batches
#> [1] 3
#>
#> $retry_attempts
#> [1] 2
#>
#> $retry_backoff
#> [1] 0
#>
#> $cache_enabled
#> [1] TRUE
#>
#> $cache_dir
#> [1] "/var/folders/dj/y28dp44x303ggfg6rg8n2v0h0000gn/T//RtmpwNIFML/rpdbapi-vignette-cache"
#>
#> $cache_hits
#> [1] 0
#>
#> $cache_misses
#> [1] 3
#>
#> $batches
#> $batches[[1]]
#> $batches[[1]]$batch_index
#> [1] 1
#>
#> $batches[[1]]$batch_size
#> [1] 2
#>
#> $batches[[1]]$ids
#> [1] "4V7N" "3JU5"
#>
#> $batches[[1]]$attempts
#> [1] 1
#>
#> $batches[[1]]$cache_hit
#> [1] FALSE
#>
#>
#> $batches[[2]]
#> $batches[[2]]$batch_index
#> [1] 2
#>
#> $batches[[2]]$batch_size
#> [1] 2
#>
#> $batches[[2]]$ids
#> [1] "2AKO" "4O75"
#>
#> $batches[[2]]$attempts
#> [1] 1
#>
#> $batches[[2]]$cache_hit
#> [1] FALSE
#>
#>
#> $batches[[3]]
#> $batches[[3]]$batch_index
#> [1] 3
#>
#> $batches[[3]]$batch_size
#> [1] 1
#>
#> $batches[[3]]$ids
#> [1] "1J91"
#>
#> $batches[[3]]$attempts
#> [1] 1
#>
#> $batches[[3]]$cache_hit
#> [1] FALSE
cache_info(cache_dir)
#> $cache_dir
#> [1] "/private/var/folders/dj/y28dp44x303ggfg6rg8n2v0h0000gn/T/RtmpwNIFML/rpdbapi-vignette-cache"
#>
#> $total_entries
#> [1] 3
#>
#> $total_size_bytes
#> [1] 1415
#>
#> $entries
#> file size_bytes
#> 1 cache-0f77b464ceebdd6a5ea3ee57739104f8.rds 485
#> 2 cache-1a7566aeec7fda76c5905962538070c4.rds 427
#> 3 cache-560039a4a46316d788c5a7538fd66c75.rds 503
#> modified
#> 1 2026-03-07 16:36:54.936792
#> 2 2026-03-07 16:36:55.092213
#> 3 2026-03-07 16:36:54.784697
data_fetcher_batch() scales the single-request
data_fetcher() model to larger identifier sets. It splits
requests into batches, retries transient failures, optionally stores
batch results on disk, and attaches provenance to the returned object.
That provenance is important for reproducibility because it records the
retrieval mode, batch size, retry configuration, and cache usage.
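The batching and retry logic described here can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: `with_retry()` and `flaky_fetch()` are hypothetical names, and the stand-in fetcher fails once on purpose so that the retry path is exercised without any network access.

```r
ids <- c("4V7N", "3JU5", "2AKO", "4O75", "1J91")
batch_size <- 2

# Split the identifiers into consecutive batches of at most batch_size.
batches <- unname(split(ids, ceiling(seq_along(ids) / batch_size)))

# Hypothetical retry wrapper: call fetch_fun(batch) up to `attempts` times.
with_retry <- function(fetch_fun, batch, attempts = 2) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fetch_fun(batch), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(value = result, attempts = i))
    }
  }
  stop("all ", attempts, " attempts failed: ", conditionMessage(result))
}

# Stand-in fetcher that fails once and then succeeds, to exercise the retry.
calls <- 0
flaky_fetch <- function(batch) {
  calls <<- calls + 1
  if (calls == 1) stop("transient failure")
  toupper(batch)
}

first <- with_retry(flaky_fetch, batches[[1]])
length(batches)
first$attempts
```

With five identifiers and a batch size of two, the split yields three batches, which agrees with the num_batches value in the provenance record above.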
clear_rpdbapi_cache(cache_dir)
cache_info(cache_dir)
#> $cache_dir
#> [1] "/private/var/folders/dj/y28dp44x303ggfg6rg8n2v0h0000gn/T/RtmpwNIFML/rpdbapi-vignette-cache"
#>
#> $total_entries
#> [1] 0
#>
#> $total_size_bytes
#> [1] 0
#>
#> $entries
#> [1] file size_bytes modified
#> <0 rows> (or 0-length row.names)
This cache-management pattern is especially useful in iterative analysis. Repeated metadata retrieval becomes faster when the same requests are reused, while explicit cache inspection and cleanup keep the workflow transparent.
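For readers who want to see the idea behind an on-disk cache, the following base R sketch stores each result as an .rds file and reuses it on the next identical request. `cached_fetch()` and `fake_fetch()` are hypothetical names, and the readable file key is a simplification; the file names printed by `cache_info()` above look like hashes, so the package presumably derives its keys differently.

```r
# Hypothetical cache helper: store each batch result as an .rds file whose
# name is derived from the requested identifiers.
cached_fetch <- function(ids, fetch_fun, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  key <- paste(sort(ids), collapse = "_")
  path <- file.path(cache_dir, paste0("cache-", key, ".rds"))
  if (file.exists(path)) {
    return(list(value = readRDS(path), cache_hit = TRUE))
  }
  value <- fetch_fun(ids)
  saveRDS(value, path)
  list(value = value, cache_hit = FALSE)
}

demo_dir <- file.path(tempdir(), "cache-sketch")
unlink(demo_dir, recursive = TRUE)  # start from an empty cache

# Stand-in for a real retrieval call; no network access is involved.
fake_fetch <- function(ids) data.frame(rcsb_id = ids, stringsAsFactors = FALSE)

first <- cached_fetch(c("4V7N", "3JU5"), fake_fetch, demo_dir)
second <- cached_fetch(c("4V7N", "3JU5"), fake_fetch, demo_dir)
c(first$cache_hit, second$cache_hit)
```

The first call is a miss and writes the file; the second call with the same identifiers is served from disk.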
# Use data_fetcher() when:
# - the ID set is small
# - you want the simplest request path
# - retry, cache, and provenance are unnecessary
# Use data_fetcher_batch() when:
# - the ID set is large
# - requests may need retries
# - repeated retrieval should reuse cached results
# - you want an explicit provenance record
In practice, data_fetcher() is usually sufficient for
exploratory work. data_fetcher_batch() becomes more useful
as the workflow moves toward larger or repeated retrieval, where retry
behavior, caching, and provenance become part of the analysis design
rather than implementation detail.
provenance_tbl <- dplyr::tibble(
  field = names(attr(kinase_batch, "provenance")),
  value = vapply(
    attr(kinase_batch, "provenance"),
    function(x) {
      if (is.list(x)) "<list>" else as.character(x)
    },
    character(1)
  )
)
provenance_tbl
#> # A tibble: 13 × 2
#> field value
#> <chr> <chr>
#> 1 fetched_at 2026-03-07 16:36:55.093741
#> 2 mode batch
#> 3 data_type ENTRY
#> 4 requested_ids 5
#> 5 batch_size 2
#> 6 num_batches 3
#> 7 retry_attempts 2
#> 8 retry_backoff 0
#> 9 cache_enabled TRUE
#> 10 cache_dir /var/folders/dj/y28dp44x303ggfg6rg8n2v0h0000gn/T//RtmpwNIFML/…
#> 11 cache_hits 0
#> 12 cache_misses 3
#> 13 batches <list>
Interpreting provenance explicitly is useful when results are produced in a non-interactive workflow. The provenance record makes it clear how many batches were used, whether caching was enabled, and how the retrieval was configured, which makes the metadata table easier to audit later.
Biological assemblies are often the correct unit of interpretation for oligomeric proteins. A deposited asymmetric unit may not reflect the functional quaternary structure, so assembly-level retrieval is important when studying stoichiometry, symmetry, and interfaces.
kinase_assembly_ids <- perform_search(
  search_operator = kinase_query,
  return_type = "ASSEMBLY",
  request_options = RequestOptions(result_start_index = 0, num_results = 5),
  verbosity = FALSE
)
kinase_assembly_ids
#> [1] "4V7N-10" "4V7N-14" "4V7N-15" "4V7N-2" "4V7N-5"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
This search requests assembly identifiers rather than entry identifiers. The returned IDs encode both the entry and the assembly number, making them appropriate inputs for assembly-level metadata retrieval. This is an important distinction in structural biology because entry-level and assembly-level questions are not interchangeable.
assembly_properties <- list(
  rcsb_id = list(),
  pdbx_struct_assembly = c("details", "method_details", "oligomeric_count"),
  rcsb_struct_symmetry = c("kind", "symbol")
)
kinase_assemblies <- data_fetcher(
  id = kinase_assembly_ids,
  data_type = "ASSEMBLY",
  properties = assembly_properties,
  return_as_dataframe = TRUE
)
kinase_assemblies
#> # A tibble: 5 × 5
#> rcsb_id details oligomeric_count kind symbol
#> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N-10 author_defined_assembly 2 Global Symmetry C2
#> 2 4V7N-14 author_defined_assembly 2 Global Symmetry C2
#> 3 4V7N-15 author_defined_assembly 2 Global Symmetry C2
#> 4 4V7N-2 author_defined_assembly 2 Global Symmetry C2
#> 5 4V7N-5 author_defined_assembly 2 Global Symmetry C2
This chunk retrieves assembly descriptors and symmetry annotations. In practice, these fields help answer questions about oligomeric state and biological interpretation, such as whether a kinase structure is monomeric, dimeric, or associated with a symmetric assembly.
assembly_object <- as_rpdb_assembly(
kinase_assemblies,
metadata = list(query = "protein kinase assemblies")
)
assembly_object
#> <rPDBapi_assembly> with data class: rPDBapi_dataframe/tbl_df/tbl/data.frame
dplyr::as_tibble(assembly_object)
#> # A tibble: 5 × 5
#> rcsb_id details oligomeric_count kind symbol
#> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N-10 author_defined_assembly 2 Global Symmetry C2
#> 2 4V7N-14 author_defined_assembly 2 Global Symmetry C2
#> 3 4V7N-15 author_defined_assembly 2 Global Symmetry C2
#> 4 4V7N-2 author_defined_assembly 2 Global Symmetry C2
#> 5 4V7N-5 author_defined_assembly 2 Global Symmetry C2
summarize_assemblies(assembly_object)
#> # A tibble: 1 × 3
#> n_assemblies median_oligomeric_count n_symmetry_labels
#> <int> <dbl> <int>
#> 1 5 2 1
The assembly object wrapper is useful when you want to retain
lightweight metadata alongside a table while still working with
tibble-oriented tools. summarize_assemblies() then provides
a narrow helper for common assembly questions, such as typical
oligomeric count and the diversity of symmetry labels in the retrieved
result set.
One practical source of bugs in structural workflows is mixing
identifier levels. A valid entry ID is not automatically a valid
assembly or entity ID, and the corresponding data_type must
match the biological level of the request.
dplyr::tibble(
example_id = c("4HHB", "4HHB-1", "4HHB_1", "4HHB.A", "ATP"),
inferred_type = infer_id_type(c("4HHB", "4HHB-1", "4HHB_1", "4HHB.A", "ATP"))
)
#> # A tibble: 5 × 2
#> example_id inferred_type
#> <chr> <chr>
#> 1 4HHB ENTRY
#> 2 4HHB-1 ASSEMBLY
#> 3 4HHB_1 ENTITY
#> 4 4HHB.A INSTANCE
#> 5 ATP CHEMICAL_COMPONENT
parse_rcsb_id("4HHB.A")
#> $raw_id
#> [1] "4HHB.A"
#>
#> $normalized_id
#> [1] "4HHB.A"
#>
#> $id_type
#> [1] "INSTANCE"
#>
#> $entry_id
#> [1] "4HHB"
#>
#> $assembly_id
#> NULL
#>
#> $entity_id
#> NULL
#>
#> $instance_id
#> [1] "A"
#>
#> $separator
#> [1] "."
This is useful as a preflight step before retrieval. In larger workflows, a small identifier check often saves time because it catches level mismatches before the request reaches the API.
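A preflight check of this kind might look like the sketch below. It is self-contained base R: `classify_id()` is a hypothetical stand-in for `infer_id_type()` that relies only on the separators shown in the examples above, and it deliberately does not handle chemical component IDs such as ATP. `preflight()` is likewise a hypothetical helper.

```r
# Hypothetical stand-in for infer_id_type(), based only on the separators
# shown in the examples above ("-", "_", "."); real IDs may need more cases.
classify_id <- function(id) {
  if (grepl("^[0-9][A-Za-z0-9]{3}$", id)) return("ENTRY")
  if (grepl("^[0-9][A-Za-z0-9]{3}-[0-9]+$", id)) return("ASSEMBLY")
  if (grepl("^[0-9][A-Za-z0-9]{3}_[0-9]+$", id)) return("ENTITY")
  if (grepl("^[0-9][A-Za-z0-9]{3}\\.[A-Za-z0-9]+$", id)) return("INSTANCE")
  "UNKNOWN"
}

# Stop before any request is sent if an ID is at the wrong level.
preflight <- function(ids, expected_type) {
  found <- vapply(ids, classify_id, character(1))
  bad <- ids[found != expected_type]
  if (length(bad) > 0) {
    stop("expected ", expected_type, " IDs, got: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently: both IDs are assembly-level.
preflight(c("4HHB-1", "4V7N-10"), "ASSEMBLY")

# Fails with a readable message: the second ID is entry-level.
mixed <- tryCatch(
  preflight(c("4HHB-1", "4HHB"), "ASSEMBLY"),
  error = function(e) conditionMessage(e)
)
mixed
```

In a pipeline that uses rPDBapi directly, the same pattern would compare the output of `infer_id_type()` with the intended `data_type` instead of using the regular expressions above.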
# Entry-level retrieval
data_fetcher(
id = build_entry_id("4HHB"),
data_type = "ENTRY",
properties = list(rcsb_id = list())
)
#> # A tibble: 1 × 1
#> rcsb_id
#> <chr>
#> 1 4HHB
# Assembly-level retrieval
data_fetcher(
id = build_assembly_id("4HHB", 1),
data_type = "ASSEMBLY",
properties = list(rcsb_id = list())
)
#> # A tibble: 1 × 1
#> rcsb_id
#> <chr>
#> 1 4HHB-1
# Polymer-entity retrieval
data_fetcher(
id = build_entity_id("4HHB", 1),
data_type = "POLYMER_ENTITY",
properties = list(rcsb_id = list())
)
#> # A tibble: 1 × 1
#> rcsb_id
#> <chr>
#> 1 4HHB_1
Using the builder helpers makes the intended record level explicit in code. That is especially helpful when identifiers are generated programmatically from entry IDs and entity or assembly indices.
Many analyses need entity-level annotations rather than whole-entry metadata. For example, taxonomy belongs naturally to polymer entities because different entities within the same structure can come from different organisms or constructs.
kinase_polymer_ids <- perform_search(
search_operator = kinase_query,
return_type = "POLYMER_ENTITY",
request_options = RequestOptions(result_start_index = 0, num_results = 5),
verbosity = FALSE
)
kinase_polymer_ids
#> [1] "4V7N_1" "2AKO_1" "1Z0S_1" "6RKE_2" "4A7X_1"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
Here the same biological query is projected onto polymer entities. This is useful when you want annotations at the chain-definition level, such as source organism, sequence grouping, or entity-specific descriptors.
polymer_properties <- list(
rcsb_id = list(),
rcsb_entity_source_organism = c("ncbi_taxonomy_id", "ncbi_scientific_name"),
rcsb_cluster_membership = c("cluster_id", "identity")
)
kinase_polymer_metadata <- data_fetcher(
id = kinase_polymer_ids,
data_type = "POLYMER_ENTITY",
properties = polymer_properties,
return_as_dataframe = TRUE
)
kinase_polymer_metadata
#> # A tibble: 5 × 6
#> ID rcsb_id ncbi_taxonomy_id ncbi_scientific_name cluster_id identity
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N_1 4V7N_1 243920 Namalycastis sp. ST01 31085 100
#> 2 2AKO_1 1Z0S_1 2234 Archaeoglobus fulgidus 20922 100
#> 3 1Z0S_1 6RKE_2 322710 Azotobacter vinelandii DJ 2488 100
#> 4 6RKE_2 4A7X_1 85962 Helicobacter pylori 26695 45400 100
#> 5 4A7X_1 2AKO_1 197 Campylobacter jejuni 51745 100
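This result provides organism-level context that is often essential in comparative structural biology. For example, you might use these fields to separate human kinase structures from bacterial homologs, or to identify closely related entities before selecting representatives for downstream modeling. Note that in this run the ID and rcsb_id columns disagree on rows 2 to 5: ID appears to follow the order of the request, while the annotations belong to the record named in rcsb_id, so downstream joins should use rcsb_id.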
polymer_object <- as_rpdb_polymer_entity(
kinase_polymer_metadata,
metadata = list(query = "kinase polymer entities")
)
taxonomy_table <- extract_taxonomy_table(polymer_object)
taxonomy_table
#> # A tibble: 5 × 3
#> rcsb_id ncbi_taxonomy_id ncbi_scientific_name
#> <chr> <chr> <chr>
#> 1 4V7N_1 243920 Namalycastis sp. ST01
#> 2 1Z0S_1 2234 Archaeoglobus fulgidus
#> 3 6RKE_2 322710 Azotobacter vinelandii DJ
#> 4 4A7X_1 85962 Helicobacter pylori 26695
#> 5 2AKO_1 197 Campylobacter jejuni
taxonomy_table %>%
count(ncbi_scientific_name, sort = TRUE)
#> # A tibble: 5 × 2
#> ncbi_scientific_name n
#> <chr> <int>
#> 1 Archaeoglobus fulgidus 1
#> 2 Azotobacter vinelandii DJ 1
#> 3 Campylobacter jejuni 1
#> 4 Helicobacter pylori 26695 1
#> 5 Namalycastis sp. ST01 1
extract_taxonomy_table() is intentionally narrow: it
keeps only the fields needed to represent source-organism assignments
cleanly. This is useful when a larger polymer-entity table contains many
retrieval columns, but the immediate analysis question is taxonomic
composition or species-level redundancy.
selected_entry <- kinase_entry_ids[[1]]
# Silence messages and warnings emitted during the request.
selected_info <- suppressWarnings(suppressMessages(get_info(selected_entry)))
entry_summary <- dplyr::tibble(
  rcsb_id = selected_entry,
  title = purrr::pluck(selected_info, "struct", "title", .default = NA_character_),
  keywords = purrr::pluck(
    selected_info, "struct_keywords", "pdbx_keywords",
    .default = NA_character_
  ),
  method = purrr::pluck(selected_info, "exptl", 1, "method", .default = NA_character_),
  citation_title = purrr::pluck(
    selected_info, "rcsb_primary_citation", "title",
    .default = NA_character_
  ),
  resolution = paste(
    purrr::pluck(selected_info, "rcsb_entry_info", "resolution_combined", .default = NA),
    collapse = "; "
  )
)
entry_summary
#> # A tibble: 1 × 6
#> rcsb_id title keywords method citation_title resolution
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 4V7N Glycocyamine kinase, beta-b… TRANSFE… <NA> Structural ba… 2.3
get_info() retrieves a full entry record as a nested list. This is useful when you want a richer, less filtered representation than a GraphQL property subset. In this example, we extract structure title, keywords, experimental method, citation title, and resolution to build a compact summary of one kinase entry. These fields are exactly the kinds of annotations structural biologists inspect when deciding whether a structure is suitable for biological interpretation or downstream modeling. Any field that cannot be found at the requested path falls back to its .default value. In this run the experimental method is NA even though the GraphQL metadata retrieved earlier reports X-ray diffraction for 4V7N, so the NA most likely reflects a mismatch between the pluck() path and the nested shape of exptl rather than missing deposited data.
if (!exists("selected_entry", inherits = TRUE) || !nzchar(selected_entry)) {
  selected_entry <- "4HHB"
}
literature_term <- selected_entry
# Silence messages and warnings emitted during the requests.
kinase_papers <- suppressWarnings(suppressMessages(
  find_papers(literature_term, max_results = 3)
))
kinase_keywords <- suppressWarnings(suppressMessages(
  find_results(literature_term, field = "struct_keywords")
))
kinase_papers
#> $`4V7N`
#> [1] "Structural basis for the mechanism and substrate specificity of glycocyamine kinase, a phosphagen kinase family member."
head(kinase_keywords, 3)
#> $`4V7N`
#> $`4V7N`$pdbx_keywords
#> [1] "TRANSFERASE"
#>
#> $`4V7N`$text
#> [1] "phosphagen kinase, glycocyamine kinase, transition state analog, Kinase, Transferase"
These helper functions show how rPDBapi can bridge
structures and biological interpretation. find_papers()
provides publication titles associated with matching entries, while
find_results() can retrieve selected metadata fields across
search results. Here we use selected_entry as the search
term to keep the vignette runtime bounded while still demonstrating both
helper APIs. In this run, both calls return one-key lists keyed by the
selected entry.
kinase_structure <- get_pdb_file(
  pdb_id = selected_entry,
  filetype = "cif",
  verbosity = FALSE
)
coordinate_matrix <- matrix(kinase_structure$xyz, ncol = 3, byrow = TRUE)
coordinate_df <- data.frame(
  x = coordinate_matrix[, 1],
  y = coordinate_matrix[, 2],
  z = coordinate_matrix[, 3]
)
calpha_atoms <- cbind(
  kinase_structure$atom[kinase_structure$calpha, c("chain", "resno", "resid")],
  coordinate_df[kinase_structure$calpha, , drop = FALSE]
)
head(calpha_atoms, 10)
#> chain resno resid x y z
#> 2 AA 25 PHE 43.573 95.837 41.431
#> 13 AA 26 LYS 42.654 96.945 44.971
#> 22 AA 27 ALA 41.784 94.561 47.802
#> 27 AA 28 ALA 38.081 95.202 47.160
#> 32 AA 29 ASP 38.298 93.850 43.599
#> 40 AA 30 ASN 39.321 90.459 44.964
#> 48 AA 31 PHE 37.105 90.117 48.039
#> 59 AA 32 PRO 35.488 86.639 47.866
#> 66 AA 33 ASP 31.809 86.337 46.944
#> 74 AA 34 LEU 30.517 84.502 50.003
get_pdb_file() downloads and parses the structure file
into an object that contains atomic records and coordinates. This is the
transition point from metadata analysis to coordinate analysis. The
example extracts C-alpha atoms, which are commonly used in structural
alignment, geometry summaries, coarse distance analyses, and quick
visual inspection of protein backbones.
calpha_atoms <- extract_calpha_coordinates(kinase_structure)
head(calpha_atoms, 10)
#> # A tibble: 10 × 6
#> chain resno resid x y z
#> <chr> <int> <chr> <dbl> <dbl> <dbl>
#> 1 AA 25 PHE 43.6 95.8 41.4
#> 2 AA 26 LYS 42.7 96.9 45.0
#> 3 AA 27 ALA 41.8 94.6 47.8
#> 4 AA 28 ALA 38.1 95.2 47.2
#> 5 AA 29 ASP 38.3 93.8 43.6
#> 6 AA 30 ASN 39.3 90.5 45.0
#> 7 AA 31 PHE 37.1 90.1 48.0
#> 8 AA 32 PRO 35.5 86.6 47.9
#> 9 AA 33 ASP 31.8 86.3 46.9
#> 10 AA 34 LEU 30.5 84.5 50.0
extract_calpha_coordinates() packages a common
structural-bioinformatics step into a reusable helper. The result is
immediately usable for plotting, distance-based summaries, or
chain-level coordinate analyses without manually reconstructing the
atom/coordinate join each time.
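As an illustration of a distance-based summary, the sketch below computes C-alpha to C-alpha distances in base R. It uses only the first five rows of the C-alpha listing shown earlier, typed in by hand so that it runs without a network request; the variable names are illustrative.

```r
# First five C-alpha rows copied from the earlier listing (entry 4V7N, chain AA)
ca <- data.frame(
  resno = 25:29,
  resid = c("PHE", "LYS", "ALA", "ALA", "ASP"),
  x = c(43.573, 42.654, 41.784, 38.081, 38.298),
  y = c(95.837, 96.945, 94.561, 95.202, 93.850),
  z = c(41.431, 44.971, 47.802, 47.160, 43.599)
)

# Full pairwise Euclidean distance matrix (in angstroms)
ca_dist <- as.matrix(dist(ca[, c("x", "y", "z")]))
dimnames(ca_dist) <- list(ca$resno, ca$resno)

# Distances between sequence-adjacent residues sit on the first off-diagonal
consecutive <- ca_dist[cbind(1:4, 2:5)]
round(consecutive, 2)
```

Adjacent C-alpha atoms in a polypeptide chain are typically about 3.8 Å apart, so the consecutive distances double as a quick sanity check that the coordinates were paired with the right residues.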
kinase_sequences <- get_fasta_from_rcsb_entry(selected_entry, verbosity = FALSE)
length(kinase_sequences)
#> [1] 1
utils::head(nchar(unlist(kinase_sequences)))
#> 4V7N_1|Chains AA[auth AB], A[auth AA], BA[auth AD], B[auth AC], CA[auth AF], C[auth AE], DA[auth AH], D[auth AG], EA[auth AJ], E[auth AI], FA[auth AL], F[auth AK], GA[auth AN], G[auth AM], HA[auth AP], H[auth AO], IA[auth AR], I[auth AQ], JA[auth BB], J[auth BA], K[auth BC], L[auth BD], M[auth BE], N[auth BF], O[auth BG], P[auth BH], Q[auth BI], R[auth BJ], S[auth BK], T[auth BL], U[auth BM], V[auth BN], W[auth BO], X[auth BP], Y[auth BQ], Z[auth BR]|Glycocyamine kinase beta chain|Namalycastis sp. ST01 (243920)
#> 390
The FASTA workflow complements coordinate retrieval by exposing the underlying macromolecular sequences. Having both sequence and structure available in the same environment is useful for tasks such as domain boundary checks, sequence length summaries, and linking structural hits to sequence-based pipelines.
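The FASTA header itself carries structured information that is often worth parsing. The following base-R sketch splits a shortened copy of the header shown above into its fields and separates chain labels from author-assigned chain IDs. It assumes the pipe-delimited layout and the "label[auth id]" notation visible in that header, and the variable names are illustrative.

```r
# Shortened copy of the FASTA header shown above (chain list truncated for brevity)
fasta_header <- paste0(
  "4V7N_1|Chains AA[auth AB], A[auth AA]|",
  "Glycocyamine kinase beta chain|Namalycastis sp. ST01 (243920)"
)

# Fields are separated by "|": entity ID, chains, description, organism
fields <- strsplit(fasta_header, "|", fixed = TRUE)[[1]]

# Split the chain field and separate label IDs from author-assigned IDs
chain_tokens <- strsplit(sub("^Chains? ", "", fields[2]), ", ", fixed = TRUE)[[1]]
label_ids <- sub("\\[auth [^]]+\\]$", "", chain_tokens)
auth_ids <- ifelse(
  grepl("[auth ", chain_tokens, fixed = TRUE),
  sub("^.*\\[auth ([^]]+)\\]$", "\\1", chain_tokens),
  label_ids  # no bracket means label and author IDs coincide
)

parsed_header <- list(
  entity_id = fields[1],
  description = fields[3],
  organism = fields[4],
  chains = data.frame(label_id = label_ids, auth_id = auth_ids)
)
parsed_header$chains
```

Keeping both identifier systems in one table helps when coordinate files and sequence records refer to the same chain under different names.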
chain_sequence_summary <- join_structure_sequence(
kinase_structure,
kinase_sequences
)
chain_sequence_summary
#> # A tibble: 1 × 5
#> sequence_header sequence chain sequence_length n_calpha
#> <chr> <chr> <chr> <int> <int>
#> 1 4V7N_1|Chains AA[auth AB], A[auth AA]… MGSAIQD… s 390 NA
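This helper joins sequence-level and coordinate-level summaries at the chain level. In practice, it provides a quick diagnostic for whether the downloaded structure and the FASTA content align as expected, and it creates a compact table that can be extended with additional chain annotations. In the output above, the chain value and the NA in n_calpha suggest that the chain label derived from the FASTA header was not matched to a chain in the parsed coordinates; a result like this is a cue to inspect the chain identifiers before relying on the join.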
The typed object wrappers are intentionally lightweight. They do not replace the underlying data; instead, they add a stable class layer for printing, conversion, and helper dispatch.
entry_demo <- as_rpdb_entry(
data.frame(
rcsb_id = c("4HHB", "1CRN"),
method = c("X-RAY DIFFRACTION", "SOLUTION NMR"),
resolution_combined = c("1.74", NA),
stringsAsFactors = FALSE
),
metadata = list(example = "local object demo")
)
entry_demo
#> <rPDBapi_entry> with data class: data.frame
dplyr::as_tibble(entry_demo)
#> # A tibble: 2 × 3
#> rcsb_id method resolution_combined
#> <chr> <chr> <chr>
#> 1 4HHB X-RAY DIFFRACTION 1.74
#> 2 1CRN SOLUTION NMR <NA>
summarize_entries(entry_demo)
#> # A tibble: 1 × 4
#> n_entries n_methods best_resolution median_molecular_weight
#> <int> <int> <dbl> <dbl>
#> 1 2 2 1.74 NA
entry_demo$metadata
#> $example
#> [1] "local object demo"
This pattern is useful when a workflow needs to preserve both data
and context. For example, metadata can record which query produced a
table or which processing choices were applied. The
as_tibble() methods then let the object drop back into a
standard tidyverse pipeline without extra conversion code.
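To make the idea of a lightweight class layer concrete, here is a base-R sketch of the same pattern. The constructor and class name (new_demo_entry, demo_entry) are hypothetical and are not part of rPDBapi; the sketch only illustrates how data, metadata, printing, and conversion can be combined with S3.

```r
# Hypothetical constructor: store data and metadata side by side, then tag with a class
new_demo_entry <- function(data, metadata = list()) {
  structure(
    list(data = data, metadata = metadata),
    class = c("demo_entry", "list")
  )
}

# Print method gives a compact one-line summary instead of dumping the list
print.demo_entry <- function(x, ...) {
  cat("<demo_entry> with", nrow(x$data), "rows and",
      length(x$metadata), "metadata field(s)\n")
  invisible(x)
}

# Conversion method drops the wrapper and returns the underlying table
as.data.frame.demo_entry <- function(x, ...) {
  x$data
}

demo <- new_demo_entry(
  data.frame(rcsb_id = c("4HHB", "1CRN"), stringsAsFactors = FALSE),
  metadata = list(example = "local object demo")
)
print(demo)
as.data.frame(demo)
```

Because the wrapper is only a list with a class attribute, the underlying data remain directly accessible, which is the property that keeps such wrappers cheap to adopt.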
structure_demo <- as_rpdb_structure(
list(
atom = data.frame(
chain = c("A", "A"),
resno = c(1L, 2L),
resid = c("GLY", "ALA"),
stringsAsFactors = FALSE
),
xyz = c(1, 2, 3, 4, 5, 6),
calpha = c(TRUE, FALSE)
),
metadata = list(source = "illustration")
)
structure_demo
#> <rPDBapi_structure> with data class: list
dplyr::as_tibble(structure_demo)
#> # A tibble: 6 × 3
#> atom xyz calpha
#> <chr> <chr> <chr>
#> 1 A;A;1;2;GLY;ALA 1 TRUE
#> 2 A;A;1;2;GLY;ALA 2 FALSE
#> 3 A;A;1;2;GLY;ALA 3 TRUE
#> 4 A;A;1;2;GLY;ALA 4 FALSE
#> 5 A;A;1;2;GLY;ALA 5 TRUE
#> 6 A;A;1;2;GLY;ALA 6 FALSE
The structure wrapper is especially useful when one analysis session contains multiple parsed structures and derived tables. The class layer makes those objects easier to distinguish and easier to handle consistently.
entry_object <- as_rpdb_entry(
kinase_metadata,
metadata = list(query = "protein kinase entry metadata")
)
summarize_entries(entry_object)
#> # A tibble: 1 × 4
#> n_entries n_methods best_resolution median_molecular_weight
#> <int> <int> <dbl> <dbl>
#> 1 5 1 1.55 114.
kinase_summary <- dplyr::as_tibble(entry_object) %>%
mutate(
molecular_weight = as.numeric(molecular_weight),
resolution_combined = as.numeric(resolution_combined),
initial_release_date = as.Date(initial_release_date)
) %>%
arrange(resolution_combined) %>%
select(
rcsb_id,
title,
pdbx_keywords,
method,
molecular_weight,
resolution_combined,
initial_release_date
)
kinase_summary
#> # A tibble: 5 × 7
#> rcsb_id title pdbx_keywords method molecular_weight resolution_combined
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 4O75 Crystal str… TRANSCRIPTIO… X-RAY… 15.8 1.55
#> 2 3JU5 Crystal Str… TRANSFERASE X-RAY… 171. 1.75
#> 3 2AKO Crystal str… TRANSFERASE X-RAY… 114. 2.2
#> 4 1J91 Crystal str… TRANSFERASE X-RAY… 79.4 2.22
#> 5 4V7N Glycocyamin… TRANSFERASE X-RAY… 1611. 2.3
#> # ℹ 1 more variable: initial_release_date <date>
kinase_summary %>%
summarise(
n_structures = n(),
median_molecular_weight = median(molecular_weight, na.rm = TRUE),
best_resolution = min(resolution_combined, na.rm = TRUE)
)
#> # A tibble: 1 × 3
#> n_structures median_molecular_weight best_resolution
#> <int> <dbl> <dbl>
#> 1 5 114. 1.55
Once the metadata are in a data frame, standard R analysis becomes
immediate. This chunk ranks the retrieved kinase structures by
resolution and computes a few simple summaries. Although the analysis is
straightforward, it illustrates the main advantage of using
rPDBapi: PDB metadata become ordinary tabular R data that
can be manipulated with the same tools used elsewhere in
bioinformatics.
kinase_polymer_metadata %>%
count(ncbi_scientific_name, sort = TRUE)
#> # A tibble: 5 × 2
#> ncbi_scientific_name n
#> <chr> <int>
#> 1 Archaeoglobus fulgidus 1
#> 2 Azotobacter vinelandii DJ 1
#> 3 Campylobacter jejuni 1
#> 4 Helicobacter pylori 26695 1
#> 5 Namalycastis sp. ST01 1
This example summarizes the source organisms represented in the polymer-entity results. A table like this is often the first step in identifying redundancy, sampling bias, or opportunities to compare orthologous structures across species.
r3d <- asNamespace("r3dmol")
visualization_entry <- "4HHB"
saved_structure <- quietly(get_pdb_file(
pdb_id = visualization_entry,
filetype = "pdb",
save = TRUE,
path = tempdir(),
verbosity = FALSE
))
r3d$r3dmol() %>%
r3d$m_add_model(data = saved_structure$path, format = "pdb") %>%
r3d$m_set_style(style = r3d$m_style_cartoon(color = "spectrum")) %>%
r3d$m_zoom_to()
This optional chunk demonstrates how rPDBapi fits into
broader R visualization workflows. The package itself focuses on data
access and parsing, while a tool such as r3dmol can be used
to render the retrieved structure in 3D. That separation of
responsibilities is useful because it keeps data access, analysis, and
visualization composable.
The previous sections used text-based and attribute-based search.
rPDBapi also supports sequence, motif, structure, and
chemical searches. These are especially important in structural
bioinformatics, where the biological question is often not “Which entry
mentions a keyword?” but “Which structures resemble this sequence,
motif, fold, or ligand chemistry?”
kinase_motif_sequence <- "VAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVV"
sequence_operator <- SequenceOperator(
sequence = kinase_motif_sequence,
sequence_type = "PROTEIN",
evalue_cutoff = 10,
identity_cutoff = 0.7
)
sequence_operator
#> $evalue_cutoff
#> [1] 10
#>
#> $identity_cutoff
#> [1] 0.7
#>
#> $target
#> [1] "pdb_protein_sequence"
#>
#> $value
#> [1] "VAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVV"
#>
#> attr(,"class")
#> [1] "SequenceOperator" "list"
autoresolve_sequence_type("ATGCGTACGTAGC")
#> [1] "DNA"
autoresolve_sequence_type("AUGCGUACGUAGC")
#> [1] "RNA"
SequenceOperator() constructs a sequence-similarity
search request. This is useful when you have a sequence of interest,
such as a kinase domain fragment, and want to find structurally
characterized homologs in the PDB. The companion function
autoresolve_sequence_type() infers whether a sequence is
DNA, RNA, or protein, which is helpful for interactive workflows and
programmatic pipelines that ingest sequences from external sources.
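One simple way to infer a sequence type is to inspect the alphabet it uses. The sketch below is an assumption about how such a heuristic could work, written with a hypothetical helper guess_sequence_type(); it is not the implementation of autoresolve_sequence_type().

```r
# Hypothetical helper: classify a sequence by the letters it contains
guess_sequence_type <- function(sequence) {
  letters_seen <- unique(strsplit(toupper(sequence), "")[[1]])
  if (all(letters_seen %in% c("A", "C", "G", "T"))) {
    "DNA"
  } else if (all(letters_seen %in% c("A", "C", "G", "U"))) {
    "RNA"
  } else {
    "PROTEIN"
  }
}

guess_sequence_type("ATGCGTACGTAGC")
guess_sequence_type("AUGCGUACGUAGC")
guess_sequence_type("VAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVV")
```

A purely alphabet-based rule is ambiguous for short inputs: a sequence containing only A, C, and G is valid as DNA, RNA, or protein, and this sketch resolves it to DNA simply because that branch is checked first. Passing the sequence type explicitly avoids that ambiguity.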
sequence_hits <- perform_search(
search_operator = sequence_operator,
return_type = "POLYMER_ENTITY",
request_options = RequestOptions(result_start_index = 0, num_results = 5),
verbosity = FALSE
)
sequence_hits
#> [1] "1FMK_1" "1KSW_1" "1Y57_1" "1YI6_1" "1YOJ_1"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
This query asks the RCSB search service for polymer entities similar to the input sequence. The polymer-entity return type is appropriate here because sequence similarity is defined at the entity level rather than at the whole-entry level.
prosite_like_motif <- SeqMotifOperator(
pattern = "[LIV][ACDEFGHIKLMNPQRSTVWY]K[GST]",
sequence_type = "PROTEIN",
pattern_type = "REGEX"
)
prosite_like_motif
#> $value
#> [1] "[LIV][ACDEFGHIKLMNPQRSTVWY]K[GST]"
#>
#> $pattern_type
#> [1] "regex"
#>
#> $target
#> [1] "pdb_protein_sequence"
#>
#> attr(,"class")
#> [1] "SeqMotifOperator" "list"
SeqMotifOperator() targets local sequence patterns
rather than whole-sequence similarity. This is useful for catalytic
signatures, binding motifs, or short conserved regions that occur across
otherwise diverse proteins.
motif_hits <- perform_search(
search_operator = prosite_like_motif,
return_type = "POLYMER_ENTITY",
request_options = RequestOptions(result_start_index = 0, num_results = 5),
verbosity = FALSE
)
motif_hits
#> [1] "103L_1" "109M_1" "10DC_1" "10FM_1" "10FT_1"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
Motif search is especially helpful when a biological question depends on a functional local pattern rather than on full-length homology. In kinase-related work, motifs can help you locate catalytic or regulatory sequence signatures across structural entries.
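Before sending a motif to the search service, it can help to check locally what the pattern matches. The sketch below applies the same regular expression to a made-up peptide with base R; the peptide is purely illustrative and does not come from any PDB entry.

```r
# Same pattern as in the SeqMotifOperator() call above
motif_pattern <- "[LIV][ACDEFGHIKLMNPQRSTVWY]K[GST]"

# Made-up peptide for illustration; not taken from any PDB entry
toy_sequence <- "MGSLAKGAVRKT"

# gregexpr() reports the start of every non-overlapping match
match_info <- gregexpr(motif_pattern, toy_sequence)
match_starts <- as.integer(match_info[[1]])
match_strings <- regmatches(toy_sequence, match_info)[[1]]

data.frame(start = match_starts, match = match_strings)
```

A four-residue pattern in which one position accepts any standard amino acid is very permissive, which is consistent with the motif search above returning hits that are unrelated to kinases. Longer or more restrictive patterns give more specific results.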
structure_operator <- StructureOperator(
pdb_entry_id = "4HHB",
assembly_id = 1,
search_mode = "RELAXED_SHAPE_MATCH"
)
structure_operator
#> $value
#> $value$entry_id
#> [1] "4HHB"
#>
#> $value$assembly_id
#> [1] "1"
#>
#>
#> $operator
#> [1] "relaxed_shape_match"
#>
#> attr(,"class")
#> [1] "StructureOperator" "list"
infer_search_service(structure_operator)
#> [1] "structure"
StructureOperator() creates a shape-based structure
search using an existing PDB structure as the template. This is useful
when you want to identify structures with related global geometry, even
if sequence identity is limited. infer_search_service()
confirms that this operator is routed to the structure search backend
rather than to text or full-text search.
structure_hits <- perform_search(
search_operator = QueryNode(structure_operator),
return_type = "ASSEMBLY",
request_options = RequestOptions(result_start_index = 0, num_results = 5),
verbosity = FALSE
)
structure_hits
#> [1] "4HHB-1" "1COH-1" "2HHB-1" "1QSH-1" "1G9V-1"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
Returning assemblies is often sensible for shape-based search because biological function frequently depends on the quaternary arrangement of chains rather than only on the deposited asymmetric unit.
atp_like_operator <- ChemicalOperator(
descriptor = "O=P(O)(O)OP(=O)(O)OP(=O)(O)O",
matching_criterion = "fingerprint-similarity"
)
atp_like_operator
#> $value
#> [1] "O=P(O)(O)OP(=O)(O)OP(=O)(O)O"
#>
#> $type
#> [1] "descriptor"
#>
#> $descriptor_type
#> [1] "SMILES"
#>
#> $match_type
#> [1] "fingerprint-similarity"
#>
#> attr(,"class")
#> [1] "ChemicalOperator" "list"
infer_search_service(atp_like_operator)
#> [1] "chemical"
ChemicalOperator() supports ligand-centric workflows by
searching the archive with a SMILES or InChI descriptor. This is useful
when you want to find structures containing chemically similar ligands,
cofactors, or fragments.
chemical_hits <- perform_search(
search_operator = QueryNode(atp_like_operator),
return_type = "CHEMICAL_COMPONENT",
request_options = RequestOptions(result_start_index = 0, num_results = 5),
verbosity = FALSE
)
chemical_hits
#> [1] "3PO" "6YW" "6YX" "6YY" "7TT"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
Here the return type is CHEMICAL_COMPONENT, which maps
to molecular definitions in the RCSB search API. This lets the search
focus on ligand identities rather than on whole macromolecular
entries.
The operator-based search grammar is one of the most important features of the package. The examples below summarize the text-oriented operators that can be combined in grouped queries.
exact_resolution <- ExactMatchOperator(
attribute = "exptl.method",
value = "X-RAY DIFFRACTION"
)
organism_inclusion <- InOperator(
attribute = "rcsb_entity_source_organism.taxonomy_lineage.name",
value = c("Homo sapiens", "Mus musculus")
)
title_words <- ContainsWordsOperator(
attribute = "struct.title",
value = "protein kinase"
)
title_phrase <- ContainsPhraseOperator(
attribute = "struct.title",
value = "protein kinase"
)
resolution_cutoff <- ComparisonOperator(
attribute = "rcsb_entry_info.resolution_combined",
value = 2.0,
comparison_type = "LESS"
)
resolution_window <- RangeOperator(
attribute = "rcsb_entry_info.resolution_combined",
from_value = 1.0,
to_value = 2.5
)
doi_exists <- ExistsOperator("rcsb_primary_citation.pdbx_database_id_doi")
list(
exact_resolution = exact_resolution,
organism_inclusion = organism_inclusion,
title_words = title_words,
title_phrase = title_phrase,
resolution_cutoff = resolution_cutoff,
resolution_window = resolution_window,
doi_exists = doi_exists
)
#> $exact_resolution
#> $attribute
#> [1] "exptl.method"
#>
#> $value
#> [1] "X-RAY DIFFRACTION"
#>
#> $operator
#> [1] "exact_match"
#>
#> attr(,"class")
#> [1] "ExactMatchOperator" "list"
#>
#> $organism_inclusion
#> $attribute
#> [1] "rcsb_entity_source_organism.taxonomy_lineage.name"
#>
#> $operator
#> [1] "in"
#>
#> $value
#> [1] "Homo sapiens" "Mus musculus"
#>
#> attr(,"class")
#> [1] "InOperator" "list"
#>
#> $title_words
#> $attribute
#> [1] "struct.title"
#>
#> $operator
#> [1] "contains_words"
#>
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "ContainsWordsOperator" "list"
#>
#> $title_phrase
#> $attribute
#> [1] "struct.title"
#>
#> $operator
#> [1] "contains_phrase"
#>
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "ContainsPhraseOperator" "list"
#>
#> $resolution_cutoff
#> $operator
#> [1] "less"
#>
#> $attribute
#> [1] "rcsb_entry_info.resolution_combined"
#>
#> $value
#> [1] 2
#>
#> attr(,"class")
#> [1] "ComparisonOperator" "list"
#>
#> $resolution_window
#> $operator
#> [1] "range"
#>
#> $attribute
#> [1] "rcsb_entry_info.resolution_combined"
#>
#> $negation
#> [1] FALSE
#>
#> $value
#> $value$from
#> [1] 1
#>
#> $value$to
#> [1] 2.5
#>
#>
#> attr(,"class")
#> [1] "RangeOperator" "list"
#>
#> $doi_exists
#> $attribute
#> [1] "rcsb_primary_citation.pdbx_database_id_doi"
#>
#> $operator
#> [1] "exists"
#>
#> attr(,"class")
#> [1] "ExistsOperator" "list"
These operator constructors do not perform a search by themselves. Instead, they make query intent explicit and composable. That matters when analyses need to be read, reviewed, and reproduced months later.
operator_node <- QueryNode(title_words)
composite_query <- QueryGroup(
queries = list(title_words, resolution_window, doi_exists),
logical_operator = "AND"
)
scored_example <- ScoredResult(entity_id = "4HHB", score = 0.98)
operator_node
#> $type
#> [1] "terminal"
#>
#> $service
#> [1] "text"
#>
#> $parameters
#> $attribute
#> [1] "struct.title"
#>
#> $operator
#> [1] "contains_words"
#>
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "ContainsWordsOperator" "list"
composite_query
#> $type
#> [1] "group"
#>
#> $logical_operator
#> [1] "and"
#>
#> $nodes
#> $nodes[[1]]
#> $nodes[[1]]$type
#> [1] "terminal"
#>
#> $nodes[[1]]$service
#> [1] "text"
#>
#> $nodes[[1]]$parameters
#> $attribute
#> [1] "struct.title"
#>
#> $operator
#> [1] "contains_words"
#>
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "ContainsWordsOperator" "list"
#>
#>
#> $nodes[[2]]
#> $nodes[[2]]$type
#> [1] "terminal"
#>
#> $nodes[[2]]$service
#> [1] "text"
#>
#> $nodes[[2]]$parameters
#> $operator
#> [1] "range"
#>
#> $attribute
#> [1] "rcsb_entry_info.resolution_combined"
#>
#> $negation
#> [1] FALSE
#>
#> $value
#> $value$from
#> [1] 1
#>
#> $value$to
#> [1] 2.5
#>
#>
#> attr(,"class")
#> [1] "RangeOperator" "list"
#>
#>
#> $nodes[[3]]
#> $nodes[[3]]$type
#> [1] "terminal"
#>
#> $nodes[[3]]$service
#> [1] "text"
#>
#> $nodes[[3]]$parameters
#> $attribute
#> [1] "rcsb_primary_citation.pdbx_database_id_doi"
#>
#> $operator
#> [1] "exists"
#>
#> attr(,"class")
#> [1] "ExistsOperator" "list"
scored_example
#> $entity_id
#> [1] "4HHB"
#>
#> $score
#> [1] 0.98
QueryNode() and QueryGroup() are the glue
that turn independent operator objects into a full search graph.
ScoredResult() is a small utility that represents the shape
of a scored hit and is useful for result handling or for teaching the
output model used by structure-search APIs.
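The printed objects above are ordinary nested lists, so a query graph can be inspected with plain recursion. The sketch below mimics the printed node shapes with hypothetical constructors (toy_terminal, toy_group) and counts the terminal nodes in a nested graph. These helpers are illustrative only and are not the package's QueryNode() or QueryGroup() code.

```r
# Hypothetical constructors that mimic the printed node shapes
toy_terminal <- function(attribute, operator, value = NULL) {
  list(
    type = "terminal",
    service = "text",
    parameters = list(attribute = attribute, operator = operator, value = value)
  )
}

toy_group <- function(nodes, logical_operator = "and") {
  list(type = "group", logical_operator = logical_operator, nodes = nodes)
}

# Recursively count terminal nodes in a (possibly nested) query graph
count_terminals <- function(node) {
  if (identical(node$type, "terminal")) {
    return(1L)
  }
  sum(vapply(node$nodes, count_terminals, integer(1)))
}

# A title filter AND (a resolution filter OR a DOI-exists filter)
toy_graph <- toy_group(
  list(
    toy_terminal("struct.title", "contains_words", "protein kinase"),
    toy_group(
      list(
        toy_terminal("rcsb_entry_info.resolution_combined", "less", 2.0),
        toy_terminal("rcsb_primary_citation.pdbx_database_id_doi", "exists")
      ),
      logical_operator = "or"
    )
  )
)
count_terminals(toy_graph)
```

The same traversal pattern can be used to list every attribute a query touches, which is a convenient way to review a complex query before it is sent.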
scored_structure_hits <- perform_search(
search_operator = QueryNode(structure_operator),
return_type = "ASSEMBLY",
request_options = RequestOptions(result_start_index = 0, num_results = 3),
return_with_scores = TRUE,
verbosity = FALSE
)
scored_structure_hits
#> identifier score
#> 1 4HHB-1 1.0000000
#> 2 1COH-1 0.9895895
#> 3 2HHB-1 0.9868059
class(scored_structure_hits)
#> [1] "rPDBapi_search_scores" "data.frame"
This example shows the difference between returning plain identifiers
and returning scored hits. In similarity-oriented workflows, the score
itself can be analytically useful because it helps rank follow-up
candidates before any metadata are fetched. A common pattern is to
inspect scored results first, choose a cutoff, and only then retrieve
metadata for the retained identifiers. The class output above confirms
that scored results are returned as an rPDBapi_search_scores data frame.
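The cutoff step can be sketched in base R using the scored hits printed above. The threshold below is an arbitrary illustrative value, not a recommended one, and the variable names are hypothetical.

```r
# Scored hits copied from the output above
scored_hits <- data.frame(
  identifier = c("4HHB-1", "1COH-1", "2HHB-1"),
  score = c(1.0000000, 0.9895895, 0.9868059),
  stringsAsFactors = FALSE
)

query_assembly <- "4HHB-1"  # the search template, which is the top hit here
score_cutoff <- 0.988       # illustrative threshold only

# Keep hits at or above the cutoff, excluding the query assembly itself
retained <- subset(
  scored_hits,
  score >= score_cutoff & identifier != query_assembly
)

# Entry IDs for a follow-up metadata request: strip the "-<assembly>" suffix
retained_entries <- unique(sub("-[0-9]+$", "", retained$identifier))
retained_entries
```

Filtering on the score first keeps the subsequent metadata request small, which matters when a similarity search returns hundreds of candidates.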
# Pattern: build small reusable operators first
title_filter <- ContainsPhraseOperator("struct.title", "protein kinase")
resolution_filter <- ComparisonOperator(
"rcsb_entry_info.resolution_combined",
2.5,
"LESS_OR_EQUAL"
)
# Combine them only when the biological question is clear
query_graph <- QueryGroup(
queries = list(
title_filter,
resolution_filter
),
logical_operator = "AND"
)
This is the main reason to treat operator construction as a separate step. A query graph can be assembled gradually, reviewed independently of the network request, and reused across multiple searches or result types.
query_search() is intentionally simpler than
perform_search(), but it still supports several specialized
query types as well as low-level request overrides.
# PubMed-linked structures
query_search(search_term = 27499440, query_type = "PubmedIdQuery")
#> [1] "5IMT" "5IMW" "5IMY"
#> attr(,"class")
#> [1] "rPDBapi_query_ids" "character"
#> attr(,"return_type")
#> [1] "entry"
# Organism/taxonomy search
organism_search <- query_search(search_term = "9606", query_type = "TreeEntityQuery")
head(organism_search)
#> [1] "10AD" "10FT" "11GS" "13RZ" "13SK" "13SN"
# Experimental method search
experimental_search <- query_search(search_term = "X-RAY DIFFRACTION", query_type = "ExpTypeQuery")
head(experimental_search)
#> [1] "101D" "107L" "109D" "10KY" "110M" "131L"
# Author search
query_search(search_term = "Kuriyan, J.", query_type = "AdvancedAuthorQuery")
#> [1] "1A06" "1A5T" "1AD5" "1AQC" "1AXC" "1AYA" "1AYB" "1AYC" "1AYD" "1B6C"
#> [11] "1BF5" "1BGF" "1BK5" "1BK6" "1BKD" "1CKA" "1CKB" "1CZD" "1D5A" "1DBH"
#> [21] "1DKG" "1DSB" "1EE4" "1EE5" "1EFN" "1EM8" "1EQN" "1FJL" "1FJM" "1FPU"
#> [31] "1HKX" "1IA9" "1IAH" "1IAJ" "1IAS" "1IEP" "1JQJ" "1JQL" "1JR3" "1M52"
#> [41] "1MBC" "1NJF" "1NJG" "1NVU" "1NVV" "1NVW" "1NVX" "1OPJ" "1OPK" "1OPL"
#> [51] "1PLQ" "1PLR" "1Q9C" "1QCF" "1QQC" "1SHA" "1SHB" "1SPR" "1SPS" "1SXJ"
#> [61] "1TDE" "1TDF" "1TRB" "1U4H" "1U55" "1U56" "1X11" "1XD2" "1XD4" "1XDV"
#> [71] "1XXH" "1XXI" "2AVT" "2BDW" "2F4J" "2F86" "2FO0" "2G1T" "2G2F" "2G2H"
#> [81] "2G2I" "2GS2" "2GS6" "2GS7" "2HCK" "2HNH" "2HQA" "2II0" "2IJE" "2M20"
#> [91] "2MA2" "2OIQ" "2OZO" "2POL" "2RF9" "2RFD" "2RFE" "2TPR" "3BEP" "3D1E"
#> [101] "3D1F" "3D1G" "3D7T" "3D7U" "3DQW" "3DQX" "3EEE" "3ET6" "3F6X" "3FW1"
#> [111] "3G6G" "3G6H" "3GEQ" "3GLF" "3GLG" "3GLH" "3GLI" "3GT8" "3K4X" "3KEX"
#> [121] "3KK8" "3KK9" "3KL8" "3KSY" "3LAH" "3LAI" "3NVR" "3NVU" "3SOA" "3TF0"
#> [131] "3TF1" "3TF8" "3TF9" "3TFA" "3TFD" "3TFE" "3TFF" "3TFG" "3U5Z" "3U60"
#> [141] "3U61" "4JOM" "4K2R" "4L9M" "4L9U" "4U99" "4U9B" "4U9G" "4U9J" "4U9K"
#> [151] "4XEY" "4XI2" "4XUF" "4XZ0" "4XZ1" "4Y93" "4Y94" "4Y95" "5BNB" "5CNN"
#> [161] "5CNO" "5IG0" "5IG1" "5IG3" "5IG4" "5IG5" "5NLV" "5NLY" "5WDO" "5WDP"
#> [171] "5WDQ" "5WDR" "5WDS" "6AXF" "6AXG" "6BK5" "6M90" "6M91" "6M92" "6M93"
#> [181] "6M94" "6OF8" "6OF9" "6UAN" "6W66" "6W67" "6W68" "6W69" "6WCQ" "7REC"
#> [191] "7ROY" "7SA7" "7SYD" "7SYE" "7SZ0" "7SZ1" "7SZ5" "7SZ7" "8CZI" "8E4T"
#> [201] "8UH7" "8UK9" "8UNF" "8UNH" "8UY3" "9EOY"
#> attr(,"class")
#> [1] "rPDBapi_query_ids" "character"
#> attr(,"return_type")
#> [1] "entry"
# UniProt-linked entries
query_search(search_term = "P31749", query_type = "uniprot")
#> [1] "1H10" "1UNP" "1UNQ" "1UNR" "2UVM" "2UZR" "2UZS" "3CQU" "3CQW" "3MV5"
#> [11] "3MVH" "3O96" "3OCB" "3OW4" "3QKK" "3QKL" "3QKM" "4EJN" "4EKK" "4EKL"
#> [21] "4GV1" "5KCV" "6BUU" "6CCY" "6HHF" "6HHG" "6HHH" "6HHI" "6HHJ" "6NPZ"
#> [31] "6S9W" "6S9X" "7APJ" "7MYX" "7NH4" "7NH5" "8JOW" "8UVY" "8UW2" "8UW7"
#> [41] "8UW9" "8ZPU"
#> attr(,"class")
#> [1] "rPDBapi_query_ids" "character"
#> attr(,"return_type")
#> [1] "entry"
# PFAM-linked entries
pfam_search <- query_search(search_term = "PF00069", query_type = "pfam")
head(pfam_search)
#> [1] "10BL" "10JU" "10SL" "1A06" "1A9U" "1APM"
These convenience modes are useful when the search criterion maps directly to a common biological identifier or curation field. They are less flexible than a fully operator-based query, but faster to write for routine tasks.
custom_scan_params <- list(
request_options = list(
paginate = list(start = 0, rows = 5),
return_all_hits = FALSE
)
)
custom_scan_params
#> $request_options
#> $request_options$paginate
#> $request_options$paginate$start
#> [1] 0
#>
#> $request_options$paginate$rows
#> [1] 5
#>
#>
#> $request_options$return_all_hits
#> [1] FALSE
scan_params lets you override the request body that
query_search() sends to the search API. This is useful when
you want lightweight access to custom pagination or request options
without switching fully to perform_search().
limited_kinase_hits <- query_search(
search_term = "protein kinase",
scan_params = custom_scan_params
)
limited_kinase_hits
#> [1] "1QK1" "6GWV" "2BUF" "2J4L" "1CRK"
#> attr(,"class")
#> [1] "rPDBapi_query_ids" "character"
#> attr(,"return_type")
#> [1] "entry"
This example illustrates the practical use of
scan_params: constrain the result set while still using the
simpler query helper.
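The same request-option structure can be generated for successive pages. The sketch below builds one scan_params-style list per page with a hypothetical helper, make_page_params(); it only constructs the lists and assumes each one would be passed to a separate query_search() call.

```r
# Hypothetical helper: one scan_params-style list per page of results
make_page_params <- function(total_hits, rows_per_page) {
  starts <- seq(from = 0, to = total_hits - 1, by = rows_per_page)
  lapply(starts, function(start) {
    list(
      request_options = list(
        paginate = list(start = start, rows = rows_per_page),
        return_all_hits = FALSE
      )
    )
  })
}

# Twelve expected hits in pages of five need three requests
pages <- make_page_params(total_hits = 12, rows_per_page = 5)
page_starts <- vapply(
  pages,
  function(p) p$request_options$paginate$start,
  numeric(1)
)
page_starts
```

Generating the page specifications up front makes it easy to log which offsets were requested and to retry a single failed page.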
The data_fetcher() interface supports more than entry
and polymer-entity data. The main supported data types are:
ENTRY
ASSEMBLY
POLYMER_ENTITY
BRANCHED_ENTITY
NONPOLYMER_ENTITY
POLYMER_ENTITY_INSTANCE
BRANCHED_ENTITY_INSTANCE
NONPOLYMER_ENTITY_INSTANCE
CHEMICAL_COMPONENT
This breadth matters because structural records are hierarchical. Different questions belong to different levels: entry-level methods, assembly-level symmetry, entity-level taxonomy, instance-level chain annotations, and component-level ligand chemistry.
base_properties <- list(
rcsb_entry_info = c("resolution_combined"),
exptl = c("method")
)
extended_properties <- add_property(list(
rcsb_entry_info = c("molecular_weight", "resolution_combined"),
struct = c("title")
))
base_properties
#> $rcsb_entry_info
#> [1] "resolution_combined"
#>
#> $exptl
#> [1] "method"
extended_properties
#> $rcsb_entry_info
#> [1] "molecular_weight" "resolution_combined"
#>
#> $struct
#> [1] "title"
add_property() helps construct or merge property
specifications without duplicating subfields. This is especially useful
in interactive analyses, where you may start with a minimal query and
then progressively request additional annotations.
property_workflow <- add_property(list(
rcsb_id = list(),
struct = c("title"),
rcsb_entry_info = c("resolution_combined")
))
property_workflow <- add_property(list(
rcsb_entry_info = c("molecular_weight", "resolution_combined"),
exptl = c("method")
))
property_workflow
#> $rcsb_entry_info
#> [1] "molecular_weight" "resolution_combined"
#>
#> $exptl
#> [1] "method"
validate_properties(property_workflow, data_type = "ENTRY", strict = FALSE)
#> $unknown_fields
#> character(0)
#>
#> $unknown_subfields
#> list()
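This pattern is useful because GraphQL property lists tend to grow as an analysis becomes more specific. Building them incrementally makes it easier to keep a compact initial query, add only the fields that become necessary, and check that the evolving specification still matches the expected schema. Note that in the chunk above the second assignment to property_workflow replaces the result of the first call rather than extending it, which is why rcsb_id and struct are absent from the printed specification; to accumulate fields, the earlier specification has to be carried into the later step.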
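One way to accumulate fields across steps is to merge property lists explicitly. The base-R sketch below uses a hypothetical helper, merge_properties(), to combine two specifications while keeping each subfield only once. It is an illustration of the merging idea and not the implementation of add_property().

```r
# Hypothetical helper: merge two property lists, keeping each subfield once
merge_properties <- function(current, additions) {
  for (field in names(additions)) {
    current[[field]] <- unique(c(current[[field]], additions[[field]]))
  }
  current
}

step_one <- list(
  struct = c("title"),
  rcsb_entry_info = c("resolution_combined")
)
step_two <- list(
  rcsb_entry_info = c("molecular_weight", "resolution_combined"),
  exptl = c("method")
)

# Fields from both steps survive; resolution_combined is not duplicated
merged <- merge_properties(step_one, step_two)
merged
```

The merged specification can then be checked against the schema in the same way as any other property list before it is used in a request.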
ligand_properties <- list(
rcsb_id = list(),
chem_comp = c("id", "name", "formula", "formula_weight", "type"),
rcsb_chem_comp_info = c("initial_release_date")
)
ligand_properties
#> $rcsb_id
#> list()
#>
#> $chem_comp
#> [1] "id" "name" "formula" "formula_weight"
#> [5] "type"
#>
#> $rcsb_chem_comp_info
#> [1] "initial_release_date"
This property specification targets chemical components rather than whole structures. That distinction is important when the biological focus is on ligands, cofactors, inhibitors, or bound metabolites.
chemical_component_df <- data_fetcher(
id = head(chemical_hits, 3),
data_type = "CHEMICAL_COMPONENT",
properties = ligand_properties,
return_as_dataframe = TRUE
)
chemical_component_df
#> # A tibble: 3 × 7
#> rcsb_id id name formula formula_weight type initial_release_date
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3PO 3PO TRIPHOSPHATE H5 O10… 257.955 non-… 2000-09-13T00:00:00Z
#> 2 6YW 6YW [oxidanyl-[ox… H8 O19… 497.895 non-… 2017-10-11T00:00:00Z
#> 3 6YX 6YX [oxidanyl-[ox… H26 O7… 1937.533 non-… 2017-10-11T00:00:00Z
The resulting table can be used to compare ligand formulas, molecular weights, and release histories. This is often useful in medicinal chemistry and structure-based design workflows.
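In the output above, formula_weight and initial_release_date arrive as character columns, so a type-conversion step is usually needed before comparing values. The sketch below reproduces three columns of that table by hand and converts them with base R.

```r
# Character columns copied from the output above
components <- data.frame(
  rcsb_id = c("3PO", "6YW", "6YX"),
  formula_weight = c("257.955", "497.895", "1937.533"),
  initial_release_date = c(
    "2000-09-13T00:00:00Z", "2017-10-11T00:00:00Z", "2017-10-11T00:00:00Z"
  ),
  stringsAsFactors = FALSE
)

# Convert weights to numbers and keep only the date part of each timestamp
components$formula_weight <- as.numeric(components$formula_weight)
components$initial_release_date <- as.Date(
  substr(components$initial_release_date, 1, 10)
)

# Rank by formula weight, heaviest first
components[order(-components$formula_weight), ]
```

Without the numeric conversion, sorting would be lexicographic, and "1937.533" would sort before "257.955".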
ligand_object <- as_rpdb_chemical_component(
chemical_component_df,
metadata = list(query = "ATP-like chemical components")
)
extract_ligand_table(ligand_object)
#> # A tibble: 3 × 6
#> rcsb_id id name formula formula_weight type
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3PO 3PO TRIPHOSPHATE H5 O10… 257.955 non-…
#> 2 6YW 6YW [oxidanyl-[oxidanyl-[oxidanyl(phos… H8 O19… 497.895 non-…
#> 3 6YX 6YX [oxidanyl-[oxidanyl-[oxidanyl-[oxi… H26 O7… 1937.533 non-…
extract_ligand_table() keeps the most analysis-relevant
chemical-component columns in a compact form. That is useful when ligand
retrieval is only one part of a broader workflow and you want a small,
stable table for downstream joins or ranking.
atp_description <- quietly(describe_chemical("ATP"))
dplyr::tibble(
chem_id = "ATP",
name = purrr::pluck(atp_description, "chem_comp", "name", .default = NA_character_),
formula = purrr::pluck(atp_description, "chem_comp", "formula", .default = NA_character_),
formula_weight = purrr::pluck(atp_description, "chem_comp", "formula_weight", .default = NA),
smiles = purrr::pluck(atp_description, "rcsb_chem_comp_descriptor", "smiles", .default = NA_character_)
)
#> # A tibble: 1 × 5
#> chem_id name formula formula_weight smiles
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 ATP ADENOSINE-5'-TRIPHOSPHATE C10 H16 N5 O13 P3 507. c1nc(c2c(n…
describe_chemical() provides a direct route to detailed
ligand information for a single chemical component. It complements
data_fetcher() by supporting a focused, ligand-centric
lookup.
# Polymer chain instance
polymer_instance <- data_fetcher(
id = "4HHB.A",
data_type = "POLYMER_ENTITY_INSTANCE",
properties = list(rcsb_id = list()),
return_as_dataframe = TRUE,
verbosity = FALSE
)
# Non-polymer instance (heme in hemoglobin entry 4HHB)
nonpolymer_instance <- data_fetcher(
id = "4HHB.E",
data_type = "NONPOLYMER_ENTITY_INSTANCE",
properties = list(rcsb_id = list()),
return_as_dataframe = TRUE,
verbosity = FALSE
)
polymer_instance
#> # A tibble: 1 × 1
#> rcsb_id
#> <chr>
#> 1 4HHB.A
nonpolymer_instance
#> # A tibble: 1 × 1
#> rcsb_id
#> <chr>
#> 1 4HHB.E
Instance-level retrieval is relevant when chain-level or
site-specific annotations matter. The exact identifier format depends on
the corresponding RCSB data type and record level. The examples above
show valid polymer and non-polymer instance retrievals from the same
entry (4HHB).
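The identifier shapes used throughout this vignette follow a recognizable pattern: a bare four-character entry ID, a hyphen plus number for assemblies, an underscore plus number for entities, and a dot plus chain label for instances. The sketch below turns that observation into a hypothetical helper, guess_id_level(). It is a heuristic based only on the identifiers shown in this document, and it does not cover every identifier form the archive may use.

```r
# Hypothetical helper: guess the record level from the shape of an identifier
guess_id_level <- function(id) {
  entry <- "[0-9][A-Za-z0-9]{3}"
  if (grepl(paste0("^", entry, "$"), id)) {
    "entry"
  } else if (grepl(paste0("^", entry, "-[0-9]+$"), id)) {
    "assembly"
  } else if (grepl(paste0("^", entry, "_[0-9]+$"), id)) {
    "entity"
  } else if (grepl(paste0("^", entry, "\\.[A-Za-z0-9]+$"), id)) {
    "entity instance"
  } else {
    "chemical component (or unrecognized)"
  }
}

example_ids <- c("4HHB", "4HHB-1", "1FMK_1", "4HHB.A", "ATP")
id_levels <- vapply(example_ids, guess_id_level, character(1))
id_levels
```

The shape alone cannot distinguish polymer, branched, and non-polymer records, so the data_type argument still has to be chosen from knowledge of what the identifier refers to.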
The package exposes lower-level functions for users who need full control over HTTP requests, URLs, or response parsing.
entry_url <- get_pdb_api_url("core/entry/", "4HHB")
chem_url <- get_pdb_api_url("core/chemcomp/", "ATP")
entry_url
#> [1] "https://data.rcsb.org/rest/v1/core/entry/4HHB"
chem_url
#> [1] "https://data.rcsb.org/rest/v1/core/chemcomp/ATP"
get_pdb_api_url() constructs endpoint-specific URLs.
This is a small utility, but it makes low-level workflows clearer and
reduces hard-coded URL strings in custom scripts.
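The URLs printed above are a base address, an endpoint path, and an identifier joined together. The sketch below reproduces that construction with a hypothetical helper, build_data_api_url(), which also normalizes stray slashes. It is not the package's implementation of get_pdb_api_url(); the base address is taken from the printed output.

```r
# Base URL as it appears in the printed results above
base_url <- "https://data.rcsb.org/rest/v1/"

# Hypothetical helper: join base, endpoint, and identifier with single slashes
build_data_api_url <- function(endpoint, id, base = base_url) {
  endpoint <- gsub("^/+|/+$", "", endpoint)  # trim leading and trailing slashes
  paste0(sub("/+$", "", base), "/", endpoint, "/", id)
}

build_data_api_url("core/entry/", "4HHB")
build_data_api_url("/core/chemcomp", "ATP")
```

Normalizing slashes in one place avoids the doubled or missing separators that tend to appear when URLs are pasted together by hand.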
# Manual request lifecycle
url <- get_pdb_api_url("core/entry/", "4HHB")
response <- send_api_request(url, verbosity = FALSE)
handle_api_errors(response, url)
payload <- parse_response(response, format = "json")
This low-level lifecycle is useful when you are developing a new helper, debugging an endpoint transition, or comparing package behavior with the raw RCSB API. It also makes clear where URL construction, HTTP transport, status checking, and parsing are separated inside the package.
entry_response <- send_api_request(entry_url, verbosity = FALSE)
handle_api_errors(entry_response, entry_url)
entry_payload <- parse_response(entry_response, format = "json")
names(entry_payload)[1:5]
#> [1] "audit_author" "cell" "citation" "database2" "diffrn"
These functions expose the package’s low-level request stack.
send_api_request() handles the HTTP request,
handle_api_errors() checks the returned status, and
parse_response() converts the body into an R object. This
layer is useful when you need to debug endpoint behavior or prototype a
new helper around the RCSB REST API.
mini_graphql <- generate_json_query(
ids = kinase_entry_ids[1:2],
data_type = "ENTRY",
properties = list(rcsb_id = list(), struct = c("title"))
)
mini_graphql_response <- search_graphql(list(query = mini_graphql))
str(mini_graphql_response, max.level = 2)
#> List of 1
#> $ data:List of 1
#> ..$ entries:List of 2
search_graphql() is the low-level GraphQL entry point.
It is useful when you want to inspect the raw content returned by the
RCSB GraphQL service before it is normalized by
fetch_data() or flattened by
return_data_as_dataframe().
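To give a sense of what a GraphQL request for entry data can look like, the sketch below assembles a query string from an ID vector and a property list in base R. The helpers (render_fields, build_entry_query) are hypothetical, and the exact string produced by generate_json_query() is not shown in this vignette. The entries(entry_ids: ...) root field is an assumption based on the RCSB Data API schema and is consistent with the entries key in the response above.

```r
# Hypothetical helper: render a property list as a GraphQL selection set
render_fields <- function(properties) {
  parts <- vapply(names(properties), function(field) {
    subfields <- unlist(properties[[field]])
    if (length(subfields) == 0) {
      field  # scalar field without subfields, e.g. rcsb_id
    } else {
      paste0(field, " { ", paste(subfields, collapse = " "), " }")
    }
  }, character(1))
  paste(parts, collapse = " ")
}

# Hypothetical helper: assemble an entry-level query string
build_entry_query <- function(ids, properties) {
  id_list <- paste0('"', ids, '"', collapse = ", ")
  paste0(
    "{ entries(entry_ids: [", id_list, "]) { ",
    render_fields(properties),
    " } }"
  )
}

demo_query <- build_entry_query(
  ids = c("4HHB", "1CRN"),
  properties = list(rcsb_id = list(), struct = c("title"))
)
demo_query
```

Seeing the query as a string makes it easier to paste into a GraphQL explorer when a field name needs to be checked against the live schema.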
One of the notable features of the package is that core functions return typed objects. This improves programmatic safety because code can distinguish search identifiers, scored results, raw responses, and flattened data frames.
list(
query_search_class = class(query_search("kinase")),
perform_search_class = class(
perform_search(DefaultOperator("kinase"), verbosity = FALSE)
),
perform_search_scores_class = class(
perform_search(
DefaultOperator("kinase"),
return_with_scores = TRUE,
verbosity = FALSE
)
)
)
#> $query_search_class
#> [1] "rPDBapi_query_ids" "character"
#>
#> $perform_search_class
#> [1] "rPDBapi_search_ids" "character"
#>
#> $perform_search_scores_class
#> [1] "rPDBapi_search_scores" "data.frame"
The classes shown above make return semantics explicit. In a larger analysis pipeline, this reduces ambiguity and makes it easier to validate assumptions at each stage of data retrieval.
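A pipeline can use these classes to branch safely on the kind of result it received. The sketch below builds toy objects that carry the same class vectors as the output above and normalizes both to a plain identifier vector. The objects are constructed locally for illustration, and result_identifiers() is a hypothetical helper rather than a package function.

```r
# Toy objects carrying the same class vectors as the output above
id_result <- structure(
  c("4HHB", "1CRN"),
  class = c("rPDBapi_search_ids", "character")
)
score_result <- structure(
  data.frame(
    identifier = c("4HHB-1", "1COH-1"),
    score = c(1, 0.99),
    stringsAsFactors = FALSE
  ),
  class = c("rPDBapi_search_scores", "data.frame")
)

# Hypothetical helper: return plain identifiers whatever the result type
result_identifiers <- function(x) {
  if (inherits(x, "rPDBapi_search_scores")) {
    as.character(x$identifier)
  } else if (inherits(x, c("rPDBapi_search_ids", "rPDBapi_query_ids"))) {
    as.character(unclass(x))
  } else {
    stop("Unsupported result class: ", paste(class(x), collapse = ", "))
  }
}

result_identifiers(id_result)
result_identifiers(score_result)
```

Failing loudly on an unexpected class is the point of the final branch: it surfaces a changed return type immediately instead of letting it propagate through later steps.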
raw_entry_response <- data_fetcher(
id = kinase_entry_ids[1:2],
data_type = "ENTRY",
properties = list(rcsb_id = list()),
return_as_dataframe = FALSE
)
tidy_entry_response <- data_fetcher(
id = kinase_entry_ids[1:2],
data_type = "ENTRY",
properties = list(rcsb_id = list()),
return_as_dataframe = TRUE
)
class(raw_entry_response)
#> [1] "rPDBapi_fetch_response" "list"
class(tidy_entry_response)
#> [1] "rPDBapi_dataframe" "tbl_df" "tbl"
#> [4] "data.frame"
This example emphasizes the dual output model of
data_fetcher(): retain the nested payload when structure
matters, or request a data frame when analysis and integration matter
more.
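To make the difference between the two shapes concrete, the following base R sketch flattens a mock nested payload into a data frame. The payload layout (data, then entries, then one list per entry) follows the str() output shown earlier; the titles are placeholders, and flatten_entry is a hypothetical helper. This is a simplified illustration, not the logic used by return_data_as_dataframe(), and its column names differ from the package output.

```r
# Mock payload shaped like the GraphQL response above; titles are placeholders.
payload <- list(data = list(entries = list(
  list(rcsb_id = "4HHB", struct = list(title = "placeholder title A")),
  list(rcsb_id = "1CRN", struct = list(title = "placeholder title B"))
)))

# Hypothetical helper: turn one nested entry into a one-row data frame.
flatten_entry <- function(entry) {
  flat <- unlist(entry)  # nested names are joined: "rcsb_id", "struct.title"
  as.data.frame(as.list(flat), stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single table.
flat_df <- do.call(rbind, lapply(payload$data$entries, flatten_entry))
flat_df
```

The nested form keeps the field hierarchy intact, while the flat form is ready for filtering and joining; which one is preferable depends on what the next step of the analysis needs.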
list(
entry_object_class = class(as_rpdb_entry(kinase_metadata)),
assembly_object_class = class(as_rpdb_assembly(kinase_assemblies)),
polymer_object_class = class(as_rpdb_polymer_entity(kinase_polymer_metadata)),
structure_object_class = class(as_rpdb_structure(kinase_structure)),
batch_provenance_names = names(attr(kinase_batch, "provenance"))
)
#> $entry_object_class
#> [1] "rPDBapi_entry" "rPDBapi_object" "list"
#>
#> $assembly_object_class
#> [1] "rPDBapi_assembly" "rPDBapi_object" "list"
#>
#> $polymer_object_class
#> [1] "rPDBapi_polymer_entity" "rPDBapi_object" "list"
#>
#> $structure_object_class
#> [1] "rPDBapi_structure" "rPDBapi_object" "list"
#>
#> $batch_provenance_names
#> [1] "fetched_at" "mode" "data_type" "requested_ids"
#> [5] "batch_size" "num_batches" "retry_attempts" "retry_backoff"
#> [9] "cache_enabled" "cache_dir" "cache_hits" "cache_misses"
#> [13] "batches"
These richer object wrappers are deliberately lightweight. They preserve the underlying data while attaching a semantically meaningful class, which makes it easier to define helper methods and to branch on object type in larger structural analysis pipelines.
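The sketch below shows one way to branch on object type with ordinary S3 dispatch. It builds mock wrappers that carry the class vectors printed above, so it does not need the package. The generic record_level is hypothetical, and the internal layout of the mock objects (a list with a data element) is an assumption made only for this example.

```r
# Mock wrappers with the class vectors shown above; the `data` element is an
# assumed layout used only for this sketch.
entry_obj <- structure(
  list(data = data.frame(rcsb_id = "4HHB", stringsAsFactors = FALSE)),
  class = c("rPDBapi_entry", "rPDBapi_object", "list")
)
assembly_obj <- structure(
  list(data = data.frame(rcsb_id = "4HHB-1", stringsAsFactors = FALSE)),
  class = c("rPDBapi_assembly", "rPDBapi_object", "list")
)

# Hypothetical generic that reports the level an object describes.
record_level <- function(x, ...) UseMethod("record_level")
record_level.rPDBapi_entry <- function(x, ...) "entry"
record_level.rPDBapi_assembly <- function(x, ...) "assembly"
# Fallback for any other wrapper that shares the common parent class.
record_level.rPDBapi_object <- function(x, ...) "other rPDBapi object"

vapply(list(entry_obj, assembly_obj), record_level, character(1))
```

Because every wrapper also inherits from rPDBapi_object, a single fallback method can cover object types that have no dedicated method yet.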
local_entry_object <- as_rpdb_entry(
data.frame(
rcsb_id = "4HHB",
method = "X-RAY DIFFRACTION",
resolution_combined = "1.74",
stringsAsFactors = FALSE
),
metadata = list(source = "local method demo")
)
print(local_entry_object)
#> <rPDBapi_entry> with data class: data.frame
dplyr::as_tibble(local_entry_object)
#> # A tibble: 1 × 3
#> rcsb_id method resolution_combined
#> <chr> <chr> <chr>
#> 1 4HHB X-RAY DIFFRACTION 1.74
This example makes the object behavior explicit. The custom print
method gives a concise summary of the wrapped object, while the
as_tibble() method provides an immediate path back to a
standard tabular workflow. That combination is the main point of the
object model: preserve semantic type information without making
downstream manipulation cumbersome.
invalid_property_result <- tryCatch(
validate_properties(
properties = list(unknown_field = c("x")),
data_type = "ENTRY",
strict = TRUE
),
rPDBapi_error_invalid_input = function(e) e
)
invalid_fetch_result <- tryCatch(
data_fetcher(
id = character(0),
data_type = "ENTRY",
properties = list(rcsb_id = list())
),
rPDBapi_error_invalid_input = function(e) e
)
list(
invalid_property_class = class(invalid_property_result),
invalid_property_message = conditionMessage(invalid_property_result),
invalid_fetch_class = class(invalid_fetch_result),
invalid_fetch_message = conditionMessage(invalid_fetch_result)
)
#> $invalid_property_class
#> [1] "rPDBapi_error_invalid_input" "rPDBapi_error"
#> [3] "error" "condition"
#>
#> $invalid_property_message
#> [1] "Unknown properties for data_type 'ENTRY': unknown_field"
#>
#> $invalid_fetch_class
#> [1] "rPDBapi_error_invalid_input" "rPDBapi_error"
#> [3] "error" "condition"
#>
#> $invalid_fetch_message
#> [1] "Invalid input: 'id' must not be NULL or empty."
These examples show how typed errors support defensive programming.
Instead of matching raw error text, a calling workflow can branch on the
condition class and decide whether to stop, retry, skip a record, or log
the problem for later review. That is particularly valuable when
rPDBapi is used inside larger automated pipelines.
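A minimal sketch of such class-based branching is shown below. To stay runnable offline, it signals a mock condition that carries the same class vector as the errors printed above. The helpers make_invalid_input, mock_fetch, and safe_fetch are hypothetical names introduced for this illustration only.

```r
# Build a mock condition with the class vector printed above.
make_invalid_input <- function(msg) {
  structure(
    list(message = msg, call = NULL),
    class = c("rPDBapi_error_invalid_input", "rPDBapi_error",
              "error", "condition")
  )
}

# Stand-in for a real fetch call; it fails on an empty identifier.
mock_fetch <- function(id) {
  if (!nzchar(id)) stop(make_invalid_input("'id' must not be NULL or empty."))
  paste("record for", id)
}

# Hypothetical wrapper: skip invalid input, flag other package errors,
# and let unrelated errors propagate to the caller.
safe_fetch <- function(fetch_fun, id) {
  tryCatch(
    fetch_fun(id),
    rPDBapi_error_invalid_input = function(e) {
      message("Skipping record: ", conditionMessage(e))
      NULL
    },
    rPDBapi_error = function(e) {
      message("Package error: ", conditionMessage(e))
      NA
    }
  )
}

safe_fetch(mock_fetch, "4HHB")
safe_fetch(mock_fetch, "")
```

The more specific handler is listed first, so invalid input is skipped while any other condition inheriting from rPDBapi_error falls through to the general handler.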
The table below maps every exported function to its primary role in the package. This is not a replacement for the individual help pages, but it does make the full surface area of the package explicit inside a single tutorial document.
| Function | Role |
|---|---|
| query_search | High-level convenience search helper |
| perform_search | Operator-based search engine |
| DefaultOperator | Full-text search operator |
| ExactMatchOperator | Exact attribute match operator |
| InOperator | Set-membership operator |
| ContainsWordsOperator | Word containment operator |
| ContainsPhraseOperator | Phrase containment operator |
| ComparisonOperator | Numeric/date comparison operator |
| RangeOperator | Range filter operator |
| ExistsOperator | Attribute existence operator |
| SequenceOperator | Sequence similarity search operator |
| autoresolve_sequence_type | Automatic DNA/RNA/protein detection |
| SeqMotifOperator | Sequence motif search operator |
| StructureOperator | Structure similarity search operator |
| ChemicalOperator | Chemical descriptor search operator |
| QueryNode | Wrap one operator as a query node |
| QueryGroup | Combine nodes with AND/OR logic |
| RequestOptions | Pagination and sorting controls |
| ScoredResult | Represent a scored hit |
| infer_search_service | Infer backend service from operator |
| infer_id_type | Infer identifier level from an ID string |
| parse_rcsb_id | Parse an identifier into structured components |
| build_entry_id | Normalize or build entry identifiers |
| build_assembly_id | Build assembly identifiers |
| build_entity_id | Build entity identifiers |
| build_instance_id | Build instance or chain identifiers |
| add_property | Merge/extend GraphQL property lists |
| list_rcsb_fields | List known retrievable fields by data type |
| search_rcsb_fields | Search the built-in field registry |
| validate_properties | Validate a property list against the field registry |
| generate_json_query | Build a GraphQL query string |
| search_graphql | Low-level GraphQL request helper |
| fetch_data | Normalize validated GraphQL payloads |
| return_data_as_dataframe | Flatten nested payloads into data frames |
| data_fetcher | High-level metadata fetcher |
| data_fetcher_batch | Batch metadata fetcher with retry and provenance |
| cache_info | Inspect batch-cache contents |
| clear_rpdbapi_cache | Clear on-disk cache entries |
| get_info | Retrieve full entry metadata |
| find_results | Extract one field across search hits |
| find_papers | Extract primary citation titles |
| describe_chemical | Retrieve ligand/chemical-component details |
| get_fasta_from_rcsb_entry | Retrieve FASTA sequences |
| get_pdb_file | Download and parse structure files |
| get_pdb_api_url | Build REST endpoint URLs |
| send_api_request | Send low-level GET/POST requests |
| handle_api_errors | Check HTTP status and stop on error |
| parse_response | Parse JSON or text responses |
| as_rpdb_entry | Wrap entry data in a typed object |
| as_rpdb_assembly | Wrap assembly data in a typed object |
| as_rpdb_polymer_entity | Wrap polymer-entity data in a typed object |
| as_rpdb_chemical_component | Wrap chemical-component data in a typed object |
| as_rpdb_structure | Wrap structure data in a typed object |
| summarize_entries | Summarize entry-level metadata |
| summarize_assemblies | Summarize assembly-level metadata |
| extract_taxonomy_table | Extract taxonomy-focused columns |
| extract_ligand_table | Extract ligand-focused columns |
| extract_calpha_coordinates | Extract C-alpha coordinates |
| join_structure_sequence | Join sequence summaries to chain coordinates |
This table is intended as a package navigation aid. It makes it easier to identify whether a task belongs to searching, retrieval, parsing, or lower-level API control before you start writing a workflow.
The next block gives a compact usage sketch for every exported function. These examples are deliberately short and grouped by role so that users can quickly find a starting pattern.
# Search helpers
query_search("4HHB")
#> [1] "4HHB" "1J7W" "2W6V"
#> attr(,"class")
#> [1] "rPDBapi_query_ids" "character"
#> attr(,"return_type")
#> [1] "entry"
perform_search(DefaultOperator("4HHB"), verbosity = FALSE)
#> [1] "4HHB" "1J7W" "2W6V"
#> attr(,"class")
#> [1] "rPDBapi_search_ids" "character"
# Text and attribute operators
DefaultOperator("kinase")
#> $value
#> [1] "kinase"
#>
#> attr(,"class")
#> [1] "DefaultOperator" "list"
ExactMatchOperator("exptl.method", "X-RAY DIFFRACTION")
#> $attribute
#> [1] "exptl.method"
#>
#> $value
#> [1] "X-RAY DIFFRACTION"
#>
#> $operator
#> [1] "exact_match"
#>
#> attr(,"class")
#> [1] "ExactMatchOperator" "list"
InOperator("rcsb_entity_source_organism.taxonomy_lineage.name", c("Homo sapiens", "Mus musculus"))
#> $attribute
#> [1] "rcsb_entity_source_organism.taxonomy_lineage.name"
#>
#> $operator
#> [1] "in"
#>
#> $value
#> [1] "Homo sapiens" "Mus musculus"
#>
#> attr(,"class")
#> [1] "InOperator" "list"
ContainsWordsOperator("struct.title", "protein kinase")
#> $attribute
#> [1] "struct.title"
#>
#> $operator
#> [1] "contains_words"
#>
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "ContainsWordsOperator" "list"
ContainsPhraseOperator("struct.title", "protein kinase")
#> $attribute
#> [1] "struct.title"
#>
#> $operator
#> [1] "contains_phrase"
#>
#> $value
#> [1] "protein kinase"
#>
#> attr(,"class")
#> [1] "ContainsPhraseOperator" "list"
ComparisonOperator("rcsb_entry_info.resolution_combined", 2.0, "LESS")
#> $operator
#> [1] "less"
#>
#> $attribute
#> [1] "rcsb_entry_info.resolution_combined"
#>
#> $value
#> [1] 2
#>
#> attr(,"class")
#> [1] "ComparisonOperator" "list"
RangeOperator("rcsb_entry_info.resolution_combined", 1.0, 2.5)
#> $operator
#> [1] "range"
#>
#> $attribute
#> [1] "rcsb_entry_info.resolution_combined"
#>
#> $negation
#> [1] FALSE
#>
#> $value
#> $value$from
#> [1] 1
#>
#> $value$to
#> [1] 2.5
#>
#>
#> attr(,"class")
#> [1] "RangeOperator" "list"
ExistsOperator("rcsb_primary_citation.pdbx_database_id_doi")
#> $attribute
#> [1] "rcsb_primary_citation.pdbx_database_id_doi"
#>
#> $operator
#> [1] "exists"
#>
#> attr(,"class")
#> [1] "ExistsOperator" "list"
# Specialized operators
SequenceOperator("MVLSPADKTNVKAAW", sequence_type = "PROTEIN")
#> $evalue_cutoff
#> [1] 100
#>
#> $identity_cutoff
#> [1] 0.95
#>
#> $target
#> [1] "pdb_protein_sequence"
#>
#> $value
#> [1] "MVLSPADKTNVKAAW"
#>
#> attr(,"class")
#> [1] "SequenceOperator" "list"
autoresolve_sequence_type("ATGCGTACGTAGC")
#> [1] "DNA"
SeqMotifOperator("[LIV][ACDEFGHIKLMNPQRSTVWY]K[GST]", "PROTEIN", "REGEX")
#> $value
#> [1] "[LIV][ACDEFGHIKLMNPQRSTVWY]K[GST]"
#>
#> $pattern_type
#> [1] "regex"
#>
#> $target
#> [1] "pdb_protein_sequence"
#>
#> attr(,"class")
#> [1] "SeqMotifOperator" "list"
StructureOperator("4HHB", assembly_id = 1, search_mode = "RELAXED_SHAPE_MATCH")
#> $value
#> $value$entry_id
#> [1] "4HHB"
#>
#> $value$assembly_id
#> [1] "1"
#>
#>
#> $operator
#> [1] "relaxed_shape_match"
#>
#> attr(,"class")
#> [1] "StructureOperator" "list"
ChemicalOperator("C1=CC=CC=C1", matching_criterion = "graph-strict")
#> $value
#> [1] "C1=CC=CC=C1"
#>
#> $type
#> [1] "descriptor"
#>
#> $descriptor_type
#> [1] "SMILES"
#>
#> $match_type
#> [1] "graph-strict"
#>
#> attr(,"class")
#> [1] "ChemicalOperator" "list"
# Query composition
QueryNode(DefaultOperator("kinase"))
#> $type
#> [1] "terminal"
#>
#> $service
#> [1] "full_text"
#>
#> $parameters
#> $value
#> [1] "kinase"
#>
#> attr(,"class")
#> [1] "DefaultOperator" "list"
QueryGroup(list(DefaultOperator("kinase"), ExistsOperator("rcsb_primary_citation.title")), "AND")
#> $type
#> [1] "group"
#>
#> $logical_operator
#> [1] "and"
#>
#> $nodes
#> $nodes[[1]]
#> $nodes[[1]]$type
#> [1] "terminal"
#>
#> $nodes[[1]]$service
#> [1] "full_text"
#>
#> $nodes[[1]]$parameters
#> $value
#> [1] "kinase"
#>
#> attr(,"class")
#> [1] "DefaultOperator" "list"
#>
#>
#> $nodes[[2]]
#> $nodes[[2]]$type
#> [1] "terminal"
#>
#> $nodes[[2]]$service
#> [1] "text"
#>
#> $nodes[[2]]$parameters
#> $attribute
#> [1] "rcsb_primary_citation.title"
#>
#> $operator
#> [1] "exists"
#>
#> attr(,"class")
#> [1] "ExistsOperator" "list"
RequestOptions(result_start_index = 0, num_results = 10)
#> $paginate
#> $paginate$start
#> [1] 0
#>
#> $paginate$rows
#> [1] 10
#>
#>
#> $sort
#> $sort[[1]]
#> $sort[[1]]$sort_by
#> [1] "score"
#>
#> $sort[[1]]$direction
#> [1] "desc"
ScoredResult("4HHB", 0.98)
#> $entity_id
#> [1] "4HHB"
#>
#> $score
#> [1] 0.98
infer_search_service(StructureOperator("4HHB"))
#> [1] "structure"
infer_id_type(c("4HHB", "4HHB-1", "4HHB_1", "4HHB.A", "ATP"))
#> [1] "ENTRY" "ASSEMBLY" "ENTITY"
#> [4] "INSTANCE" "CHEMICAL_COMPONENT"
parse_rcsb_id("4HHB-1")
#> $raw_id
#> [1] "4HHB-1"
#>
#> $normalized_id
#> [1] "4HHB-1"
#>
#> $id_type
#> [1] "ASSEMBLY"
#>
#> $entry_id
#> [1] "4HHB"
#>
#> $assembly_id
#> [1] "1"
#>
#> $entity_id
#> NULL
#>
#> $instance_id
#> NULL
#>
#> $separator
#> [1] "-"
build_entry_id("4HHB")
#> [1] "4HHB"
build_assembly_id("4HHB", 1)
#> [1] "4HHB-1"
build_entity_id("4HHB", 1)
#> [1] "4HHB_1"
build_instance_id("4HHB", "A")
#> [1] "4HHB.A"
# Metadata helpers
add_property(list(rcsb_entry_info = c("resolution_combined")))
#> $rcsb_entry_info
#> [1] "resolution_combined"
list_rcsb_fields("ENTRY")
#> data_type field subfield
#> 1 ENTRY rcsb_id <NA>
#> 2 ENTRY struct title
#> 3 ENTRY struct_keywords pdbx_keywords
#> 4 ENTRY exptl method
#> 5 ENTRY cell length_a
#> 6 ENTRY cell length_b
#> 7 ENTRY cell length_c
#> 8 ENTRY cell volume
#> 9 ENTRY cell angle_beta
#> 10 ENTRY citation title
#> 11 ENTRY rcsb_primary_citation title
#> 12 ENTRY rcsb_entry_info molecular_weight
#> 13 ENTRY rcsb_entry_info resolution_combined
#> 14 ENTRY rcsb_entry_info polymer_entity_count_DNA
#> 15 ENTRY rcsb_accession_info initial_release_date
#> 16 ENTRY rcsb_accession_info deposit_date
search_rcsb_fields("resolution", data_type = "ENTRY")
#> data_type field subfield
#> 13 ENTRY rcsb_entry_info resolution_combined
validate_properties(
list(rcsb_id = list(), rcsb_entry_info = c("resolution_combined")),
data_type = "ENTRY",
strict = TRUE
)
generate_json_query(c("4HHB"), "ENTRY", list(rcsb_id = list(), struct = c("title")))
#> [1] "{entries(entry_ids: [\"4HHB\"]){rcsb_id , struct {title}}}"
search_graphql(list(query = generate_json_query(c("4HHB"), "ENTRY", list(rcsb_id = list()))))
#> $data
#> $data$entries
#> $data$entries[[1]]
#> $data$entries[[1]]$rcsb_id
#> [1] "4HHB"
fetch_data(generate_json_query(c("4HHB"), "ENTRY", list(rcsb_id = list())), "ENTRY", "4HHB")
#> $data
#> $data$entries
#> $data$entries$`4HHB`
#> $data$entries$`4HHB`$rcsb_id
#> [1] "4HHB"
#>
#>
#>
#>
#> attr(,"class")
#> [1] "rPDBapi_fetch_response" "list"
#> attr(,"ids")
#> [1] "4HHB"
#> attr(,"data_type")
#> [1] "ENTRY"
return_data_as_dataframe(
fetch_data(generate_json_query(c("4HHB"), "ENTRY", list(rcsb_id = list())), "ENTRY", "4HHB"),
"ENTRY",
"4HHB"
)
#> # A tibble: 1 × 1
#> rcsb_id
#> <chr>
#> 1 4HHB
data_fetcher("4HHB", "ENTRY", list(rcsb_id = list(), struct = c("title")))
#> # A tibble: 1 × 2
#> rcsb_id title
#> <chr> <chr>
#> 1 4HHB THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RES…
data_fetcher_batch(
c("4HHB", "1CRN"),
"ENTRY",
list(rcsb_id = list(), struct = c("title")),
batch_size = 1,
cache = FALSE
)
#> # A tibble: 2 × 2
#> rcsb_id title
#> <chr> <chr>
#> 1 4HHB THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RES…
#> 2 1CRN WATER STRUCTURE OF A HYDROPHOBIC PROTEIN AT ATOMIC RESOLUTION. PENTAG…
cache_info()
#> $cache_dir
#> [1] "/var/folders/dj/y28dp44x303ggfg6rg8n2v0h0000gn/T//RtmpwNIFML/rPDBapi-cache"
#>
#> $total_entries
#> [1] 0
#>
#> $total_size_bytes
#> [1] 0
#>
#> $entries
#> [1] file size_bytes modified
#> <0 rows> (or 0-length row.names)
clear_rpdbapi_cache()
quietly(get_info("4HHB"))
#> $audit_author
#> name pdbx_ordinal
#> 1 Fermi, G. 1
#> 2 Perutz, M.F. 2
#>
#> $cell
#> $cell$angle_alpha
#> [1] 90
#>
#> $cell$angle_beta
#> [1] 99.34
#>
#> $cell$angle_gamma
#> [1] 90
#>
#> $cell$length_a
#> [1] 63.15
#>
#> $cell$length_b
#> [1] 83.59
#>
#> $cell$length_c
#> [1] 53.8
#>
#> $cell$zpdb
#> [1] 4
#>
#>
#> $citation
#> country id
#> 1 UK primary
#> 2 UK 1
#> 3 US 3
#> 4 UK 4
#> 5 UK 5
#> 6 UK 6
#> 7 <NA> 2
#> 8 <NA> 7
#> 9 <NA> 8
#> journal_abbrev
#> 1 J.Mol.Biol.
#> 2 Nature
#> 3 Annu.Rev.Biochem.
#> 4 J.Mol.Biol.
#> 5 J.Mol.Biol.
#> 6 Nature
#> 7 Haemoglobin and Myoglobin. Atlas of Molecular Structures in Biology
#> 8 Atlas of Protein Sequence and Structure (Data Section)
#> 9 Atlas of Protein Sequence and Structure (Data Section)
#> journal_id_astm journal_id_csd journal_id_issn journal_volume page_first
#> 1 JMOBAK 0070 0022-2836 175 159
#> 2 NATUAS 0006 0028-0836 295 535
#> 3 ARBOAW 0413 0066-4154 48 327
#> 4 JMOBAK 0070 0022-2836 100 3
#> 5 JMOBAK 0070 0022-2836 97 237
#> 6 NATUAS 0006 0028-0836 228 516
#> 7 <NA> 0986 0-19-854706-4 2 <NA>
#> 8 <NA> 0435 0-912466-02-2 5 56
#> 9 <NA> 0435 0-912466-02-2 5 64
#> page_last pdbx_database_id_doi pdbx_database_id_pub_med
#> 1 174 10.1016/0022-2836(84)90472-8 6726807
#> 2 <NA> <NA> NA
#> 3 <NA> <NA> NA
#> 4 <NA> <NA> NA
#> 5 <NA> <NA> NA
#> 6 <NA> <NA> NA
#> 7 <NA> <NA> NA
#> 8 <NA> <NA> NA
#> 9 <NA> <NA> NA
#> rcsb_authors
#> 1 Fermi, G., Perutz, M.F., Shaanan, B., Fourme, R.
#> 2 Perutz, M.F., Hasnain, S.S., Duke, P.J., Sessler, J.L., Hahn, J.E.
#> 3 Perutz, M.F.
#> 4 Teneyck, L.F., Arnone, A.
#> 5 Fermi, G.
#> 6 Muirhead, H., Greer, J.
#> 7 Fermi, G., Perutz, M.F.
#> 8 NULL
#> 9 NULL
#> rcsb_is_primary
#> 1 Y
#> 2 N
#> 3 N
#> 4 N
#> 5 N
#> 6 N
#> 7 N
#> 8 N
#> 9 N
#> rcsb_journal_abbrev
#> 1 J Mol Biology
#> 2 Nature
#> 3 Annu Rev Biochem
#> 4 J Mol Biology
#> 5 J Mol Biology
#> 6 Nature
#> 7 Haemoglobin And Myoglobin Atlas Of Molecular Structures In Biology
#> 8 Atlas Of Protein Sequence And Structure (data Section)
#> 9 Atlas Of Protein Sequence And Structure (data Section)
#> title
#> 1 The crystal structure of human deoxyhaemoglobin at 1.74 A resolution
#> 2 Stereochemistry of Iron in Deoxyhaemoglobin
#> 3 Regulation of Oxygen Affinity of Hemoglobin. Influence of Structure of the Globin on the Heme Iron
#> 4 Three-Dimensional Fourier Synthesis of Human Deoxyhemoglobin at 2.5 Angstroms Resolution, I.X-Ray Analysis
#> 5 Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 2.5 Angstroms Resolution, Refinement of the Atomic Model
#> 6 Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 3.5 Angstroms Resolution
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> year book_publisher
#> 1 1984 <NA>
#> 2 1982 <NA>
#> 3 1979 <NA>
#> 4 1976 <NA>
#> 5 1975 <NA>
#> 6 1970 <NA>
#> 7 1981 Oxford University Press
#> 8 1972 National Biomedical Research Foundation, Silver Spring,Md.
#> 9 1972 National Biomedical Research Foundation, Silver Spring,Md.
#>
#> $database2
#> database_code database_id pdbx_doi pdbx_database_accession
#> 1 4HHB PDB 10.2210/pdb4hhb/pdb pdb_00004hhb
#> 2 D_1000179340 WWPDB <NA> <NA>
#>
#> $diffrn
#> crystal_id id
#> 1 1 1
#>
#> $entry
#> $entry$id
#> [1] "4HHB"
#>
#>
#> $exptl
#> method
#> 1 X-RAY DIFFRACTION
#>
#> $exptl_crystal
#> density_matthews density_percent_sol id
#> 1 2.26 45.48 1
#>
#> $pdbx_audit_revision_category
#> category data_content_type ordinal revision_ordinal
#> 1 atom_site Structure model 1 4
#> 2 database_PDB_caveat Structure model 2 4
#> 3 entity Structure model 3 4
#> 4 entity_name_com Structure model 4 4
#> 5 entity_src_gen Structure model 5 4
#> 6 pdbx_database_status Structure model 6 4
#> 7 pdbx_validate_rmsd_angle Structure model 7 4
#> 8 pdbx_validate_rmsd_bond Structure model 8 4
#> 9 struct_ref Structure model 9 4
#> 10 struct_ref_seq Structure model 10 4
#> 11 atom_site Structure model 11 5
#> 12 pdbx_validate_rmsd_angle Structure model 12 5
#> 13 pdbx_validate_rmsd_bond Structure model 13 5
#> 14 struct_site Structure model 14 5
#> 15 atom_site Structure model 15 6
#> 16 atom_sites Structure model 16 6
#> 17 database_2 Structure model 17 6
#> 18 database_PDB_matrix Structure model 18 6
#> 19 pdbx_struct_conn_angle Structure model 19 6
#> 20 pdbx_validate_close_contact Structure model 20 6
#> 21 pdbx_validate_main_chain_plane Structure model 21 6
#> 22 pdbx_validate_peptide_omega Structure model 22 6
#> 23 pdbx_validate_planes Structure model 23 6
#> 24 pdbx_validate_polymer_linkage Structure model 24 6
#> 25 pdbx_validate_rmsd_angle Structure model 25 6
#> 26 pdbx_validate_rmsd_bond Structure model 26 6
#> 27 pdbx_validate_torsion Structure model 27 6
#> 28 struct_ncs_oper Structure model 28 6
#> 29 pdbx_database_remark Structure model 29 7
#> 30 chem_comp_atom Structure model 30 8
#> 31 chem_comp_bond Structure model 31 8
#>
#> $pdbx_audit_revision_details
#> data_content_type ordinal provider revision_ordinal type
#> 1 Structure model 1 repository 1 Initial release
#> 2 Structure model 2 repository 6 Remediation
#> details
#> 1 <NA>
#> 2 Coordinates and associated ncs operations (if present) transformed into standard crystal frame
#>
#> $pdbx_audit_revision_group
#> data_content_type group ordinal revision_ordinal
#> 1 Structure model Version format compliance 1 2
#> 2 Structure model Advisory 2 3
#> 3 Structure model Version format compliance 3 3
#> 4 Structure model Advisory 4 4
#> 5 Structure model Atomic model 5 4
#> 6 Structure model Data collection 6 4
#> 7 Structure model Database references 7 4
#> 8 Structure model Other 8 4
#> 9 Structure model Source and taxonomy 9 4
#> 10 Structure model Structure summary 10 4
#> 11 Structure model Atomic model 11 5
#> 12 Structure model Data collection 12 5
#> 13 Structure model Derived calculations 13 5
#> 14 Structure model Advisory 14 6
#> 15 Structure model Atomic model 15 6
#> 16 Structure model Data collection 16 6
#> 17 Structure model Database references 17 6
#> 18 Structure model Derived calculations 18 6
#> 19 Structure model Other 19 6
#> 20 Structure model Refinement description 20 6
#> 21 Structure model Advisory 21 7
#> 22 Structure model Data collection 22 8
#>
#> $pdbx_audit_revision_history
#> data_content_type major_revision minor_revision ordinal
#> 1 Structure model 1 0 1
#> 2 Structure model 1 1 2
#> 3 Structure model 1 2 3
#> 4 Structure model 2 0 4
#> 5 Structure model 3 0 5
#> 6 Structure model 4 0 6
#> 7 Structure model 4 1 7
#> 8 Structure model 4 2 8
#> revision_date
#> 1 1984-07-17T00:00:00+0000
#> 2 2008-03-03T00:00:00+0000
#> 3 2011-07-13T00:00:00+0000
#> 4 2020-06-17T00:00:00+0000
#> 5 2021-03-31T00:00:00+0000
#> 6 2023-02-08T00:00:00+0000
#> 7 2023-03-15T00:00:00+0000
#> 8 2024-05-22T00:00:00+0000
#>
#> $pdbx_audit_revision_item
#> data_content_type item ordinal
#> 1 Structure model _atom_site.B_iso_or_equiv 1
#> 2 Structure model _atom_site.Cartn_x 2
#> 3 Structure model _atom_site.Cartn_y 3
#> 4 Structure model _atom_site.Cartn_z 4
#> 5 Structure model _entity.pdbx_description 5
#> 6 Structure model _entity_src_gen.gene_src_common_name 6
#> 7 Structure model _entity_src_gen.pdbx_beg_seq_num 7
#> 8 Structure model _entity_src_gen.pdbx_end_seq_num 8
#> 9 Structure model _entity_src_gen.pdbx_gene_src_gene 9
#> 10 Structure model _entity_src_gen.pdbx_seq_type 10
#> 11 Structure model _pdbx_database_status.process_site 11
#> 12 Structure model _pdbx_validate_rmsd_angle.angle_deviation 12
#> 13 Structure model _pdbx_validate_rmsd_angle.angle_value 13
#> 14 Structure model _pdbx_validate_rmsd_bond.bond_deviation 14
#> 15 Structure model _pdbx_validate_rmsd_bond.bond_value 15
#> 16 Structure model _struct_ref.pdbx_align_begin 16
#> 17 Structure model _struct_ref_seq.db_align_beg 17
#> 18 Structure model _struct_ref_seq.db_align_end 18
#> 19 Structure model _atom_site.B_iso_or_equiv 19
#> 20 Structure model _atom_site.Cartn_x 20
#> 21 Structure model _atom_site.Cartn_y 21
#> 22 Structure model _atom_site.Cartn_z 22
#> 23 Structure model _pdbx_validate_rmsd_bond.bond_deviation 23
#> 24 Structure model _pdbx_validate_rmsd_bond.bond_value 24
#> 25 Structure model _struct_site.pdbx_auth_asym_id 25
#> 26 Structure model _struct_site.pdbx_auth_comp_id 26
#> 27 Structure model _struct_site.pdbx_auth_seq_id 27
#> 28 Structure model _atom_site.Cartn_x 28
#> 29 Structure model _atom_site.Cartn_y 29
#> 30 Structure model _atom_site.Cartn_z 30
#> 31 Structure model _atom_sites.fract_transf_matrix[1][1] 31
#> 32 Structure model _atom_sites.fract_transf_matrix[1][2] 32
#> 33 Structure model _atom_sites.fract_transf_matrix[1][3] 33
#> 34 Structure model _atom_sites.fract_transf_matrix[2][1] 34
#> 35 Structure model _atom_sites.fract_transf_matrix[2][2] 35
#> 36 Structure model _atom_sites.fract_transf_matrix[2][3] 36
#> 37 Structure model _atom_sites.fract_transf_matrix[3][1] 37
#> 38 Structure model _atom_sites.fract_transf_matrix[3][2] 38
#> 39 Structure model _atom_sites.fract_transf_matrix[3][3] 39
#> 40 Structure model _atom_sites.fract_transf_vector[1] 40
#> 41 Structure model _atom_sites.fract_transf_vector[2] 41
#> 42 Structure model _atom_sites.fract_transf_vector[3] 42
#> 43 Structure model _database_2.pdbx_DOI 43
#> 44 Structure model _database_2.pdbx_database_accession 44
#> 45 Structure model _database_PDB_matrix.origx[1][1] 45
#> 46 Structure model _database_PDB_matrix.origx[1][2] 46
#> 47 Structure model _database_PDB_matrix.origx[1][3] 47
#> 48 Structure model _database_PDB_matrix.origx[2][1] 48
#> 49 Structure model _database_PDB_matrix.origx[2][2] 49
#> 50 Structure model _database_PDB_matrix.origx[2][3] 50
#> 51 Structure model _database_PDB_matrix.origx[3][1] 51
#> 52 Structure model _database_PDB_matrix.origx[3][2] 52
#> 53 Structure model _database_PDB_matrix.origx[3][3] 53
#> 54 Structure model _database_PDB_matrix.origx_vector[1] 54
#> 55 Structure model _database_PDB_matrix.origx_vector[2] 55
#> 56 Structure model _database_PDB_matrix.origx_vector[3] 56
#> 57 Structure model _pdbx_struct_conn_angle.value 57
#> 58 Structure model _pdbx_validate_close_contact.dist 58
#> 59 Structure model _pdbx_validate_peptide_omega.omega 59
#> 60 Structure model _pdbx_validate_planes.rmsd 60
#> 61 Structure model _pdbx_validate_polymer_linkage.dist 61
#> 62 Structure model _pdbx_validate_torsion.phi 62
#> 63 Structure model _pdbx_validate_torsion.psi 63
#> 64 Structure model _struct_ncs_oper.matrix[1][1] 64
#> 65 Structure model _struct_ncs_oper.matrix[1][2] 65
#> 66 Structure model _struct_ncs_oper.matrix[1][3] 66
#> 67 Structure model _struct_ncs_oper.matrix[2][1] 67
#> 68 Structure model _struct_ncs_oper.matrix[2][2] 68
#> 69 Structure model _struct_ncs_oper.matrix[2][3] 69
#> 70 Structure model _struct_ncs_oper.matrix[3][1] 70
#> 71 Structure model _struct_ncs_oper.matrix[3][2] 71
#> 72 Structure model _struct_ncs_oper.matrix[3][3] 72
#> 73 Structure model _struct_ncs_oper.vector[1] 73
#> 74 Structure model _struct_ncs_oper.vector[2] 74
#> 75 Structure model _struct_ncs_oper.vector[3] 75
#> revision_ordinal
#> 1 4
#> 2 4
#> 3 4
#> 4 4
#> 5 4
#> 6 4
#> 7 4
#> 8 4
#> 9 4
#> 10 4
#> 11 4
#> 12 4
#> 13 4
#> 14 4
#> 15 4
#> 16 4
#> 17 4
#> 18 4
#> 19 5
#> 20 5
#> 21 5
#> 22 5
#> 23 5
#> 24 5
#> 25 5
#> 26 5
#> 27 5
#> 28 6
#> 29 6
#> 30 6
#> 31 6
#> 32 6
#> 33 6
#> 34 6
#> 35 6
#> 36 6
#> 37 6
#> 38 6
#> 39 6
#> 40 6
#> 41 6
#> 42 6
#> 43 6
#> 44 6
#> 45 6
#> 46 6
#> 47 6
#> 48 6
#> 49 6
#> 50 6
#> 51 6
#> 52 6
#> 53 6
#> 54 6
#> 55 6
#> 56 6
#> 57 6
#> 58 6
#> 59 6
#> 60 6
#> 61 6
#> 62 6
#> 63 6
#> 64 6
#> 65 6
#> 66 6
#> 67 6
#> 68 6
#> 69 6
#> 70 6
#> 71 6
#> 72 6
#> 73 6
#> 74 6
#> 75 6
#>
#> $pdbx_database_pdbobs_spr
#> date id pdb_id replace_pdb_id
#> 1 1984-07-17T00:00:00+0000 SPRSDE 4HHB 1HHB
#>
#> $pdbx_database_related
#> content_type db_id db_name
#> 1 unspecified 2HHB PDB
#> 2 unspecified 3HHB PDB
#> 3 unspecified 1GLI PDB
#> details
#> 1 REFINED BY THE METHOD OF JACK AND LEVITT. THIS\n ENTRY PRESENTS THE BEST ESTIMATE OF THE\n COORDINATES.
#> 2 SYMMETRY AVERAGED ABOUT THE (NON-CRYSTALLOGRAPHIC)\n MOLECULAR AXIS AND THEN RE-REGULARIZED BY THE\n ENERGY REFINEMENT METHOD OF LEVITT. THIS ENTRY\n PRESENTS COORDINATES THAT ARE ADEQUATE FOR MOST\n PURPOSES, SUCH AS COMPARISON WITH OTHER STRUCTURES.
#> 3 <NA>
#>
#> $pdbx_database_status
#> $pdbx_database_status$pdb_format_compatible
#> [1] "Y"
#>
#> $pdbx_database_status$process_site
#> [1] "BNL"
#>
#> $pdbx_database_status$recvd_initial_deposition_date
#> [1] "1984-03-07T00:00:00+0000"
#>
#> $pdbx_database_status$status_code
#> [1] "REL"
#>
#>
#> $pdbx_vrpt_summary
#> $pdbx_vrpt_summary$attempted_validation_steps
#> [1] "molprobity,validation-pack,mogul,buster-report,percentiles,writexml,writecif,writepdf"
#>
#> $pdbx_vrpt_summary$ligands_for_buster_report
#> [1] "Y"
#>
#> $pdbx_vrpt_summary$report_creation_date
#> [1] "2023-03-08T06:17:00+0000"
#>
#>
#> $pdbx_vrpt_summary_geometry
#> angles_rmsz bonds_rmsz clashscore num_hreduce num_angles_rmsz num_bonds_rmsz
#> 1 7.11 9.69 141.11 4456 6114 4500
#> percent_ramachandran_outliers percent_rotamer_outliers
#> 1 1.24 8.44
#>
#> $rcsb_accession_info
#> $rcsb_accession_info$deposit_date
#> [1] "1984-03-07T00:00:00+0000"
#>
#> $rcsb_accession_info$has_released_experimental_data
#> [1] "N"
#>
#> $rcsb_accession_info$initial_release_date
#> [1] "1984-07-17T00:00:00+0000"
#>
#> $rcsb_accession_info$major_revision
#> [1] 4
#>
#> $rcsb_accession_info$minor_revision
#> [1] 2
#>
#> $rcsb_accession_info$revision_date
#> [1] "2024-05-22T00:00:00+0000"
#>
#> $rcsb_accession_info$status_code
#> [1] "REL"
#>
#>
#> $rcsb_entry_container_identifiers
#> $rcsb_entry_container_identifiers$assembly_ids
#> [1] "1"
#>
#> $rcsb_entry_container_identifiers$entity_ids
#> [1] "1" "2" "3" "4" "5"
#>
#> $rcsb_entry_container_identifiers$entry_id
#> [1] "4HHB"
#>
#> $rcsb_entry_container_identifiers$model_ids
#> [1] 1
#>
#> $rcsb_entry_container_identifiers$non_polymer_entity_ids
#> [1] "3" "4"
#>
#> $rcsb_entry_container_identifiers$polymer_entity_ids
#> [1] "1" "2"
#>
#> $rcsb_entry_container_identifiers$pubmed_id
#> [1] 6726807
#>
#> $rcsb_entry_container_identifiers$rcsb_id
#> [1] "4HHB"
#>
#>
#> $rcsb_entry_info
#> $rcsb_entry_info$assembly_count
#> [1] 1
#>
#> $rcsb_entry_info$branched_entity_count
#> [1] 0
#>
#> $rcsb_entry_info$cis_peptide_count
#> [1] 0
#>
#> $rcsb_entry_info$deposited_atom_count
#> [1] 4779
#>
#> $rcsb_entry_info$deposited_deuterated_water_count
#> [1] 0
#>
#> $rcsb_entry_info$deposited_hydrogen_atom_count
#> [1] 0
#>
#> $rcsb_entry_info$deposited_model_count
#> [1] 1
#>
#> $rcsb_entry_info$deposited_modeled_polymer_monomer_count
#> [1] 574
#>
#> $rcsb_entry_info$deposited_nonpolymer_entity_instance_count
#> [1] 6
#>
#> $rcsb_entry_info$deposited_polymer_entity_instance_count
#> [1] 4
#>
#> $rcsb_entry_info$deposited_polymer_monomer_count
#> [1] 574
#>
#> $rcsb_entry_info$deposited_solvent_atom_count
#> [1] 221
#>
#> $rcsb_entry_info$deposited_unmodeled_polymer_monomer_count
#> [1] 0
#>
#> $rcsb_entry_info$disulfide_bond_count
#> [1] 0
#>
#> $rcsb_entry_info$entity_count
#> [1] 5
#>
#> $rcsb_entry_info$experimental_method
#> [1] "X-ray"
#>
#> $rcsb_entry_info$experimental_method_count
#> [1] 1
#>
#> $rcsb_entry_info$inter_mol_covalent_bond_count
#> [1] 0
#>
#> $rcsb_entry_info$inter_mol_metalic_bond_count
#> [1] 4
#>
#> $rcsb_entry_info$molecular_weight
#> [1] 64.74
#>
#> $rcsb_entry_info$na_polymer_entity_types
#> [1] "Other"
#>
#> $rcsb_entry_info$nonpolymer_bound_components
#> [1] "HEM"
#>
#> $rcsb_entry_info$nonpolymer_entity_count
#> [1] 2
#>
#> $rcsb_entry_info$nonpolymer_molecular_weight_maximum
#> [1] 0.62
#>
#> $rcsb_entry_info$nonpolymer_molecular_weight_minimum
#> [1] 0.09
#>
#> $rcsb_entry_info$polymer_composition
#> [1] "heteromeric protein"
#>
#> $rcsb_entry_info$polymer_entity_count
#> [1] 2
#>
#> $rcsb_entry_info$polymer_entity_count_dna
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_rna
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_nucleic_acid
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_nucleic_acid_hybrid
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_protein
#> [1] 2
#>
#> $rcsb_entry_info$polymer_entity_taxonomy_count
#> [1] 2
#>
#> $rcsb_entry_info$polymer_molecular_weight_maximum
#> [1] 15.89
#>
#> $rcsb_entry_info$polymer_molecular_weight_minimum
#> [1] 15.15
#>
#> $rcsb_entry_info$polymer_monomer_count_maximum
#> [1] 146
#>
#> $rcsb_entry_info$polymer_monomer_count_minimum
#> [1] 141
#>
#> $rcsb_entry_info$resolution_combined
#> [1] 1.74
#>
#> $rcsb_entry_info$selected_polymer_entity_types
#> [1] "Protein (only)"
#>
#> $rcsb_entry_info$solvent_entity_count
#> [1] 1
#>
#> $rcsb_entry_info$structure_determination_methodology
#> [1] "experimental"
#>
#> $rcsb_entry_info$structure_determination_methodology_priority
#> [1] 10
#>
#> $rcsb_entry_info$diffrn_resolution_high
#> $rcsb_entry_info$diffrn_resolution_high$provenance_source
#> [1] "From refinement resolution cutoff"
#>
#> $rcsb_entry_info$diffrn_resolution_high$value
#> [1] 1.74
#>
#>
#>
#> $rcsb_primary_citation
#> $rcsb_primary_citation$country
#> [1] "UK"
#>
#> $rcsb_primary_citation$id
#> [1] "primary"
#>
#> $rcsb_primary_citation$journal_abbrev
#> [1] "J.Mol.Biol."
#>
#> $rcsb_primary_citation$journal_id_astm
#> [1] "JMOBAK"
#>
#> $rcsb_primary_citation$journal_id_csd
#> [1] "0070"
#>
#> $rcsb_primary_citation$journal_id_issn
#> [1] "0022-2836"
#>
#> $rcsb_primary_citation$journal_volume
#> [1] "175"
#>
#> $rcsb_primary_citation$page_first
#> [1] "159"
#>
#> $rcsb_primary_citation$page_last
#> [1] "174"
#>
#> $rcsb_primary_citation$pdbx_database_id_doi
#> [1] "10.1016/0022-2836(84)90472-8"
#>
#> $rcsb_primary_citation$pdbx_database_id_pub_med
#> [1] 6726807
#>
#> $rcsb_primary_citation$rcsb_orcididentifiers
#> [1] "?" "?" "?" "?"
#>
#> $rcsb_primary_citation$rcsb_authors
#> [1] "Fermi, G." "Perutz, M.F." "Shaanan, B." "Fourme, R."
#>
#> $rcsb_primary_citation$rcsb_journal_abbrev
#> [1] "J Mol Biology"
#>
#> $rcsb_primary_citation$title
#> [1] "The crystal structure of human deoxyhaemoglobin at 1.74 A resolution"
#>
#> $rcsb_primary_citation$year
#> [1] 1984
#>
#>
#> $refine
#> details
#> 1 THE COORDINATES GIVEN HERE ARE IN THE ORTHOGONAL ANGSTROM\nSYSTEM STANDARD FOR HEMOGLOBINS. THE Y AXIS IS THE\n(NON CRYSTALLOGRAPHIC) MOLECULAR DIAD AND THE X AXIS IS THE\nPSEUDO DIAD WHICH RELATES THE ALPHA-1 AND BETA-1 CHAINS.\nTHE TRANSFORMATION GIVEN IN THE *MTRIX* RECORDS BELOW\nWILL GENERATE COORDINATES FOR THE *C* AND *D* CHAINS FROM\nTHE *A* AND *B* CHAINS RESPECTIVELY.
#> ls_rfactor_rwork ls_dres_high pdbx_diffrn_id pdbx_refine_id
#> 1 0.135 1.74 1 X-RAY DIFFRACTION
#>
#> $refine_hist
#> cycle_id d_res_high number_atoms_solvent number_atoms_total
#> 1 LAST 1.74 221 4779
#> pdbx_number_atoms_ligand pdbx_number_atoms_nucleic_acid
#> 1 174 0
#> pdbx_number_atoms_protein pdbx_refine_id
#> 1 4384 X-RAY DIFFRACTION
#>
#> $struct
#> $struct$title
#> [1] "THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION"
#>
#>
#> $struct_keywords
#> $struct_keywords$pdbx_keywords
#> [1] "OXYGEN TRANSPORT"
#>
#> $struct_keywords$text
#> [1] "OXYGEN TRANSPORT"
#>
#>
#> $symmetry
#> $symmetry$int_tables_number
#> [1] 4
#>
#> $symmetry$space_group_name_hm
#> [1] "P 1 21 1"
#>
#>
#> $rcsb_id
#> [1] "4HHB"
quietly(find_results("4HHB", field = "struct_keywords"))
#> $`4HHB`
#> $`4HHB`$pdbx_keywords
#> [1] "OXYGEN TRANSPORT"
#>
#> $`4HHB`$text
#> [1] "OXYGEN TRANSPORT"
#>
#>
#> $`1J7W`
#> $`1J7W`$pdbx_keywords
#> [1] "OXYGEN STORAGE/TRANSPORT"
#>
#> $`1J7W`$text
#> [1] "globin, OXYGEN STORAGE-TRANSPORT COMPLEX"
#>
#>
#> $`2W6V`
#> $`2W6V`$pdbx_keywords
#> [1] "OXYGEN TRANSPORT"
#>
#> $`2W6V`$text
#> [1] "OXYGEN TRANSPORT, PACKING DEFECTS, HYDROPHOBIC CAVITIES"
quietly(find_papers("4HHB", max_results = 3))
#> $`4HHB`
#> [1] "The crystal structure of human deoxyhaemoglobin at 1.74 A resolution"
#>
#> $`1J7W`
#> [1] "Control of heme reactivity by diffusion: structural basis and functional characterization in hemoglobin mutants."
#>
#> $`2W6V`
#> [1] "Pattern of Cavities in Globins: The Case of Human Hemoglobin."
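A named list like the one printed above is easier to join with other tables once it is flattened into a data frame. The following is a minimal base-R sketch under stated assumptions: the titles are copied from the printed output rather than fetched again, and the object name `papers` and the column names are illustrative choices, not part of rPDBapi.

```r
# Titles copied from the find_papers() output shown above; in a live session
# this list would be the value returned by the function.
papers <- list(
  `4HHB` = "The crystal structure of human deoxyhaemoglobin at 1.74 A resolution",
  `1J7W` = "Control of heme reactivity by diffusion: structural basis and functional characterization in hemoglobin mutants.",
  `2W6V` = "Pattern of Cavities in Globins: The Case of Human Hemoglobin."
)

# One row per entry: the list names become an explicit ID column.
papers_df <- data.frame(
  pdb_id = names(papers),
  title  = unlist(papers, use.names = FALSE),
  stringsAsFactors = FALSE
)
papers_df
```

Keeping the PDB ID as a column rather than as list names makes later `dplyr` joins against entry-level metadata straightforward.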
describe_chemical("ATP")
#> $chem_comp
#> $chem_comp$formula
#> [1] "C10 H16 N5 O13 P3"
#>
#> $chem_comp$formula_weight
#> [1] 507.181
#>
#> $chem_comp$id
#> [1] "ATP"
#>
#> $chem_comp$name
#> [1] "ADENOSINE-5'-TRIPHOSPHATE"
#>
#> $chem_comp$pdbx_ambiguous_flag
#> [1] "N"
#>
#> $chem_comp$pdbx_formal_charge
#> [1] 0
#>
#> $chem_comp$pdbx_initial_date
#> [1] "1999-07-08T00:00:00+0000"
#>
#> $chem_comp$pdbx_modified_date
#> [1] "2011-06-04T00:00:00+0000"
#>
#> $chem_comp$pdbx_processing_site
#> [1] "EBI"
#>
#> $chem_comp$pdbx_release_status
#> [1] "REL"
#>
#> $chem_comp$three_letter_code
#> [1] "ATP"
#>
#> $chem_comp$type
#> [1] "non-polymer"
#>
#>
#> $pdbx_chem_comp_audit
#> action_type comp_id date ordinal
#> 1 Create component ATP 1999-07-08T00:00:00+0000 1
#> 2 Modify descriptor ATP 2011-06-04T00:00:00+0000 2
#>
#> $pdbx_chem_comp_descriptor
#> comp_id
#> 1 ATP
#> 2 ATP
#> 3 ATP
#> 4 ATP
#> 5 ATP
#> 6 ATP
#> 7 ATP
#> descriptor
#> 1 O=P(O)(O)OP(=O)(O)OP(=O)(O)OCC3OC(n2cnc1c(ncnc12)N)C(O)C3O
#> 2 Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P@](O)(=O)O[P@@](O)(=O)O[P](O)(O)=O)[C@@H](O)[C@H]3O
#> 3 Nc1ncnc2n(cnc12)[CH]3O[CH](CO[P](O)(=O)O[P](O)(=O)O[P](O)(O)=O)[CH](O)[CH]3O
#> 4 c1nc(c2c(n1)n(cn2)[C@H]3[C@@H]([C@@H]([C@H](O3)CO[P@@](=O)(O)O[P@](=O)(O)OP(=O)(O)O)O)O)N
#> 5 c1nc(c2c(n1)n(cn2)C3C(C(C(O3)COP(=O)(O)OP(=O)(O)OP(=O)(O)O)O)O)N
#> 6 InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1
#> 7 ZKHQWZAMYRWXGA-KQYNXXCUSA-N
#> program program_version type
#> 1 ACDLabs 10.04 SMILES
#> 2 CACTVS 3.341 SMILES_CANONICAL
#> 3 CACTVS 3.341 SMILES
#> 4 OpenEye OEToolkits 1.5.0 SMILES_CANONICAL
#> 5 OpenEye OEToolkits 1.5.0 SMILES
#> 6 InChI 1.03 InChI
#> 7 InChI 1.03 InChIKey
#>
#> $pdbx_chem_comp_identifier
#> comp_id
#> 1 ATP
#> 2 ATP
#> identifier
#> 1 adenosine 5'-(tetrahydrogen triphosphate)
#> 2 [[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxy-oxolan-2-yl]methoxy-hydroxy-phosphoryl] phosphono hydrogen phosphate
#> program program_version type
#> 1 ACDLabs 10.04 SYSTEMATIC NAME
#> 2 OpenEye OEToolkits 1.5.0 SYSTEMATIC NAME
#>
#> $rcsb_chem_comp_container_identifiers
#> $rcsb_chem_comp_container_identifiers$comp_id
#> [1] "ATP"
#>
#> $rcsb_chem_comp_container_identifiers$drugbank_id
#> [1] "DB00171"
#>
#> $rcsb_chem_comp_container_identifiers$rcsb_id
#> [1] "ATP"
#>
#>
#> $rcsb_chem_comp_descriptor
#> $rcsb_chem_comp_descriptor$in_ch_i
#> [1] "InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1"
#>
#> $rcsb_chem_comp_descriptor$in_ch_ikey
#> [1] "ZKHQWZAMYRWXGA-KQYNXXCUSA-N"
#>
#> $rcsb_chem_comp_descriptor$smiles
#> [1] "c1nc(c2c(n1)n(cn2)C3C(C(C(O3)COP(=O)(O)OP(=O)(O)OP(=O)(O)O)O)O)N"
#>
#> $rcsb_chem_comp_descriptor$comp_id
#> [1] "ATP"
#>
#> $rcsb_chem_comp_descriptor$smilesstereo
#> [1] "c1nc(c2c(n1)n(cn2)[C@H]3[C@@H]([C@@H]([C@H](O3)CO[P@@](=O)(O)O[P@](=O)(O)OP(=O)(O)O)O)O)N"
#>
#>
#> $rcsb_chem_comp_info
#> $rcsb_chem_comp_info$atom_count
#> [1] 47
#>
#> $rcsb_chem_comp_info$atom_count_chiral
#> [1] 6
#>
#> $rcsb_chem_comp_info$atom_count_heavy
#> [1] 31
#>
#> $rcsb_chem_comp_info$bond_count
#> [1] 49
#>
#> $rcsb_chem_comp_info$bond_count_aromatic
#> [1] 10
#>
#> $rcsb_chem_comp_info$comp_id
#> [1] "ATP"
#>
#> $rcsb_chem_comp_info$initial_deposition_date
#> [1] "1999-07-08T00:00:00+0000"
#>
#> $rcsb_chem_comp_info$initial_release_date
#> [1] "1982-09-24T00:00:00+0000"
#>
#> $rcsb_chem_comp_info$release_status
#> [1] "REL"
#>
#> $rcsb_chem_comp_info$revision_date
#> [1] "2011-06-04T00:00:00+0000"
#>
#>
#> $rcsb_chem_comp_related
#> comp_id ordinal related_mapping_method resource_accession_code
#> 1 ATP 1 matching InChIKey in DrugBank DB00171
#> 2 ATP 2 matching InChIKey in PubChem 5957
#> 3 ATP 3 assigned by PubChem resource 56-65-5
#> 4 ATP 4 assigned by PubChem resource CHEBI:15422
#> 5 ATP 5 assigned by PubChem resource CHEMBL14249
#> 6 ATP 6 matching ChEMBL ID in Pharos CHEMBL14249
#> resource_name
#> 1 DrugBank
#> 2 PubChem
#> 3 CAS
#> 4 ChEBI
#> 5 ChEMBL
#> 6 Pharos
#>
#> $rcsb_chem_comp_synonyms
#> comp_id
#> 1 ATP
#> 2 ATP
#> 3 ATP
#> 4 ATP
#> 5 ATP
#> 6 ATP
#> 7 ATP
#> 8 ATP
#> 9 ATP
#> name
#> 1 ADENOSINE-5'-TRIPHOSPHATE
#> 2 adenosine 5'-(tetrahydrogen triphosphate)
#> 3 [[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxy-oxolan-2-yl]methoxy-hydroxy-phosphoryl] phosphono hydrogen phosphate
#> 4 Adenosine triphosphate disodium
#> 5 Adenosine triphosphate
#> 6 ATP
#> 7 Adenosine-5'-triphosphate
#> 8 Adenosine 5'-triphosphate
#> 9 Adenosine triphosphate disodium trihydrate
#> ordinal provenance_source type
#> 1 1 PDB Reference Data Preferred Name
#> 2 2 ACDLabs Systematic Name
#> 3 3 OpenEye OEToolkits Systematic Name
#> 4 4 DrugBank Synonym
#> 5 5 DrugBank Synonym
#> 6 6 DrugBank Synonym
#> 7 7 DrugBank Synonym
#> 8 8 DrugBank Synonym
#> 9 9 DrugBank Synonym
#>
#> $rcsb_chem_comp_target
#> comp_id interaction_type
#> 1 ATP target
#> 2 ATP target
#> 3 ATP target
#> 4 ATP target
#> 5 ATP target
#> 6 ATP target
#> 7 ATP target
#> 8 ATP target
#> 9 ATP target
#> 10 ATP target
#> 11 ATP target
#> 12 ATP target
#> 13 ATP target
#> 14 ATP target
#> 15 ATP target
#> 16 ATP target
#> 17 ATP target
#> 18 ATP target
#> 19 ATP target
#> 20 ATP target
#> 21 ATP target
#> 22 ATP target
#> 23 ATP target
#> 24 ATP target
#> 25 ATP target
#> 26 ATP target
#> 27 ATP target
#> 28 ATP target
#> 29 ATP target
#> 30 ATP target
#> 31 ATP target
#> 32 ATP target
#> 33 ATP target
#> 34 ATP target
#> 35 ATP target
#> 36 ATP target
#> 37 ATP target
#> 38 ATP target
#> 39 ATP target
#> 40 ATP target
#> 41 ATP target
#> 42 ATP target
#> 43 ATP enzyme
#> name
#> 1 Tyrosine-protein kinase ABL1
#> 2 ATP-binding cassette sub-family C member 6
#> 3 ATP-binding cassette sub-family C member 4
#> 4 Multidrug resistance-associated protein 1
#> 5 Cystic fibrosis transmembrane conductance regulator
#> 6 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
#> 7 Casein kinase II subunit alpha
#> 8 Casein kinase II subunit beta
#> 9 P2Y purinoceptor 11
#> 10 Serine/threonine-protein phosphatase 5
#> 11 Tyrosine-protein kinase ABL2
#> 12 Phospholipid-transporting ATPase ABCA1
#> 13 Acetyl-coenzyme A synthetase, cytoplasmic
#> 14 ALK tyrosine kinase receptor
#> 15 NEDD8-activating enzyme E1 regulatory subunit
#> 16 5'-AMP-activated protein kinase catalytic subunit alpha-1
#> 17 Serine/threonine-protein kinase A-Raf
#> 18 Activin receptor type-1-like
#> 19 Long-chain-fatty-acid--CoA ligase 1
#> 20 Cytosolic purine 5'-nucleotidase
#> 21 ATPase GET3
#> 22 ATP-binding cassette sub-family C member 9
#> 23 RAC-alpha serine/threonine-protein kinase
#> 24 Beta-adrenergic receptor kinase 1
#> 25 Apoptotic protease-activating factor 1
#> 26 Acetyl-coenzyme A synthetase 2-like, mitochondrial
#> 27 Activin receptor type-1B
#> 28 Activin receptor type-1
#> 29 Bile salt export pump
#> 30 Asparagine synthetase [glutamine-hydrolyzing]
#> 31 Cyclin-dependent kinase 15
#> 32 ADP/ATP translocase 1
#> 33 ATP-binding cassette sub-family C member 8
#> 34 Argininosuccinate synthase
#> 35 Mitochondrial inner membrane m-AAA protease component AFG3L2
#> 36 G protein-coupled receptor kinase 3
#> 37 Anti-Muellerian hormone type-2 receptor
#> 38 Activated CDC42 kinase 1
#> 39 Adenylate cyclase type 1
#> 40 ATP-dependent translocase ABCB1
#> 41 ATP-binding cassette sub-family G member 1
#> 42 ATP-binding cassette sub-family C member 2
#> 43 Adenosine kinase
#> ordinal provenance_source reference_database_accession_code
#> 1 1 DrugBank P00519
#> 2 2 DrugBank O95255
#> 3 3 DrugBank O15439
#> 4 4 DrugBank P33527
#> 5 5 DrugBank P13569
#> 6 6 DrugBank P42336
#> 7 7 DrugBank P68400
#> 8 8 DrugBank P67870
#> 9 9 DrugBank Q96G91
#> 10 10 DrugBank P53041
#> 11 11 DrugBank P42684
#> 12 12 DrugBank O95477
#> 13 13 DrugBank Q9NR19
#> 14 14 DrugBank Q9UM73
#> 15 15 DrugBank Q13564
#> 16 16 DrugBank Q13131
#> 17 17 DrugBank P10398
#> 18 18 DrugBank P37023
#> 19 19 DrugBank P33121
#> 20 20 DrugBank P49902
#> 21 21 DrugBank O43681
#> 22 22 DrugBank O60706
#> 23 23 DrugBank P31749
#> 24 24 DrugBank P25098
#> 25 25 DrugBank O14727
#> 26 26 DrugBank Q9NUB1
#> 27 27 DrugBank P36896
#> 28 28 DrugBank Q04771
#> 29 29 DrugBank O95342
#> 30 30 DrugBank P08243
#> 31 31 DrugBank Q96Q40
#> 32 32 DrugBank P12235
#> 33 33 DrugBank Q09428
#> 34 34 DrugBank P00966
#> 35 35 DrugBank Q9Y4W6
#> 36 36 DrugBank P35626
#> 37 37 DrugBank Q16671
#> 38 38 DrugBank Q07912
#> 39 39 DrugBank Q08828
#> 40 40 DrugBank P08183
#> 41 41 DrugBank P45844
#> 42 42 DrugBank Q92887
#> 43 43 DrugBank P55263
#> reference_database_name target_actions
#> 1 UniProt inhibitor
#> 2 UniProt NULL
#> 3 UniProt NULL
#> 4 UniProt NULL
#> 5 UniProt cofactor
#> 6 UniProt NULL
#> 7 UniProt NULL
#> 8 UniProt NULL
#> 9 UniProt NULL
#> 10 UniProt NULL
#> 11 UniProt inhibitor
#> 12 UniProt NULL
#> 13 UniProt NULL
#> 14 UniProt NULL
#> 15 UniProt NULL
#> 16 UniProt NULL
#> 17 UniProt NULL
#> 18 UniProt NULL
#> 19 UniProt NULL
#> 20 UniProt NULL
#> 21 UniProt NULL
#> 22 UniProt NULL
#> 23 UniProt NULL
#> 24 UniProt NULL
#> 25 UniProt NULL
#> 26 UniProt NULL
#> 27 UniProt NULL
#> 28 UniProt NULL
#> 29 UniProt NULL
#> 30 UniProt NULL
#> 31 UniProt NULL
#> 32 UniProt NULL
#> 33 UniProt NULL
#> 34 UniProt NULL
#> 35 UniProt NULL
#> 36 UniProt NULL
#> 37 UniProt NULL
#> 38 UniProt NULL
#> 39 UniProt NULL
#> 40 UniProt NULL
#> 41 UniProt NULL
#> 42 UniProt NULL
#> 43 UniProt substrate
#>
#> $rcsb_id
#> [1] "ATP"
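The `formula`, `formula_weight`, and `atom_count` fields in this record can be cross-checked against each other. The sketch below, which assumes the formula string always has the space-separated "symbol plus count" layout seen here, parses the formula copied from the output and recomputes the weight from rounded standard atomic weights. The variable names are illustrative, and the atomic-weight table covers only the elements needed for ATP.

```r
# Formula string copied from $chem_comp$formula in the output above.
formula <- "C10 H16 N5 O13 P3"

# Rounded standard atomic weights; only the elements that occur in ATP.
atomic_weight <- c(C = 12.011, H = 1.008, N = 14.007, O = 15.999, P = 30.974)

# Split each token such as "C10" into a symbol ("C") and a count (10).
# A bare symbol without digits is treated as a count of one.
tokens  <- strsplit(formula, " ", fixed = TRUE)[[1]]
element <- sub("[0-9]+$", "", tokens)
digits  <- sub("^[A-Za-z]+", "", tokens)
count   <- as.numeric(ifelse(nzchar(digits), digits, "1"))

# Total atoms and the sum of (atomic weight x atom count).
total_atoms    <- sum(count)
formula_weight <- sum(atomic_weight[element] * count)

c(total_atoms = total_atoms, formula_weight = round(formula_weight, 2))
```

With these rounded weights the result agrees with the reported `formula_weight` of 507.181 to within rounding, and the atom total matches the reported `atom_count` of 47.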
get_fasta_from_rcsb_entry("4HHB")
#> $`4HHB_1|Chains A, C|Hemoglobin subunit alpha|Homo sapiens (9606)`
#> [1] "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
#>
#> $`4HHB_2|Chains B, D|Hemoglobin subunit beta|Homo sapiens (9606)`
#> [1] "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
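The FASTA headers carry the entity ID and the chain assignment, so sequence lengths can be reconciled with entry-level counts. The following sketch assumes the header layout shown above (fields separated by `|`, with the second field listing the chains); the sequences are copied from the output, and `fasta` and `summary_df` are illustrative names.

```r
# Sequences and headers copied from the get_fasta_from_rcsb_entry() output above.
fasta <- list(
  `4HHB_1|Chains A, C|Hemoglobin subunit alpha|Homo sapiens (9606)` =
    "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
  `4HHB_2|Chains B, D|Hemoglobin subunit beta|Homo sapiens (9606)` =
    "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
)

# Header fields: entity ID | chains | description | organism.
fields      <- strsplit(names(fasta), "|", fixed = TRUE)
chain_field <- vapply(fields, function(x) x[2], character(1))
chain_ids   <- strsplit(sub("^Chains? ", "", chain_field), ", ", fixed = TRUE)

# One row per polymer entity with its chain count and sequence length.
summary_df <- data.frame(
  entity     = vapply(fields, function(x) x[1], character(1)),
  n_chains   = lengths(chain_ids),
  n_residues = nchar(unlist(fasta, use.names = FALSE)),
  stringsAsFactors = FALSE
)

# Residues summed over all chain instances in the entry.
total_residues <- sum(summary_df$n_chains * summary_df$n_residues)
summary_df
total_residues
```

The per-entity lengths (141 and 146) match `polymer_monomer_count_minimum` and `polymer_monomer_count_maximum` in the entry metadata, and the total of 574 matches `deposited_polymer_monomer_count`.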
# Files and low-level HTTP
get_pdb_file("4HHB", filetype = "cif", verbosity = FALSE)
#>
#> Call: get_pdb_file(pdb_id = "4HHB", filetype = "cif", verbosity = FALSE)
#>
#> Total Models#: 1
#> Total Atoms#: 4779, XYZs#: 14337 Chains#: 4 (values: A B C D)
#>
#> Protein Atoms#: 4384 (residues/Calpha atoms#: 574)
#> Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
#>
#> Non-protein/nucleic Atoms#: 395 (residues: 227)
#> Non-protein/nucleic resid values: [ HEM (4), HOH (221), PO4 (2) ]
#>
#> Protein sequence:
#> VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
#> KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
#> VHASLDKFLASVSTVLTSKYRVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQ
#> RFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGT...<cut>...HKYH
#>
#> + attr: atom, xyz, calpha, call, path
get_pdb_api_url("core/entry/", "4HHB")
#> [1] "https://data.rcsb.org/rest/v1/core/entry/4HHB"
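The URL printed above is a base address followed by an endpoint path and an identifier. As a rough illustration of that layout only, the sketch below rebuilds it with `paste0()`. `build_entry_url()` is a hypothetical helper written for this example, not an rPDBapi function; the base URL is taken from the output above, and upper-casing the ID is an assumption made for consistency with the examples in this vignette.

```r
# Hypothetical helper (not part of rPDBapi): concatenate base URL, endpoint,
# and identifier. Vectorised, so several IDs can be handled in one call.
build_entry_url <- function(pdb_ids,
                            base = "https://data.rcsb.org/rest/v1/",
                            endpoint = "core/entry/") {
  paste0(base, endpoint, toupper(pdb_ids))
}

build_entry_url(c("4hhb", "1j7w"))
```

In package code, `get_pdb_api_url()` remains the better choice because it keeps the base address in one place.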
# Build the URL once and reuse it for the request and the error check
entry_url <- get_pdb_api_url("core/entry/", "4HHB")
resp <- send_api_request(entry_url, verbosity = FALSE)
handle_api_errors(resp, entry_url)
parse_response(resp, format = "json")
#> $audit_author
#> name pdbx_ordinal
#> 1 Fermi, G. 1
#> 2 Perutz, M.F. 2
#>
#> $cell
#> $cell$angle_alpha
#> [1] 90
#>
#> $cell$angle_beta
#> [1] 99.34
#>
#> $cell$angle_gamma
#> [1] 90
#>
#> $cell$length_a
#> [1] 63.15
#>
#> $cell$length_b
#> [1] 83.59
#>
#> $cell$length_c
#> [1] 53.8
#>
#> $cell$zpdb
#> [1] 4
#>
#>
#> $citation
#> country id
#> 1 UK primary
#> 2 UK 1
#> 3 US 3
#> 4 UK 4
#> 5 UK 5
#> 6 UK 6
#> 7 <NA> 2
#> 8 <NA> 7
#> 9 <NA> 8
#> journal_abbrev
#> 1 J.Mol.Biol.
#> 2 Nature
#> 3 Annu.Rev.Biochem.
#> 4 J.Mol.Biol.
#> 5 J.Mol.Biol.
#> 6 Nature
#> 7 Haemoglobin and Myoglobin. Atlas of Molecular Structures in Biology
#> 8 Atlas of Protein Sequence and Structure (Data Section)
#> 9 Atlas of Protein Sequence and Structure (Data Section)
#> journal_id_astm journal_id_csd journal_id_issn journal_volume page_first
#> 1 JMOBAK 0070 0022-2836 175 159
#> 2 NATUAS 0006 0028-0836 295 535
#> 3 ARBOAW 0413 0066-4154 48 327
#> 4 JMOBAK 0070 0022-2836 100 3
#> 5 JMOBAK 0070 0022-2836 97 237
#> 6 NATUAS 0006 0028-0836 228 516
#> 7 <NA> 0986 0-19-854706-4 2 <NA>
#> 8 <NA> 0435 0-912466-02-2 5 56
#> 9 <NA> 0435 0-912466-02-2 5 64
#> page_last pdbx_database_id_doi pdbx_database_id_pub_med
#> 1 174 10.1016/0022-2836(84)90472-8 6726807
#> 2 <NA> <NA> NA
#> 3 <NA> <NA> NA
#> 4 <NA> <NA> NA
#> 5 <NA> <NA> NA
#> 6 <NA> <NA> NA
#> 7 <NA> <NA> NA
#> 8 <NA> <NA> NA
#> 9 <NA> <NA> NA
#> rcsb_authors
#> 1 Fermi, G., Perutz, M.F., Shaanan, B., Fourme, R.
#> 2 Perutz, M.F., Hasnain, S.S., Duke, P.J., Sessler, J.L., Hahn, J.E.
#> 3 Perutz, M.F.
#> 4 Teneyck, L.F., Arnone, A.
#> 5 Fermi, G.
#> 6 Muirhead, H., Greer, J.
#> 7 Fermi, G., Perutz, M.F.
#> 8 NULL
#> 9 NULL
#> rcsb_is_primary
#> 1 Y
#> 2 N
#> 3 N
#> 4 N
#> 5 N
#> 6 N
#> 7 N
#> 8 N
#> 9 N
#> rcsb_journal_abbrev
#> 1 J Mol Biology
#> 2 Nature
#> 3 Annu Rev Biochem
#> 4 J Mol Biology
#> 5 J Mol Biology
#> 6 Nature
#> 7 Haemoglobin And Myoglobin Atlas Of Molecular Structures In Biology
#> 8 Atlas Of Protein Sequence And Structure (data Section)
#> 9 Atlas Of Protein Sequence And Structure (data Section)
#> title
#> 1 The crystal structure of human deoxyhaemoglobin at 1.74 A resolution
#> 2 Stereochemistry of Iron in Deoxyhaemoglobin
#> 3 Regulation of Oxygen Affinity of Hemoglobin. Influence of Structure of the Globin on the Heme Iron
#> 4 Three-Dimensional Fourier Synthesis of Human Deoxyhemoglobin at 2.5 Angstroms Resolution, I.X-Ray Analysis
#> 5 Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 2.5 Angstroms Resolution, Refinement of the Atomic Model
#> 6 Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 3.5 Angstroms Resolution
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> year book_publisher
#> 1 1984 <NA>
#> 2 1982 <NA>
#> 3 1979 <NA>
#> 4 1976 <NA>
#> 5 1975 <NA>
#> 6 1970 <NA>
#> 7 1981 Oxford University Press
#> 8 1972 National Biomedical Research Foundation, Silver Spring,Md.
#> 9 1972 National Biomedical Research Foundation, Silver Spring,Md.
#>
#> $database2
#> database_code database_id pdbx_doi pdbx_database_accession
#> 1 4HHB PDB 10.2210/pdb4hhb/pdb pdb_00004hhb
#> 2 D_1000179340 WWPDB <NA> <NA>
#>
#> $diffrn
#> crystal_id id
#> 1 1 1
#>
#> $entry
#> $entry$id
#> [1] "4HHB"
#>
#>
#> $exptl
#> method
#> 1 X-RAY DIFFRACTION
#>
#> $exptl_crystal
#> density_matthews density_percent_sol id
#> 1 2.26 45.48 1
#>
#> $pdbx_audit_revision_category
#> category data_content_type ordinal revision_ordinal
#> 1 atom_site Structure model 1 4
#> 2 database_PDB_caveat Structure model 2 4
#> 3 entity Structure model 3 4
#> 4 entity_name_com Structure model 4 4
#> 5 entity_src_gen Structure model 5 4
#> 6 pdbx_database_status Structure model 6 4
#> 7 pdbx_validate_rmsd_angle Structure model 7 4
#> 8 pdbx_validate_rmsd_bond Structure model 8 4
#> 9 struct_ref Structure model 9 4
#> 10 struct_ref_seq Structure model 10 4
#> 11 atom_site Structure model 11 5
#> 12 pdbx_validate_rmsd_angle Structure model 12 5
#> 13 pdbx_validate_rmsd_bond Structure model 13 5
#> 14 struct_site Structure model 14 5
#> 15 atom_site Structure model 15 6
#> 16 atom_sites Structure model 16 6
#> 17 database_2 Structure model 17 6
#> 18 database_PDB_matrix Structure model 18 6
#> 19 pdbx_struct_conn_angle Structure model 19 6
#> 20 pdbx_validate_close_contact Structure model 20 6
#> 21 pdbx_validate_main_chain_plane Structure model 21 6
#> 22 pdbx_validate_peptide_omega Structure model 22 6
#> 23 pdbx_validate_planes Structure model 23 6
#> 24 pdbx_validate_polymer_linkage Structure model 24 6
#> 25 pdbx_validate_rmsd_angle Structure model 25 6
#> 26 pdbx_validate_rmsd_bond Structure model 26 6
#> 27 pdbx_validate_torsion Structure model 27 6
#> 28 struct_ncs_oper Structure model 28 6
#> 29 pdbx_database_remark Structure model 29 7
#> 30 chem_comp_atom Structure model 30 8
#> 31 chem_comp_bond Structure model 31 8
#>
#> $pdbx_audit_revision_details
#> data_content_type ordinal provider revision_ordinal type
#> 1 Structure model 1 repository 1 Initial release
#> 2 Structure model 2 repository 6 Remediation
#> details
#> 1 <NA>
#> 2 Coordinates and associated ncs operations (if present) transformed into standard crystal frame
#>
#> $pdbx_audit_revision_group
#> data_content_type group ordinal revision_ordinal
#> 1 Structure model Version format compliance 1 2
#> 2 Structure model Advisory 2 3
#> 3 Structure model Version format compliance 3 3
#> 4 Structure model Advisory 4 4
#> 5 Structure model Atomic model 5 4
#> 6 Structure model Data collection 6 4
#> 7 Structure model Database references 7 4
#> 8 Structure model Other 8 4
#> 9 Structure model Source and taxonomy 9 4
#> 10 Structure model Structure summary 10 4
#> 11 Structure model Atomic model 11 5
#> 12 Structure model Data collection 12 5
#> 13 Structure model Derived calculations 13 5
#> 14 Structure model Advisory 14 6
#> 15 Structure model Atomic model 15 6
#> 16 Structure model Data collection 16 6
#> 17 Structure model Database references 17 6
#> 18 Structure model Derived calculations 18 6
#> 19 Structure model Other 19 6
#> 20 Structure model Refinement description 20 6
#> 21 Structure model Advisory 21 7
#> 22 Structure model Data collection 22 8
#>
#> $pdbx_audit_revision_history
#> data_content_type major_revision minor_revision ordinal
#> 1 Structure model 1 0 1
#> 2 Structure model 1 1 2
#> 3 Structure model 1 2 3
#> 4 Structure model 2 0 4
#> 5 Structure model 3 0 5
#> 6 Structure model 4 0 6
#> 7 Structure model 4 1 7
#> 8 Structure model 4 2 8
#> revision_date
#> 1 1984-07-17T00:00:00+0000
#> 2 2008-03-03T00:00:00+0000
#> 3 2011-07-13T00:00:00+0000
#> 4 2020-06-17T00:00:00+0000
#> 5 2021-03-31T00:00:00+0000
#> 6 2023-02-08T00:00:00+0000
#> 7 2023-03-15T00:00:00+0000
#> 8 2024-05-22T00:00:00+0000
#>
#> $pdbx_audit_revision_item
#> data_content_type item ordinal
#> 1 Structure model _atom_site.B_iso_or_equiv 1
#> 2 Structure model _atom_site.Cartn_x 2
#> 3 Structure model _atom_site.Cartn_y 3
#> 4 Structure model _atom_site.Cartn_z 4
#> 5 Structure model _entity.pdbx_description 5
#> 6 Structure model _entity_src_gen.gene_src_common_name 6
#> 7 Structure model _entity_src_gen.pdbx_beg_seq_num 7
#> 8 Structure model _entity_src_gen.pdbx_end_seq_num 8
#> 9 Structure model _entity_src_gen.pdbx_gene_src_gene 9
#> 10 Structure model _entity_src_gen.pdbx_seq_type 10
#> 11 Structure model _pdbx_database_status.process_site 11
#> 12 Structure model _pdbx_validate_rmsd_angle.angle_deviation 12
#> 13 Structure model _pdbx_validate_rmsd_angle.angle_value 13
#> 14 Structure model _pdbx_validate_rmsd_bond.bond_deviation 14
#> 15 Structure model _pdbx_validate_rmsd_bond.bond_value 15
#> 16 Structure model _struct_ref.pdbx_align_begin 16
#> 17 Structure model _struct_ref_seq.db_align_beg 17
#> 18 Structure model _struct_ref_seq.db_align_end 18
#> 19 Structure model _atom_site.B_iso_or_equiv 19
#> 20 Structure model _atom_site.Cartn_x 20
#> 21 Structure model _atom_site.Cartn_y 21
#> 22 Structure model _atom_site.Cartn_z 22
#> 23 Structure model _pdbx_validate_rmsd_bond.bond_deviation 23
#> 24 Structure model _pdbx_validate_rmsd_bond.bond_value 24
#> 25 Structure model _struct_site.pdbx_auth_asym_id 25
#> 26 Structure model _struct_site.pdbx_auth_comp_id 26
#> 27 Structure model _struct_site.pdbx_auth_seq_id 27
#> 28 Structure model _atom_site.Cartn_x 28
#> 29 Structure model _atom_site.Cartn_y 29
#> 30 Structure model _atom_site.Cartn_z 30
#> 31 Structure model _atom_sites.fract_transf_matrix[1][1] 31
#> 32 Structure model _atom_sites.fract_transf_matrix[1][2] 32
#> 33 Structure model _atom_sites.fract_transf_matrix[1][3] 33
#> 34 Structure model _atom_sites.fract_transf_matrix[2][1] 34
#> 35 Structure model _atom_sites.fract_transf_matrix[2][2] 35
#> 36 Structure model _atom_sites.fract_transf_matrix[2][3] 36
#> 37 Structure model _atom_sites.fract_transf_matrix[3][1] 37
#> 38 Structure model _atom_sites.fract_transf_matrix[3][2] 38
#> 39 Structure model _atom_sites.fract_transf_matrix[3][3] 39
#> 40 Structure model _atom_sites.fract_transf_vector[1] 40
#> 41 Structure model _atom_sites.fract_transf_vector[2] 41
#> 42 Structure model _atom_sites.fract_transf_vector[3] 42
#> 43 Structure model _database_2.pdbx_DOI 43
#> 44 Structure model _database_2.pdbx_database_accession 44
#> 45 Structure model _database_PDB_matrix.origx[1][1] 45
#> 46 Structure model _database_PDB_matrix.origx[1][2] 46
#> 47 Structure model _database_PDB_matrix.origx[1][3] 47
#> 48 Structure model _database_PDB_matrix.origx[2][1] 48
#> 49 Structure model _database_PDB_matrix.origx[2][2] 49
#> 50 Structure model _database_PDB_matrix.origx[2][3] 50
#> 51 Structure model _database_PDB_matrix.origx[3][1] 51
#> 52 Structure model _database_PDB_matrix.origx[3][2] 52
#> 53 Structure model _database_PDB_matrix.origx[3][3] 53
#> 54 Structure model _database_PDB_matrix.origx_vector[1] 54
#> 55 Structure model _database_PDB_matrix.origx_vector[2] 55
#> 56 Structure model _database_PDB_matrix.origx_vector[3] 56
#> 57 Structure model _pdbx_struct_conn_angle.value 57
#> 58 Structure model _pdbx_validate_close_contact.dist 58
#> 59 Structure model _pdbx_validate_peptide_omega.omega 59
#> 60 Structure model _pdbx_validate_planes.rmsd 60
#> 61 Structure model _pdbx_validate_polymer_linkage.dist 61
#> 62 Structure model _pdbx_validate_torsion.phi 62
#> 63 Structure model _pdbx_validate_torsion.psi 63
#> 64 Structure model _struct_ncs_oper.matrix[1][1] 64
#> 65 Structure model _struct_ncs_oper.matrix[1][2] 65
#> 66 Structure model _struct_ncs_oper.matrix[1][3] 66
#> 67 Structure model _struct_ncs_oper.matrix[2][1] 67
#> 68 Structure model _struct_ncs_oper.matrix[2][2] 68
#> 69 Structure model _struct_ncs_oper.matrix[2][3] 69
#> 70 Structure model _struct_ncs_oper.matrix[3][1] 70
#> 71 Structure model _struct_ncs_oper.matrix[3][2] 71
#> 72 Structure model _struct_ncs_oper.matrix[3][3] 72
#> 73 Structure model _struct_ncs_oper.vector[1] 73
#> 74 Structure model _struct_ncs_oper.vector[2] 74
#> 75 Structure model _struct_ncs_oper.vector[3] 75
#> revision_ordinal
#> 1 4
#> 2 4
#> 3 4
#> 4 4
#> 5 4
#> 6 4
#> 7 4
#> 8 4
#> 9 4
#> 10 4
#> 11 4
#> 12 4
#> 13 4
#> 14 4
#> 15 4
#> 16 4
#> 17 4
#> 18 4
#> 19 5
#> 20 5
#> 21 5
#> 22 5
#> 23 5
#> 24 5
#> 25 5
#> 26 5
#> 27 5
#> 28 6
#> 29 6
#> 30 6
#> 31 6
#> 32 6
#> 33 6
#> 34 6
#> 35 6
#> 36 6
#> 37 6
#> 38 6
#> 39 6
#> 40 6
#> 41 6
#> 42 6
#> 43 6
#> 44 6
#> 45 6
#> 46 6
#> 47 6
#> 48 6
#> 49 6
#> 50 6
#> 51 6
#> 52 6
#> 53 6
#> 54 6
#> 55 6
#> 56 6
#> 57 6
#> 58 6
#> 59 6
#> 60 6
#> 61 6
#> 62 6
#> 63 6
#> 64 6
#> 65 6
#> 66 6
#> 67 6
#> 68 6
#> 69 6
#> 70 6
#> 71 6
#> 72 6
#> 73 6
#> 74 6
#> 75 6
#>
#> $pdbx_database_pdbobs_spr
#> date id pdb_id replace_pdb_id
#> 1 1984-07-17T00:00:00+0000 SPRSDE 4HHB 1HHB
#>
#> $pdbx_database_related
#> content_type db_id db_name
#> 1 unspecified 2HHB PDB
#> 2 unspecified 3HHB PDB
#> 3 unspecified 1GLI PDB
#> details
#> 1 REFINED BY THE METHOD OF JACK AND LEVITT. THIS\n ENTRY PRESENTS THE BEST ESTIMATE OF THE\n COORDINATES.
#> 2 SYMMETRY AVERAGED ABOUT THE (NON-CRYSTALLOGRAPHIC)\n MOLECULAR AXIS AND THEN RE-REGULARIZED BY THE\n ENERGY REFINEMENT METHOD OF LEVITT. THIS ENTRY\n PRESENTS COORDINATES THAT ARE ADEQUATE FOR MOST\n PURPOSES, SUCH AS COMPARISON WITH OTHER STRUCTURES.
#> 3 <NA>
#>
#> $pdbx_database_status
#> $pdbx_database_status$pdb_format_compatible
#> [1] "Y"
#>
#> $pdbx_database_status$process_site
#> [1] "BNL"
#>
#> $pdbx_database_status$recvd_initial_deposition_date
#> [1] "1984-03-07T00:00:00+0000"
#>
#> $pdbx_database_status$status_code
#> [1] "REL"
#>
#>
#> $pdbx_vrpt_summary
#> $pdbx_vrpt_summary$attempted_validation_steps
#> [1] "molprobity,validation-pack,mogul,buster-report,percentiles,writexml,writecif,writepdf"
#>
#> $pdbx_vrpt_summary$ligands_for_buster_report
#> [1] "Y"
#>
#> $pdbx_vrpt_summary$report_creation_date
#> [1] "2023-03-08T06:17:00+0000"
#>
#>
#> $pdbx_vrpt_summary_geometry
#> angles_rmsz bonds_rmsz clashscore num_hreduce num_angles_rmsz num_bonds_rmsz
#> 1 7.11 9.69 141.11 4456 6114 4500
#> percent_ramachandran_outliers percent_rotamer_outliers
#> 1 1.24 8.44
#>
#> $rcsb_accession_info
#> $rcsb_accession_info$deposit_date
#> [1] "1984-03-07T00:00:00+0000"
#>
#> $rcsb_accession_info$has_released_experimental_data
#> [1] "N"
#>
#> $rcsb_accession_info$initial_release_date
#> [1] "1984-07-17T00:00:00+0000"
#>
#> $rcsb_accession_info$major_revision
#> [1] 4
#>
#> $rcsb_accession_info$minor_revision
#> [1] 2
#>
#> $rcsb_accession_info$revision_date
#> [1] "2024-05-22T00:00:00+0000"
#>
#> $rcsb_accession_info$status_code
#> [1] "REL"
#>
#>
#> $rcsb_entry_container_identifiers
#> $rcsb_entry_container_identifiers$assembly_ids
#> [1] "1"
#>
#> $rcsb_entry_container_identifiers$entity_ids
#> [1] "1" "2" "3" "4" "5"
#>
#> $rcsb_entry_container_identifiers$entry_id
#> [1] "4HHB"
#>
#> $rcsb_entry_container_identifiers$model_ids
#> [1] 1
#>
#> $rcsb_entry_container_identifiers$non_polymer_entity_ids
#> [1] "3" "4"
#>
#> $rcsb_entry_container_identifiers$polymer_entity_ids
#> [1] "1" "2"
#>
#> $rcsb_entry_container_identifiers$pubmed_id
#> [1] 6726807
#>
#> $rcsb_entry_container_identifiers$rcsb_id
#> [1] "4HHB"
#>
#>
#> $rcsb_entry_info
#> $rcsb_entry_info$assembly_count
#> [1] 1
#>
#> $rcsb_entry_info$branched_entity_count
#> [1] 0
#>
#> $rcsb_entry_info$cis_peptide_count
#> [1] 0
#>
#> $rcsb_entry_info$deposited_atom_count
#> [1] 4779
#>
#> $rcsb_entry_info$deposited_deuterated_water_count
#> [1] 0
#>
#> $rcsb_entry_info$deposited_hydrogen_atom_count
#> [1] 0
#>
#> $rcsb_entry_info$deposited_model_count
#> [1] 1
#>
#> $rcsb_entry_info$deposited_modeled_polymer_monomer_count
#> [1] 574
#>
#> $rcsb_entry_info$deposited_nonpolymer_entity_instance_count
#> [1] 6
#>
#> $rcsb_entry_info$deposited_polymer_entity_instance_count
#> [1] 4
#>
#> $rcsb_entry_info$deposited_polymer_monomer_count
#> [1] 574
#>
#> $rcsb_entry_info$deposited_solvent_atom_count
#> [1] 221
#>
#> $rcsb_entry_info$deposited_unmodeled_polymer_monomer_count
#> [1] 0
#>
#> $rcsb_entry_info$disulfide_bond_count
#> [1] 0
#>
#> $rcsb_entry_info$entity_count
#> [1] 5
#>
#> $rcsb_entry_info$experimental_method
#> [1] "X-ray"
#>
#> $rcsb_entry_info$experimental_method_count
#> [1] 1
#>
#> $rcsb_entry_info$inter_mol_covalent_bond_count
#> [1] 0
#>
#> $rcsb_entry_info$inter_mol_metalic_bond_count
#> [1] 4
#>
#> $rcsb_entry_info$molecular_weight
#> [1] 64.74
#>
#> $rcsb_entry_info$na_polymer_entity_types
#> [1] "Other"
#>
#> $rcsb_entry_info$nonpolymer_bound_components
#> [1] "HEM"
#>
#> $rcsb_entry_info$nonpolymer_entity_count
#> [1] 2
#>
#> $rcsb_entry_info$nonpolymer_molecular_weight_maximum
#> [1] 0.62
#>
#> $rcsb_entry_info$nonpolymer_molecular_weight_minimum
#> [1] 0.09
#>
#> $rcsb_entry_info$polymer_composition
#> [1] "heteromeric protein"
#>
#> $rcsb_entry_info$polymer_entity_count
#> [1] 2
#>
#> $rcsb_entry_info$polymer_entity_count_dna
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_rna
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_nucleic_acid
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_nucleic_acid_hybrid
#> [1] 0
#>
#> $rcsb_entry_info$polymer_entity_count_protein
#> [1] 2
#>
#> $rcsb_entry_info$polymer_entity_taxonomy_count
#> [1] 2
#>
#> $rcsb_entry_info$polymer_molecular_weight_maximum
#> [1] 15.89
#>
#> $rcsb_entry_info$polymer_molecular_weight_minimum
#> [1] 15.15
#>
#> $rcsb_entry_info$polymer_monomer_count_maximum
#> [1] 146
#>
#> $rcsb_entry_info$polymer_monomer_count_minimum
#> [1] 141
#>
#> $rcsb_entry_info$resolution_combined
#> [1] 1.74
#>
#> $rcsb_entry_info$selected_polymer_entity_types
#> [1] "Protein (only)"
#>
#> $rcsb_entry_info$solvent_entity_count
#> [1] 1
#>
#> $rcsb_entry_info$structure_determination_methodology
#> [1] "experimental"
#>
#> $rcsb_entry_info$structure_determination_methodology_priority
#> [1] 10
#>
#> $rcsb_entry_info$diffrn_resolution_high
#> $rcsb_entry_info$diffrn_resolution_high$provenance_source
#> [1] "From refinement resolution cutoff"
#>
#> $rcsb_entry_info$diffrn_resolution_high$value
#> [1] 1.74
#>
#>
#>
#> $rcsb_primary_citation
#> $rcsb_primary_citation$country
#> [1] "UK"
#>
#> $rcsb_primary_citation$id
#> [1] "primary"
#>
#> $rcsb_primary_citation$journal_abbrev
#> [1] "J.Mol.Biol."
#>
#> $rcsb_primary_citation$journal_id_astm
#> [1] "JMOBAK"
#>
#> $rcsb_primary_citation$journal_id_csd
#> [1] "0070"
#>
#> $rcsb_primary_citation$journal_id_issn
#> [1] "0022-2836"
#>
#> $rcsb_primary_citation$journal_volume
#> [1] "175"
#>
#> $rcsb_primary_citation$page_first
#> [1] "159"
#>
#> $rcsb_primary_citation$page_last
#> [1] "174"
#>
#> $rcsb_primary_citation$pdbx_database_id_doi
#> [1] "10.1016/0022-2836(84)90472-8"
#>
#> $rcsb_primary_citation$pdbx_database_id_pub_med
#> [1] 6726807
#>
#> $rcsb_primary_citation$rcsb_orcididentifiers
#> [1] "?" "?" "?" "?"
#>
#> $rcsb_primary_citation$rcsb_authors
#> [1] "Fermi, G." "Perutz, M.F." "Shaanan, B." "Fourme, R."
#>
#> $rcsb_primary_citation$rcsb_journal_abbrev
#> [1] "J Mol Biology"
#>
#> $rcsb_primary_citation$title
#> [1] "The crystal structure of human deoxyhaemoglobin at 1.74 A resolution"
#>
#> $rcsb_primary_citation$year
#> [1] 1984
#>
#>
#> $refine
#> details
#> 1 THE COORDINATES GIVEN HERE ARE IN THE ORTHOGONAL ANGSTROM\nSYSTEM STANDARD FOR HEMOGLOBINS. THE Y AXIS IS THE\n(NON CRYSTALLOGRAPHIC) MOLECULAR DIAD AND THE X AXIS IS THE\nPSEUDO DIAD WHICH RELATES THE ALPHA-1 AND BETA-1 CHAINS.\nTHE TRANSFORMATION GIVEN IN THE *MTRIX* RECORDS BELOW\nWILL GENERATE COORDINATES FOR THE *C* AND *D* CHAINS FROM\nTHE *A* AND *B* CHAINS RESPECTIVELY.
#> ls_rfactor_rwork ls_dres_high pdbx_diffrn_id pdbx_refine_id
#> 1 0.135 1.74 1 X-RAY DIFFRACTION
#>
#> $refine_hist
#> cycle_id d_res_high number_atoms_solvent number_atoms_total
#> 1 LAST 1.74 221 4779
#> pdbx_number_atoms_ligand pdbx_number_atoms_nucleic_acid
#> 1 174 0
#> pdbx_number_atoms_protein pdbx_refine_id
#> 1 4384 X-RAY DIFFRACTION
#>
#> $struct
#> $struct$title
#> [1] "THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION"
#>
#>
#> $struct_keywords
#> $struct_keywords$pdbx_keywords
#> [1] "OXYGEN TRANSPORT"
#>
#> $struct_keywords$text
#> [1] "OXYGEN TRANSPORT"
#>
#>
#> $symmetry
#> $symmetry$int_tables_number
#> [1] 4
#>
#> $symmetry$space_group_name_hm
#> [1] "P 1 21 1"
#>
#>
#> $rcsb_id
#> [1] "4HHB"
# Object wrappers and analysis helpers
as_rpdb_entry(data.frame(rcsb_id = "4HHB"))
#> <rPDBapi_entry> with data class: data.frame
as_rpdb_assembly(data.frame(rcsb_id = "4HHB-1"))
#> <rPDBapi_assembly> with data class: data.frame
as_rpdb_polymer_entity(data.frame(rcsb_id = "4HHB_1"))
#> <rPDBapi_polymer_entity> with data class: data.frame
as_rpdb_chemical_component(data.frame(rcsb_id = "ATP"))
#> <rPDBapi_chemical_component> with data class: data.frame
as_rpdb_structure(get_pdb_file("4HHB", filetype = "cif", verbosity = FALSE))
#> <rPDBapi_structure> with data class: pdb
summarize_entries(data.frame(method = "X-RAY DIFFRACTION", resolution_combined = "1.8"))
#> # A tibble: 1 × 4
#> n_entries n_methods best_resolution median_molecular_weight
#> <int> <int> <dbl> <dbl>
#> 1 1 1 1.8 NA
summarize_assemblies(data.frame(oligomeric_count = "2", symbol = "C2"))
#> # A tibble: 1 × 3
#> n_assemblies median_oligomeric_count n_symmetry_labels
#> <int> <dbl> <int>
#> 1 1 2 1
extract_taxonomy_table(data.frame(rcsb_id = "4HHB_1", ncbi_taxonomy_id = "9606"))
#> # A tibble: 1 × 2
#> rcsb_id ncbi_taxonomy_id
#> <chr> <chr>
#> 1 4HHB_1 9606
extract_ligand_table(data.frame(rcsb_id = "ATP", formula_weight = "507.18"))
#> # A tibble: 1 × 2
#> rcsb_id formula_weight
#> <chr> <chr>
#> 1 ATP 507.18
extract_calpha_coordinates(get_pdb_file("4HHB", filetype = "cif", verbosity = FALSE))
#> # A tibble: 574 × 6
#> chain resno resid x y z
#> <chr> <int> <chr> <dbl> <dbl> <dbl>
#> 1 A 1 VAL 20.1 30.5 42.4
#> 2 A 2 LEU 23.8 30.0 41.9
#> 3 A 3 SER 25.9 31.9 44.4
#> 4 A 4 PRO 28.9 33.4 43.2
#> 5 A 5 ALA 31.1 30.9 44.6
#> 6 A 6 ASP 28.9 28.2 42.5
#> 7 A 7 LYS 30.1 30.2 39.5
#> 8 A 8 THR 33.6 29.9 40.4
#> 9 A 9 ASN 33.3 26.4 40.9
#> 10 A 10 VAL 31.5 25.6 37.6
#> # ℹ 564 more rows
join_structure_sequence(
get_pdb_file("4HHB", filetype = "cif", verbosity = FALSE),
get_fasta_from_rcsb_entry("4HHB")
)
#> # A tibble: 2 × 5
#> sequence_header sequence chain sequence_length n_calpha
#> <chr> <chr> <chr> <int> <int>
#> 1 4HHB_1|Chains A, C|Hemoglobin subunit… VLSPADK… s 141 NA
#> 2 4HHB_2|Chains B, D|Hemoglobin subunit… VHLTPEE… s 146 NA
This appendix is intentionally compact. Its purpose is not to replace the narrative examples above, but to ensure that every exported function has an immediately visible calling pattern in the vignette.
One source of confusion in the RCSB ecosystem is that different
endpoints expect different identifier types. The table below summarizes
the levels supported by data_fetcher() and the search
return types used in perform_search().
| Data_or_Return_Type | Typical_ID_Format | Typical_Use |
|---|---|---|
| ENTRY | 4-character PDB ID, e.g. 4HHB | Whole-structure metadata |
| ASSEMBLY | Entry plus assembly ID, e.g. 4HHB-1 | Biological assembly and symmetry |
| POLYMER_ENTITY | Entry plus entity ID, e.g. 4HHB_1 | Entity-level taxonomy or sequence annotations |
| BRANCHED_ENTITY | Entry plus branched entity ID | Glycan/branched entity records |
| NONPOLYMER_ENTITY | Entry plus nonpolymer entity ID, e.g. 3PQR_5 | Ligand records within structures |
| POLYMER_ENTITY_INSTANCE | Instance or chain-level identifier, endpoint-specific | Chain-specific annotations |
| BRANCHED_ENTITY_INSTANCE | Instance-level identifier, endpoint-specific | Branched entity instance records |
| NONPOLYMER_ENTITY_INSTANCE | Instance-level identifier, endpoint-specific | Ligand instance records |
| CHEMICAL_COMPONENT | Chemical component ID, e.g. ATP | Ligand chemistry and descriptors |
The precise identifier syntax for instance-level records depends on
the RCSB schema and endpoint, but the key conceptual point is that the
package supports multiple biological levels and expects identifiers that
match those levels. The identifier helpers introduced in rPDBapi make
this easier to manage explicitly: infer_id_type()
classifies common patterns, parse_rcsb_id() decomposes
them, and the build_*_id() functions generate normalized
identifiers programmatically.
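To make the idea of identifier levels concrete, the following base-R sketch classifies an identifier by its shape alone. It is not the implementation of infer_id_type(); the helper name classify_rcsb_id() and the regular expressions are assumptions inferred from the example identifiers shown in this vignette (4HHB, 4HHB-1, 4HHB_1, 4HHB.A, ATP).

```r
# Hypothetical helper: classify identifiers by shape using base R only.
# The patterns are inferred from the examples in this vignette, not taken
# from the package source.
classify_rcsb_id <- function(id) {
  patterns <- c(
    ENTRY              = "^[0-9][A-Za-z0-9]{3}$",
    ASSEMBLY           = "^[0-9][A-Za-z0-9]{3}-[0-9]+$",
    ENTITY             = "^[0-9][A-Za-z0-9]{3}_[0-9]+$",
    INSTANCE           = "^[0-9][A-Za-z0-9]{3}\\.[A-Za-z0-9]+$",
    CHEMICAL_COMPONENT = "^[A-Za-z0-9]{1,3}$"
  )
  vapply(id, function(x) {
    # Keep the first level whose pattern matches; NA if none does.
    hit <- names(patterns)[vapply(patterns, grepl, logical(1), x = x)]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1), USE.NAMES = FALSE)
}

classify_rcsb_id(c("4HHB", "4HHB-1", "4HHB_1", "4HHB.A", "ATP"))
```

Shape alone cannot separate polymer, branched, and nonpolymer entities (4HHB_1 and 3PQR_5 look the same), so the sketch reports a single ENTITY level for all three. That ambiguity is one reason to pass the intended data type explicitly rather than rely on pattern matching.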
The package uses return classes as lightweight contracts. The most important ones are summarized here.
| Function | Return_Class | Meaning |
|---|---|---|
| query_search(return_type = "entry") | rPDBapi_query_ids | Identifier vector from query_search() |
| query_search(other return_type) | rPDBapi_query_response | Parsed query_search payload |
| perform_search() | rPDBapi_search_ids | Identifier vector from perform_search() |
| perform_search(return_with_scores = TRUE) | rPDBapi_search_scores | Scored search results |
| perform_search(return_raw_json_dict = TRUE) | rPDBapi_search_raw_response | Raw JSON-like search payload |
| fetch_data() | rPDBapi_fetch_response | Validated GraphQL fetch payload |
| data_fetcher_batch(return_as_dataframe = TRUE) | rPDBapi_dataframe | Flattened batch result with provenance metadata |
| data_fetcher(return_as_dataframe = TRUE) | rPDBapi_dataframe | Flattened analysis-ready table |
| data_fetcher(return_as_dataframe = FALSE) | rPDBapi_fetch_response | Nested validated fetch payload |
| as_rpdb_entry() | rPDBapi_entry | Typed entry wrapper around retrieved data |
| as_rpdb_assembly() | rPDBapi_assembly | Typed assembly wrapper around retrieved data |
| as_rpdb_polymer_entity() | rPDBapi_polymer_entity | Typed polymer-entity wrapper around retrieved data |
| as_rpdb_chemical_component() | rPDBapi_chemical_component | Typed chemical-component wrapper around retrieved data |
| as_rpdb_structure() | rPDBapi_structure | Typed structure wrapper around retrieved data |
These classes are useful when writing wrappers, tests, or pipelines that need to branch on the kind of object returned by the package.
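As a minimal sketch of branching on return classes, the example below uses inherits() on mock objects. The mock objects only imitate the class names listed in the table; the full class vectors that the package actually assigns are an assumption here, and describe_result() is a hypothetical helper, not part of rPDBapi.

```r
# Mock objects standing in for real package results. Their class vectors are
# assumptions modelled on the class names in the table above.
mock_ids <- structure(
  c("4HHB", "1CRN"),
  class = c("rPDBapi_search_ids", "character")
)
mock_table <- structure(
  data.frame(rcsb_id = "4HHB", stringsAsFactors = FALSE),
  class = c("rPDBapi_dataframe", "data.frame")
)

# Hypothetical helper that branches on the return class.
describe_result <- function(x) {
  if (inherits(x, "rPDBapi_search_ids")) {
    sprintf("identifier vector with %d IDs", length(x))
  } else if (inherits(x, "rPDBapi_dataframe")) {
    sprintf("flattened table with %d row(s)", nrow(x))
  } else {
    "unrecognized object"
  }
}

describe_result(mock_ids)
describe_result(mock_table)
```

Testing with inherits() rather than comparing class(x) directly keeps the check valid if an object carries additional classes.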
The package also uses typed errors in several important places. Users do not need to memorize these classes for normal interactive work, but they are useful for robust scripting and package development.
error_guidance <- data.frame(
Scenario = c(
"Malformed search response",
"Unsupported return-type mapping",
"Invalid input to search/fetch helper",
"Unknown property or subproperty in strict mode",
"Batch retrieval failure after retries",
"HTTP failure",
"Response parsing failure"
),
Typical_Class_or_Source = c(
"rPDBapi_error_malformed_response",
"rPDBapi_error_unsupported_mapping",
"rPDBapi_error_invalid_input",
"validate_properties() / generate_json_query()",
"data_fetcher_batch()",
"handle_api_errors() / send_api_request()",
"parse_response()"
),
stringsAsFactors = FALSE
)
knitr::kable(error_guidance, align = c("l", "l"))
| Scenario | Typical_Class_or_Source |
|---|---|
| Malformed search response | rPDBapi_error_malformed_response |
| Unsupported return-type mapping | rPDBapi_error_unsupported_mapping |
| Invalid input to search/fetch helper | rPDBapi_error_invalid_input |
| Unknown property or subproperty in strict mode | validate_properties() / generate_json_query() |
| Batch retrieval failure after retries | data_fetcher_batch() |
| HTTP failure | handle_api_errors() / send_api_request() |
| Response parsing failure | parse_response() |
In practice, these classes matter when you want to distinguish network failures from schema mismatches or user-input errors. That distinction is particularly important in automated structural bioinformatics pipelines that may run over many identifiers.
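The distinction can be sketched with class-specific tryCatch() handlers. The errors below are simulated with base R so that the example runs offline; it assumes that the package signals conditions carrying the class names from the table. raise_typed() and classify_failure() are hypothetical helpers introduced only for illustration.

```r
# Simulate a typed error. This only imitates the class names listed above;
# the real conditions are raised inside the package.
raise_typed <- function(class, message) {
  stop(structure(
    class = c(class, "error", "condition"),
    list(message = message, call = NULL)
  ))
}

# Hypothetical wrapper: map an error class to a recovery decision.
# Specific handlers come first; the generic `error` handler is the fallback.
classify_failure <- function(expr) {
  tryCatch(
    {
      force(expr)
      "ok"
    },
    rPDBapi_error_invalid_input = function(e) "fix the input",
    rPDBapi_error_malformed_response = function(e) "retry or report upstream",
    rPDBapi_error_unsupported_mapping = function(e) "change the return type",
    error = function(e) "unclassified failure"
  )
}

classify_failure(raise_typed("rPDBapi_error_invalid_input", "bad identifier"))
classify_failure(stop("connection timed out"))
```

In a loop over many identifiers, a wrapper of this kind lets the pipeline skip bad inputs, retry transient failures, and stop only on errors it cannot classify.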
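Programmatic structure retrieval is most useful when the search logic and the retrieved identifiers are stored alongside the analysis. A practical workflow is to save the package version, the query object, the requested fields, the validation setting, the generated identifiers, and any batch provenance in a single manifest: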
analysis_manifest <- list(
live_examples = TRUE,
package_version = as.character(utils::packageVersion("rPDBapi")),
query = kinase_query,
requested_entry_fields = entry_properties,
strict_property_validation = getOption("rPDBapi.strict_property_validation", FALSE),
built_ids = list(
entry = build_entry_id("4HHB"),
assembly = build_assembly_id("4HHB", 1),
entity = build_entity_id("4HHB", 1),
instance = build_instance_id("4HHB", "A")
),
batch_provenance_example = attr(kinase_batch, "provenance")
)
str(analysis_manifest, max.level = 2)
#> List of 7
#> $ live_examples : logi TRUE
#> $ package_version : chr "3.0.1"
#> $ query :List of 3
#> ..$ type : chr "group"
#> ..$ logical_operator: chr "and"
#> ..$ nodes :List of 3
#> $ requested_entry_fields :List of 6
#> ..$ rcsb_id : list()
#> ..$ struct : chr "title"
#> ..$ struct_keywords : chr "pdbx_keywords"
#> ..$ exptl : chr "method"
#> ..$ rcsb_entry_info : chr [1:2] "molecular_weight" "resolution_combined"
#> ..$ rcsb_accession_info: chr "initial_release_date"
#> $ strict_property_validation: logi FALSE
#> $ built_ids :List of 4
#> ..$ entry : chr "4HHB"
#> ..$ assembly: chr "4HHB-1"
#> ..$ entity : chr "4HHB_1"
#> ..$ instance: chr "4HHB.A"
#> $ batch_provenance_example :List of 13
#> ..$ fetched_at : chr "2026-03-07 16:36:55.093741"
#> ..$ mode : chr "batch"
#> ..$ data_type : chr "ENTRY"
#> ..$ requested_ids : int 5
#> ..$ batch_size : int 2
#> ..$ num_batches : int 3
#> ..$ retry_attempts: num 2
#> ..$ retry_backoff : num 0
#> ..$ cache_enabled : logi TRUE
#> ..$ cache_dir : chr "/var/folders/dj/y28dp44x303ggfg6rg8n2v0h0000gn/T//RtmpwNIFML/rpdbapi-vignette-cache"
#> ..$ cache_hits : int 0
#> ..$ cache_misses : int 3
#> ..$ batches :List of 3
This manifest is a simple example of how to preserve the logic of an
analysis. Because the search operators and requested fields are explicit
R objects, they can be saved with saveRDS() and reused
later. That is a better long-term strategy than relying on manual notes
about which website filters were used. When batch retrieval is part of
the workflow, the provenance attribute from
data_fetcher_batch() provides an additional audit trail for
how the data were obtained.
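A minimal sketch of the save-and-reuse step is shown below. It uses a small stand-in manifest and a stand-in table with a provenance attribute so that it runs without network access; the field values are placeholders, not real query results.

```r
# Stand-in objects; in a real analysis these would be the manifest and the
# batch result created earlier in the script.
manifest <- list(
  package_version = "3.0.1",
  query = list(type = "group", logical_operator = "and"),
  ids = c("4HHB", "4HHB-1")
)
batch <- data.frame(rcsb_id = "4HHB", stringsAsFactors = FALSE)
attr(batch, "provenance") <- list(mode = "batch", data_type = "ENTRY")

# Write both objects to a temporary file and read them back.
path <- tempfile(fileext = ".rds")
saveRDS(list(manifest = manifest, batch = batch), path)
restored <- readRDS(path)

# The manifest and the provenance attribute survive the round trip.
identical(restored$manifest, manifest)
attr(restored$batch, "provenance")$data_type
```

Because saveRDS() serializes attributes along with the object, the provenance record stays attached to the table it describes.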
rPDBapi supports an end-to-end workflow for structural
bioinformatics in R: search the archive, refine the result set with
explicit operators, validate properties against known schema fields,
work across identifier levels, retrieve entry-, entity-, and
assembly-level metadata, scale retrieval with batch and cache-aware
helpers, convert nested responses into tidy data frames or typed
objects, download coordinate files, and integrate the results with
analysis and visualization packages. This workflow is useful not only
for exploratory access to the PDB, but also for reproducible, scriptable
analyses that can be revised and rerun as biological questions
evolve.
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.7.3
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Istanbul
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.2.0 rPDBapi_3.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] bio3d_2.4-5 jsonlite_2.0.0 compiler_4.5.2 promises_1.5.0
#> [5] tidyselect_1.2.1 Rcpp_1.1.1 xml2_1.5.2 parallel_4.5.2
#> [9] later_1.4.8 jquerylib_0.1.4 yaml_2.3.12 fastmap_1.2.0
#> [13] mime_0.13 R6_2.6.1 generics_0.1.4 curl_7.0.0
#> [17] knitr_1.51 htmlwidgets_1.6.4 tibble_3.3.1 shiny_1.13.0
#> [21] bslib_0.10.0 pillar_1.11.1 rlang_1.1.7 utf8_1.2.6
#> [25] cachem_1.1.0 httpuv_1.6.16 xfun_0.56 sass_0.4.10
#> [29] otel_0.2.0 cli_3.6.5 withr_3.0.2 magrittr_2.0.4
#> [33] digest_0.6.39 r3dmol_0.1.2 grid_4.5.2 xtable_1.8-8
#> [37] rstudioapi_0.18.0 lifecycle_1.0.5 vctrs_0.7.1 evaluate_1.0.5
#> [41] glue_1.8.0 rmarkdown_2.30 purrr_1.2.1 httr_1.4.8
#> [45] tools_4.5.2 pkgconfig_2.0.3 htmltools_0.5.9