rPDBapi is an R package designed to provide seamless
access to the RCSB Protein Data Bank (PDB). It simplifies the retrieval
and analysis of 3D structural data of large biological molecules,
essential for bioinformatics and structural biology research. This
package leverages the RCSB PDB’s JSON-based Search and Data APIs to facilitate custom queries,
data retrieval, and advanced search capabilities within the R
programming environment.
You can install the stable version of rPDBapi from CRAN:
install.packages("rPDBapi", repos = "http://cran.us.r-project.org")
To install the development version from GitHub:
devtools::install_github("selcukorkmaz/rPDBapi")

Loading the Package
library(rPDBapi)

Retrieving PDB IDs
Retrieve PDB IDs related to a specific term, such as “hemoglobin”:
pdbs <- query_search(search_term = "hemoglobin")
head(pdbs)

Advanced Searches
Search by PubMed ID:
pdbs <- query_search(search_term = 32453425, query_type = "PubmedIdQuery")
pdbs
Search by source organism, given as an NCBI Taxonomy ID (7227 is Drosophila melanogaster):
pdbs <- query_search(search_term = '7227', query_type = 'TreeEntityQuery')
head(pdbs)
Search by experimental method:
pdbs <- query_search(search_term = 'SOLID-STATE NMR', query_type = 'ExpTypeQuery')
head(pdbs)

Data Retrieval
Fetch data based on user-defined IDs and properties:
properties <- list(rcsb_entry_info = c("molecular_weight"), exptl = "method", rcsb_accession_info = "deposit_date")
ids <- query_search("CRISPR")
df <- data_fetcher(id = ids, data_type = "ENTRY", properties = properties, return_as_dataframe = TRUE)
df

Describing Chemical Compounds
Retrieve comprehensive descriptions of chemical compounds:
chem_desc <- describe_chemical('ATP')
chem_desc$rcsb_chem_comp_descriptor$smiles

Retrieving PDB Files
Download PDB files in various formats:
pdb_file <- get_pdb_file(pdb_id = "4HHB", filetype = "cif")
head(pdb_file$atom)

Additional Functions
get_info: Retrieve detailed information about a specific PDB entry.
get_fasta_from_rcsb_entry: Fetch FASTA sequences for specified PDB entry IDs.
For more detailed examples and usage, please refer to the package documentation.
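As a rough illustration of the two helper functions listed above, the sketch below calls each with the entry ID "4HHB". It assumes the package is installed and the network is reachable, and it passes the ID positionally because the exact argument names are an assumption here. Any failure is converted to NULL so the snippet runs even offline.

```r
# Sketch only: requires the rPDBapi package and network access to do real work.
# The entry ID is passed positionally; exact argument names are assumed.
info <- NULL
fasta <- NULL

if (requireNamespace("rPDBapi", quietly = TRUE)) {
  # Entry-level information; any error (e.g. no network) becomes NULL
  info <- tryCatch(rPDBapi::get_info("4HHB"), error = function(e) NULL)

  # FASTA sequences for the same entry, with the same error handling
  fasta <- tryCatch(
    rPDBapi::get_fasta_from_rcsb_entry("4HHB"),
    error = function(e) NULL
  )
}

# Inspect whatever came back without assuming a particular structure
if (!is.null(info)) print(names(info))
if (!is.null(fasta)) str(fasta)
```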
Core functions return typed objects to make downstream behavior explicit:
query_search()
  return_type = "entry": character vector with class rPDBapi_query_ids
  otherwise: class rPDBapi_query_response
perform_search()
  default: class rPDBapi_search_ids
  return_with_scores = TRUE: class rPDBapi_search_scores
  return_raw_json_dict = TRUE: class rPDBapi_search_raw_response
fetch_data()
  class rPDBapi_fetch_response
data_fetcher()
  return_as_dataframe = TRUE: data frame with class rPDBapi_dataframe
  return_as_dataframe = FALSE: class rPDBapi_fetch_response

Error signaling uses typed conditions (e.g., rPDBapi_error_malformed_response,
rPDBapi_error_unsupported_mapping) for reliable programmatic handling.
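The typed classes and conditions described above lend themselves to class-based dispatch with inherits() and tryCatch(). The following is a minimal base-R sketch under stated assumptions: the class names come from the text, but signal_malformed(), describe_result(), and the way the objects are constructed are hypothetical stand-ins, not the package's own code.

```r
# Hypothetical stand-in for a package call that signals a typed condition.
# Only the condition class name is taken from the README text.
signal_malformed <- function() {
  cond <- structure(
    list(message = "response could not be parsed", call = NULL),
    class = c("rPDBapi_error_malformed_response", "error", "condition")
  )
  stop(cond)
}

# Handlers are matched by condition class, so each failure mode can be
# treated differently instead of parsing error message strings.
result <- tryCatch(
  signal_malformed(),
  rPDBapi_error_malformed_response = function(e) {
    message("Malformed response: ", conditionMessage(e))
    NULL
  },
  rPDBapi_error_unsupported_mapping = function(e) {
    message("Unsupported mapping: ", conditionMessage(e))
    NULL
  }
)

# Hypothetical helper that branches on the class of a returned object.
describe_result <- function(x) {
  if (inherits(x, "rPDBapi_dataframe")) {
    "data frame result"
  } else if (inherits(x, "rPDBapi_query_ids")) {
    "ID vector"
  } else {
    "other"
  }
}

# Simulated return value; how the package builds its objects is assumed.
ids <- structure(c("4HHB", "1A3N"), class = c("rPDBapi_query_ids", "character"))
kind <- describe_result(ids)
```

Matching on condition classes keeps calling code stable even if the wording of an error message changes.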
NONPOLYMER_ENTITY and NON_POLYMER_ENTITY map to the same API return type.
CHEMICAL_COMPONENT maps to MOL_DEFINITION.
citation and rcsb_primary_citation are resolved compatibly in find_results() and find_papers().

By default, the test suite runs only deterministic unit tests (no network calls):
Sys.setenv(RPDBAPI_RUN_LIVE = "false")
testthat::test_dir("tests/testthat")

Live API integration tests are opt-in:
Sys.setenv(RPDBAPI_RUN_LIVE = "true", NOT_CRAN = "true")
testthat::test_dir("tests/testthat")

This package is licensed under the MIT License.
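For the opt-in live tests described earlier, a test file could gate its network-dependent cases on the RPDBAPI_RUN_LIVE variable. The sketch below shows one way this might look; live_tests_enabled() is a hypothetical helper, and the case-insensitive comparison is an assumption rather than the package's documented behavior.

```r
# Hypothetical helper: TRUE only when live tests are explicitly enabled.
# Treating the value case-insensitively is an assumption of this sketch.
live_tests_enabled <- function() {
  flag <- Sys.getenv("RPDBAPI_RUN_LIVE", unset = "false")
  identical(tolower(flag), "true")
}

# Inside a test file, a live test could then begin with (requires testthat):
#   testthat::skip_if_not(live_tests_enabled(), "live API tests are disabled")

# Demonstrate both settings, then restore a clean environment
Sys.setenv(RPDBAPI_RUN_LIVE = "false")
off <- live_tests_enabled()
Sys.setenv(RPDBAPI_RUN_LIVE = "true")
on <- live_tests_enabled()
Sys.unsetenv("RPDBAPI_RUN_LIVE")
```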