Getting Started with SCIproj

What is a research compendium?

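A research compendium is a self-contained collection of data, code, and documentation that accompanies a research project. Structuring the project as an R package gives it a standard layout for functions, data, and documentation, along with explicitly declared dependencies and access to the usual package development tooling.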

SCIproj automates the creation of such a compendium, adding opinionated defaults for reproducible workflows (targets), dependency snapshots (renv), and FAIR-compliant metadata (CITATION.cff).

Getting started

Install SCIproj from GitHub:

# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")

Create a new project with a single call:

library(SCIproj)
create_proj("~/projects/my_analysis")

This creates a fully scaffolded research compendium with renv and targets enabled by default.

Customizing the call

create_proj("~/projects/baltic_cod",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  use_docker = TRUE,
  use_git = TRUE
)

Directory names with underscores or hyphens are fine: the R package name in DESCRIPTION is sanitized automatically (e.g., baltic_cod becomes baltic.cod).
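The renaming can be pictured with a small base R sketch. The helper sanitize_pkg_name() is hypothetical and is not SCIproj's actual implementation; it only applies the general rule that an R package name may contain letters, digits, and dots, must start with a letter, and must not end with a dot.

```r
# Hypothetical helper (not part of SCIproj): derive a valid R package name
# from a project directory path.
sanitize_pkg_name <- function(path) {
  name <- basename(path)
  # Replace each run of characters other than letters, digits, or dots with a dot
  name <- gsub("[^A-Za-z0-9.]+", ".", name)
  # A package name must start with a letter and must not end with a dot
  name <- sub("^[^A-Za-z]+", "", name)
  name <- sub("\\.+$", "", name)
  name
}

sanitize_pkg_name("~/projects/baltic_cod")
```

The directory on disk keeps its original name; only the Package field in DESCRIPTION needs the sanitized form.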

Project structure

After creation, the project directory looks like this (shown here with an MIT license and use_docker = TRUE):

your-project/
├── DESCRIPTION             # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd              # Top-level project description.
├── your-project.Rproj      # RStudio project file.
├── CITATION.cff            # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md         # Contribution guidelines.
├── LICENSE.md              # Full license text (here: MIT).
├── NAMESPACE               # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/               # Raw data files and pre-processing scripts.
│   ├── clean_data.R        # Script template for data cleaning.
│   ├── DATA_SOURCES.md     # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                   # Cleaned datasets stored as .rda files.
│
├── R/                      # Custom R functions and dataset documentation.
│   ├── function_ex.R       # Template for custom functions.
│   ├── data.R              # Template for dataset documentation.
│   └── ...
│
├── analyses/               # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/            # Generated plots.
│   └── ...
│
├── docs/                   # Publication-ready documents (article, report, presentation).
├── trash/                  # Temporary files that can be safely deleted.
│
├── _targets.R              # Pipeline definition for reproducible workflow.
├── renv/                   # renv library and settings.
├── renv.lock               # Lockfile for reproducible package versions.
└── Dockerfile              # Container definition for full reproducibility.

Directory / File    Purpose
R/                  Reusable R functions (documented with roxygen2)
data/               Cleaned, analysis-ready datasets (.rda format)
data-raw/           Raw data files and the script that cleans them
analyses/           Analysis scripts, R Markdown reports, figures
docs/               Manuscripts, presentations, supplementary material
trash/              Temporary files not under version control
_targets.R          Pipeline definition for targets
CITATION.cff        Machine-readable citation metadata
CONTRIBUTING.md     Guidelines for collaborators

FAIR compliance

SCIproj encourages FAIR (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:

CITATION.cff

A Citation File Format file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.

create_proj("my_project",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  add_license = "MIT"
)
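To give a rough idea of what such a file contains, the sketch below assembles a minimal CITATION.cff in base R. The helper build_cff() and its arguments are hypothetical, and the file SCIproj actually writes may differ in field order and content; the keys themselves (cff-version, message, title, authors, version, date-released, license) come from the Citation File Format 1.2.0 schema.

```r
# Hypothetical helper (not part of SCIproj): build the lines of a minimal
# CITATION.cff following the Citation File Format 1.2.0 schema.
build_cff <- function(title, given, family, orcid = NULL, license = NULL,
                      version = "0.0.0.9000", date_released = Sys.Date()) {
  lines <- c(
    "cff-version: 1.2.0",
    'message: "If you use this compendium, please cite it as below."',
    sprintf('title: "%s"', title),
    "authors:",
    sprintf('  - given-names: "%s"', given),
    sprintf('    family-names: "%s"', family)
  )
  # CFF stores the ORCID iD as a full URL
  if (!is.null(orcid)) {
    lines <- c(lines, sprintf('    orcid: "https://orcid.org/%s"', orcid))
  }
  lines <- c(
    lines,
    sprintf('version: "%s"', version),
    sprintf("date-released: %s", format(as.Date(date_released), "%Y-%m-%d"))
  )
  # The license is recorded as an SPDX identifier such as MIT
  if (!is.null(license)) {
    lines <- c(lines, sprintf("license: %s", license))
  }
  lines
}

cff <- build_cff("baltic.cod", given = "Jane", family = "Doe",
                 orcid = "0000-0001-2345-6789", license = "MIT")
writeLines(cff)
```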

DATA_SOURCES.md

When data_raw = TRUE (the default), a DATA_SOURCES.md template is placed in data-raw/. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.

ORCID

Pass your ORCID iD via the orcid parameter to embed it in CITATION.cff, making your authorship unambiguously machine-readable.
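Because a mistyped iD would end up in the metadata unnoticed, it can help to check it before calling create_proj(). The text above does not say whether SCIproj validates the value itself, so the following is only a sketch of a standalone check. The helper is_valid_orcid() is hypothetical; it uses the ISO 7064 MOD 11-2 checksum that ORCID defines for the last character of an iD.

```r
# Hypothetical helper (not part of SCIproj): validate the format and the
# ISO 7064 MOD 11-2 check character of an ORCID iD.
is_valid_orcid <- function(orcid) {
  # Expect four hyphen-separated groups; the last character may be X
  pattern <- "^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$"
  if (!grepl(pattern, orcid)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", orcid), "")[[1]]
  digits <- as.integer(chars[1:15])
  # Running total over the first 15 digits
  total <- 0
  for (d in digits) {
    total <- (total + d) * 2
  }
  check <- (12 - (total %% 11)) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  identical(chars[16], expected)
}

is_valid_orcid("0000-0001-2345-6789")
```

A check like this only confirms that the iD is well formed; it does not confirm that the iD belongs to the named author.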

Workflow with targets

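By default (use_targets = TRUE), SCIproj adds a _targets.R pipeline template. The targets package tracks the dependencies between pipeline steps and, on each run, skips steps whose code and upstream inputs have not changed.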

A typical workflow:

# 1. Define targets in _targets.R
# 2. Inspect the pipeline
targets::tar_manifest()
targets::tar_visnetwork()
# 3. Run the pipeline
targets::tar_make()
# 4. Read a result
targets::tar_read(my_result)

Edit _targets.R to define your data-loading, analysis, and reporting steps. Each step is a target that depends on upstream targets and R functions in R/.
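As an illustration of how targets depend on each other, here is a minimal _targets.R sketch. It is not the template shipped by SCIproj: the functions fit_model() and summarise_model() and the target names are hypothetical, and the built-in mtcars dataset stands in for project data. In a real compendium the functions would live in R/ rather than in _targets.R itself.

```r
# _targets.R (illustrative sketch, not the SCIproj template)
library(targets)

# Hypothetical analysis functions; in a compendium these would live in R/
fit_model <- function(data) lm(mpg ~ wt, data = data)
summarise_model <- function(model) round(coef(model), 2)

# Each tar_target() names a step and the command that produces it.
# targets infers dependencies from the symbols used in each command:
# model depends on raw_data, and model_summary depends on model.
pipeline <- list(
  tar_target(raw_data, mtcars),
  tar_target(model, fit_model(raw_data)),
  tar_target(model_summary, summarise_model(model))
)

# _targets.R must end with the list of targets
pipeline
```

With this file in place, targets::tar_make() would build all three targets, and editing only summarise_model() would cause just model_summary to be rebuilt on the next run.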

Dependency management with renv

By default (use_renv = TRUE), SCIproj initializes renv with the "explicit" snapshot type. This means renv discovers dependencies from DESCRIPTION rather than scanning all R files, which is the recommended approach for package-based compendia.
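To see which packages an explicit snapshot would consider, you can list what DESCRIPTION declares. The base R sketch below is a rough approximation for illustration, not renv's own code; the helper declared_deps(), the choice of fields, and the example DESCRIPTION contents are assumptions.

```r
# Write a small example DESCRIPTION to a temporary file
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: baltic.cod",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr,",
  "    targets (>= 1.0.0)",
  "Suggests: testthat"
), desc_file)

# Hypothetical helper: list the packages named in the dependency fields
declared_deps <- function(path,
                          fields = c("Depends", "Imports", "Suggests")) {
  desc <- read.dcf(path, fields = fields)
  entries <- unlist(strsplit(desc[!is.na(desc)], ","))
  # Drop version constraints such as "(>= 1.0.0)" and surrounding whitespace
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  # Ignore empty entries and the R version requirement itself
  sort(setdiff(pkgs[nzchar(pkgs)], "R"))
}

declared_deps(desc_file)
```

The practical consequence is that a package used in a script but missing from DESCRIPTION will not be recorded in renv.lock, so new dependencies should be added to DESCRIPTION first.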

Key commands:

renv::status()     # check if lockfile is in sync
renv::snapshot()   # update the lockfile after adding packages
renv::restore()    # reinstall packages from the lockfile

The renv.lock file should be committed to version control so collaborators can reproduce your exact package versions.

Optional features

Docker

Set use_docker = TRUE to add a Dockerfile and .dockerignore. The Dockerfile provides a template for building a container that reproduces your computational environment, independent of the host system.

GitHub and CI

Set create_github_repo = TRUE to create a GitHub repository (requires a configured GITHUB_PAT). Add ci = "gh-actions" to include a GitHub Actions workflow for automated R CMD check on push.

create_proj("my_project",
  use_git = TRUE,
  create_github_repo = TRUE,
  ci = "gh-actions"
)

Licenses

Choose from "MIT", "GPL", "AGPL", "LGPL", "Apache", "CCBY", or "CC0" via the add_license parameter. The selected license is applied to DESCRIPTION and recorded in CITATION.cff.

testthat

Set testthat = TRUE to add testing infrastructure (tests/testthat.R and tests/testthat/). Writing tests for your analysis functions helps catch regressions early.
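A test file for an analysis function might look like the sketch below. The function standardize() and the file name tests/testthat/test-standardize.R are hypothetical; in a compendium the function would be defined in R/ and the library(testthat) call would normally be handled by tests/testthat.R rather than by the test file.

```r
# tests/testthat/test-standardize.R (hypothetical file name)
library(testthat)

# Hypothetical analysis function; in a compendium it would live in R/
standardize <- function(x) (x - mean(x)) / sd(x)

test_that("standardize() centres and scales a numeric vector", {
  z <- standardize(c(2, 4, 6, 8))
  # expect_equal() compares with a small numeric tolerance
  expect_equal(mean(z), 0)
  expect_equal(sd(z), 1)
  expect_length(z, 4)
})
```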

Makefile

Set makefile = TRUE to add a makefile.R script as an alternative to targets for orchestrating your workflow.

Typical development cycle

  1. Create the project

    SCIproj::create_proj("~/projects/my_study", add_license = "MIT",
      license_holder = "Your Name")

  2. Open the .Rproj file in RStudio.

  3. Add raw data to data-raw/ and document it in DATA_SOURCES.md.

  4. Write cleaning code in data-raw/clean_data.R; save cleaned data to data/ with usethis::use_data().

  5. Write analysis functions in R/ and document them with roxygen2.

  6. Define the pipeline in _targets.R to connect data, functions, and reports.

  7. Run targets::tar_make() to execute the pipeline.

  8. Write reports in analyses/ using R Markdown or Quarto, reading results with targets::tar_read().

  9. Snapshot dependencies with renv::snapshot() before sharing.

  10. Push to GitHub and let CI run R CMD check automatically.
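The data-cleaning step in the cycle above (raw file in, cleaned .rda out) could be sketched as follows. This is not the clean_data.R template that SCIproj ships: the file catch.csv and its columns are made up for illustration, and temporary directories stand in for data-raw/ and data/ so the example runs anywhere.

```r
# Stand-ins for the compendium's data-raw/ and data/ directories
raw_dir <- file.path(tempdir(), "data-raw")
out_dir <- file.path(tempdir(), "data")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Simulate a raw file (hypothetical file name and columns)
writeLines(c("station,length_cm", "A,42.5", "B,NA", "A,38.0"),
           file.path(raw_dir, "catch.csv"))

# Read the raw data, drop incomplete rows, and fix column types
catch <- read.csv(file.path(raw_dir, "catch.csv"))
catch <- catch[complete.cases(catch), ]
catch$station <- factor(catch$station)

# Save the cleaned dataset as .rda. Inside the compendium,
# usethis::use_data(catch, overwrite = TRUE) writes data/catch.rda instead.
save(catch, file = file.path(out_dir, "catch.rda"))
```

Keeping the cleaning code in a script rather than editing files by hand means the cleaned dataset can always be regenerated from the raw data.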