Getting Started with SCIproj

What is a research compendium?

A research compendium is a self-contained collection of data, code, and documentation that accompanies a research project. By structuring a project as an R package, you gain:

a standard, well-understood directory layout,
built-in dependency management via DESCRIPTION,
documentation infrastructure (roxygen2, vignettes),
testing infrastructure (testthat),
easy sharing and installation via GitHub.

SCIproj automates the creation of such a compendium, adding opinionated defaults for reproducible workflows (targets), dependency snapshots (renv), and FAIR-compliant metadata (CITATION.cff).

Getting started

Install SCIproj from GitHub:

# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")

Create a new project with a single call:

library(SCIproj)
create_proj("~/projects/my_analysis")

This creates a fully scaffolded research compendium with renv and targets enabled by default.

Customizing the call

create_proj("~/projects/baltic_cod",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  use_docker = TRUE,
  use_git = TRUE
)

Directory names with underscores or hyphens are fine — the R package name in DESCRIPTION is automatically sanitized (e.g., baltic_cod becomes baltic.cod).
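The sanitization described above can be pictured with a small base-R sketch. The helper name `pkg_name_from_path()` is hypothetical and is not SCIproj's actual implementation; it only assumes that underscores and hyphens are both replaced by dots, as in the baltic_cod example.

```r
# Hypothetical helper (not part of SCIproj): derive a package-style name
# from a project path by replacing runs of "_" or "-" with ".".
pkg_name_from_path <- function(path) {
  gsub("[_-]+", ".", basename(path))
}

pkg_name_from_path("~/projects/baltic_cod")  # "baltic.cod"
```

The directory on disk keeps its original name; only the name written to DESCRIPTION is affected.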

IDE support

SCIproj works with RStudio, Positron, VSCode, and terminal R sessions. By default, create_proj() sets your working directory to the new project (setwd_to_proj = TRUE) so you can start working immediately. Three parameters control post-creation behavior:

setwd_to_proj (default TRUE): whether the current session’s working directory is switched to the new project. Set to FALSE for batch workflows where you create multiple projects in sequence and want to stay in your current directory.
use_rproj (default TRUE): whether an .Rproj file is created. Set to FALSE for projects used exclusively in Positron, VSCode, or similar IDEs that don’t rely on .Rproj files.
open_proj (default FALSE): if TRUE, opens the new project in a fresh RStudio or Positron session. In that case, the current session keeps its original working directory.
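As an illustration of the batch case mentioned above, the following sketch creates two projects in a row while staying in the current directory. The project names and the use of a temporary directory are placeholders; the three arguments are the ones described in this section, and SCIproj is assumed to be installed.

```r
library(SCIproj)

# Placeholder locations for two throwaway projects.
project_dirs <- file.path(tempdir(), c("study_a", "study_b"))

wd_before <- getwd()
for (dir in project_dirs) {
  create_proj(dir,
    setwd_to_proj = FALSE,  # stay in the current working directory
    use_rproj = FALSE,      # no .Rproj file (e.g. for Positron or VSCode)
    open_proj = FALSE       # do not open a new IDE session
  )
}

# With setwd_to_proj = FALSE the session should still be where it started.
identical(getwd(), wd_before)
```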

Project structure

After creation, the project directory looks like this:

your-project/
├── DESCRIPTION             # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd              # Top-level project description.
├── your-project.Rproj      # RStudio project file (if use_rproj = TRUE, default).
├── CITATION.cff            # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md         # Contribution guidelines.
├── LICENSE.md              # Full license text (here: MIT).
├── NAMESPACE               # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/               # Raw data files and pre-processing scripts.
│   ├── clean_data.R        # Script template for data cleaning.
│   ├── DATA_SOURCES.md     # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                   # Cleaned datasets stored as .rda files.
│
├── R/                      # Custom R functions and dataset documentation.
│   ├── function_ex.R       # Template for custom functions.
│   ├── data.R              # Template for dataset documentation.
│   └── ...
│
├── analyses/               # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/            # Generated plots.
│   └── ...
│
├── docs/                   # Publication-ready documents (article, report, presentation).
├── trash/                  # Temporary files that can be safely deleted.
│
├── _targets.R              # Pipeline definition for reproducible workflow.
├── renv/                   # renv library and settings.
├── renv.lock               # Lockfile for reproducible package versions.
└── Dockerfile              # Container definition for full reproducibility.

| Directory / File | Purpose |
|---|---|
| `R/` | Reusable R functions (documented with `roxygen2`) |
| `data/` | Cleaned, analysis-ready datasets (`.rda` format) |
| `data-raw/` | Raw data files and the script that cleans them |
| `analyses/` | Analysis scripts, R Markdown reports, figures |
| `docs/` | Manuscripts, presentations, supplementary material |
| `trash/` | Temporary files not under version control |
| `_targets.R` | Pipeline definition for `targets` |
| `CITATION.cff` | Machine-readable citation metadata |
| `CONTRIBUTING.md` | Guidelines for collaborators |
FAIR compliance

SCIproj encourages FAIR (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:

CITATION.cff

A Citation File Format file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.

create_proj("my_project",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  add_license = "MIT"
)

DATA_SOURCES.md

When data_raw = TRUE (the default), a DATA_SOURCES.md template is placed in data-raw/. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.
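If you prefer to add provenance entries from code, a minimal sketch could look like this. The helper `add_data_source()`, the entry layout, and all values in the usage example are assumptions for illustration; the actual DATA_SOURCES.md template shipped by SCIproj may be organized differently.

```r
# Hypothetical helper (not part of SCIproj): append one provenance entry.
add_data_source <- function(file, name, source, url, doi, license,
                            downloaded = Sys.Date(), files = character()) {
  entry <- c(
    "",
    paste0("## ", name),
    paste0("- Source: ", source),
    paste0("- URL: ", url),
    paste0("- DOI: ", doi),
    paste0("- License: ", license),
    paste0("- Download date: ", format(downloaded, "%Y-%m-%d")),
    paste0("- Files: ", paste(files, collapse = ", "))
  )
  # One line per element, appended to whatever the file already contains.
  cat(paste0(entry, "\n"), file = file, sep = "", append = TRUE)
  invisible(entry)
}

# Usage with placeholder values; in a real project, point `file` at
# data-raw/DATA_SOURCES.md instead of a temporary file.
demo_file <- tempfile(fileext = ".md")
add_data_source(demo_file,
  name = "Example survey",
  source = "Example Institute",
  url = "https://example.org/survey",
  doi = "10.0000/example",
  license = "CC-BY-4.0",
  downloaded = as.Date("2024-01-15"),
  files = "survey_2023.csv"
)
readLines(demo_file)
```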

ORCID

Pass your ORCID iD via the orcid parameter to embed it in CITATION.cff, making your authorship unambiguously machine-readable.
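Because a mistyped iD would end up in the citation metadata, it can be worth checking the value before passing it on. The sketch below is a standalone base-R check, not a SCIproj function (the name `is_valid_orcid()` is hypothetical); it tests the 16-character pattern and the ISO 7064 MOD 11-2 check digit that ORCID iDs carry. Whether create_proj() performs a similar check itself is not assumed here.

```r
# Hypothetical helper: TRUE if `x` has the ORCID shape and a valid check digit.
is_valid_orcid <- function(x) {
  # Four hyphen-separated groups of four; the last character may be "X".
  pattern <- "^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$"
  if (!is.character(x) || length(x) != 1 || !grepl(pattern, x)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", x, fixed = TRUE), "")[[1]]

  # ISO 7064 MOD 11-2 over the first 15 digits.
  total <- 0
  for (d in as.integer(chars[1:15])) {
    total <- (total + d) * 2
  }
  check <- (12 - total %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)

  identical(chars[16], expected)
}

is_valid_orcid("0000-0001-2345-6789")  # TRUE
is_valid_orcid("0000-0001-2345-6780")  # FALSE (wrong check digit)
```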

Workflow with targets

By default (use_targets = TRUE), SCIproj adds a _targets.R pipeline template. The targets package provides:

Automatic dependency tracking — only outdated targets are re-run.
Caching — results are stored in the _targets/ data store.
Visualization — tar_visnetwork() shows the pipeline as a graph.

A typical workflow:

# 1. Define targets in _targets.R
# 2. Inspect the pipeline
targets::tar_manifest()
targets::tar_visnetwork()
# 3. Run the pipeline
targets::tar_make()
# 4. Read a result
targets::tar_read(my_result)

Edit _targets.R to define your data-loading, analysis, and reporting steps. Each step is a target that depends on upstream targets and R functions in R/.
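As a rough idea of what such a pipeline can look like, here is a minimal _targets.R sketch with three chained targets. The file path, column names, and the function `summarise_catch()` are hypothetical, and the template generated by SCIproj may be structured differently. The function is defined inline only to keep the sketch self-contained; in a compendium it would normally live in R/ and be loaded, for example with targets::tar_source().

```r
# _targets.R (sketch; file, column, and function names are hypothetical)
library(targets)

# Would normally be defined in a file under R/.
summarise_catch <- function(df) {
  aggregate(catch ~ year, data = df, FUN = sum)
}

list(
  # Track the raw file itself, so editing it invalidates downstream targets.
  tar_target(raw_file, "data-raw/catch.csv", format = "file"),
  # Depends on raw_file because the command refers to it by name.
  tar_target(raw_data, read.csv(raw_file)),
  # Depends on raw_data and on summarise_catch().
  tar_target(my_result, summarise_catch(raw_data))
)
```

After targets::tar_make(), changing only `summarise_catch()` would leave `raw_file` and `raw_data` up to date and re-run `my_result` alone.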

Dependency management with renv

By default (use_renv = TRUE), SCIproj initializes renv with the "explicit" snapshot type. This means renv discovers dependencies from DESCRIPTION rather than scanning all R files, which is the recommended approach for package-based compendia.

Key commands:

renv::status()     # check if lockfile is in sync
renv::snapshot()   # update the lockfile after adding packages
renv::restore()    # reinstall packages from the lockfile

The renv.lock file should be committed to version control so collaborators can reproduce your exact package versions.
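With the "explicit" snapshot type, a package generally needs to be declared in DESCRIPTION before renv::snapshot() records it. One way to bundle the steps is sketched below; the wrapper `add_dependency()` is hypothetical, assumes the usethis package is available, and is meant to be called from the project root.

```r
# Hypothetical wrapper (not part of SCIproj or renv).
add_dependency <- function(pkg) {
  usethis::use_package(pkg)  # declare the package under Imports in DESCRIPTION
  renv::install(pkg)         # install it into the project library
  renv::snapshot()           # record the installed version in renv.lock
  renv::status()             # confirm library and lockfile are in sync
}

# Example call, commented out so that sourcing this sketch changes nothing:
# add_dependency("dplyr")
```

Calling renv::settings$snapshot.type() without arguments shows the snapshot type currently configured for the project.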

Optional features

Docker

Set use_docker = TRUE to add a Dockerfile and .dockerignore. The Dockerfile provides a template for building a container that reproduces your computational environment, independent of the host system.

GitHub and CI

Set create_github_repo = TRUE to create a GitHub repository (requires a configured GITHUB_PAT). Add ci = "gh-actions" to include a GitHub Actions workflow for automated R CMD check on push.

create_proj("my_project",
  use_git = TRUE,
  create_github_repo = TRUE,
  ci = "gh-actions"
)

Licenses

Choose from "MIT", "GPL", "AGPL", "LGPL", "Apache", "CCBY", or "CC0" via the add_license parameter. The selected license is applied to DESCRIPTION and recorded in CITATION.cff.

testthat

Set testthat = TRUE to add testing infrastructure (tests/testthat.R and tests/testthat/). Writing tests for your analysis functions helps catch regressions early.
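A small example of what this can look like in practice: a roxygen2-documented function and a matching test. The function `standardise()` and the file names mentioned in the comments are hypothetical; they are shown in one block here only so the sketch runs on its own.

```r
library(testthat)

# Would live in a file under R/, e.g. R/standardise.R (hypothetical name).

#' Standardise a numeric vector
#'
#' @param x A numeric vector.
#' @return `x` centred on its mean and scaled by its standard deviation.
#' @export
standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Would live in tests/testthat/, e.g. test-standardise.R (hypothetical name).
test_that("standardise() centres and scales", {
  z <- standardise(c(2, 4, 6, 8))
  expect_equal(mean(z), 0)
  expect_equal(sd(z), 1)
})
```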

Makefile

Set makefile = TRUE to add a makefile.R script as an alternative to targets for orchestrating your workflow.

Typical development cycle

1. Create the project

SCIproj::create_proj("~/projects/my_study", add_license = "MIT",
  license_holder = "Your Name")

2. Start working in the project. Your working directory is already set to the new project (setwd_to_proj = TRUE by default), so you can continue immediately. For a dedicated project session, either open the .Rproj file manually (RStudio) or the project folder as a workspace (Positron/VSCode), or pass open_proj = TRUE to create_proj() to open a new IDE session automatically.
3. Add raw data to data-raw/ and document it in DATA_SOURCES.md.
4. Write cleaning code in data-raw/clean_data.R; save cleaned data to data/ with usethis::use_data().
5. Write analysis functions in R/ and document them with roxygen2.
6. Define the pipeline in _targets.R to connect data, functions, and reports.
7. Run targets::tar_make() to execute the pipeline.
8. Write reports in analyses/ using R Markdown or Quarto, reading results with targets::tar_read().
9. Snapshot dependencies with renv::snapshot() before sharing.
10. Push to GitHub and let CI run R CMD check automatically.
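The data-cleaning step in the cycle above could be sketched roughly as follows. The raw file name, the column names, and the helper `clean_catch()` are hypothetical, and the clean_data.R template created by SCIproj may look different. The file.exists() guard only keeps the sketch harmless when run outside a project.

```r
# data-raw/clean_data.R (sketch; file and column names are hypothetical)

# Drop rows with missing or negative catch and store year as integer.
clean_catch <- function(raw) {
  out <- raw[!is.na(raw$catch) & raw$catch >= 0, , drop = FALSE]
  out$year <- as.integer(out$year)
  rownames(out) <- NULL
  out
}

raw_file <- "data-raw/catch_raw.csv"
if (file.exists(raw_file)) {
  catch <- clean_catch(read.csv(raw_file))
  # Saves the object as data/catch.rda in the package's data/ directory.
  usethis::use_data(catch, overwrite = TRUE)
}
```

Keeping the cleaning logic in a function makes it easy to reuse the same step as a target in _targets.R.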