dqcheckrGUI is a point-and-click Shiny interface for dqcheckr. It
lets you configure dataset quality checks, run them against incoming
file deliveries, and browse historical results — without writing any R
code.
The app runs entirely on your local machine. No internet connection, no server, and no cloud services are required.
Install dqcheckr before installing dqcheckrGUI.
The app opens in your default browser at
http://localhost:4321. It reads and writes files relative
to the working directory you launched from, so set that to your project
folder before calling run_app(). The terminal window must
remain open while the app is running — closing it stops the app.
Alternatively, double-click launch.command (macOS),
launch.sh (Linux), or launch.bat (Windows)
from the app directory.
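As a minimal sketch of launching from an R session: this assumes run_app() is exported by dqcheckrGUI and needs no arguments, which the text above suggests but does not spell out. The helper name launch_dq_gui and the example path are hypothetical.

```r
# Hypothetical helper: start the app from a given project folder.
launch_dq_gui <- function(project_dir) {
  stopifnot(dir.exists(project_dir))
  old_wd <- setwd(project_dir)        # the app reads and writes relative to this folder
  on.exit(setwd(old_wd), add = TRUE)  # restore the previous directory once the app stops
  dqcheckrGUI::run_app()              # assumed call; expected to block while the app runs
}

# Example call (path is illustrative only):
# launch_dq_gui("~/projects/my_dq_project")
```

Wrapping the call in a function keeps the working-directory change local to the session that runs the app.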
On first launch, if no dqcheckr.yml global config file
exists in the working directory, the app opens directly to
Global Config and prompts you to set two paths before
doing anything else:
- Snapshot database path (data/snapshots.sqlite)
- Report output directory (reports/)

Click Save global config once these are set, then proceed to configure your first dataset.
The app has a fixed sidebar on the left and a main panel that changes based on your selection.
┌──────────────────┬────────────────────────────────┐
│ SIDEBAR          │ MAIN PANEL                     │
│                  │                                │
│ Datasets         │ [content changes here]         │
│   customers ✓    │                                │
│   suppliers ⚠    │                                │
│   [+ New]        │                                │
│                  │                                │
│ ▶ Run            │                                │
│ ⏱ History        │                                │
│ ⚙ Global Config  │                                │
└──────────────────┴────────────────────────────────┘
Status badges appear in the sidebar, dataset panel, and history table:
| Badge | Meaning |
|---|---|
| ✓ PASS | All checks passed |
| ⚠ WARN | One or more warnings; no failures |
| ✗ FAIL | One or more checks failed |
| ● RUNNING | A check is currently in progress |
| — not run | No runs recorded yet |
Clicking a dataset name in the sidebar opens its Dataset panel in the main area:
customer_accounts
──────────────────────────────────────────
Format: CSV Location: data/incoming/
Config: config/customer_accounts.yml
[Edit config] [Run check ▶]
Recent runs (last 5):
2026-05-30 20260530.csv ✓ PASS 0 failures
2026-05-23 20260523.csv ⚠ WARN 0 failures
2026-05-16 20260516.csv ✓ PASS 0 failures
[View all in History →]
Compare drift: ☐ run 1 ☐ run 2 [Compare ▶]
Click + New dataset in the sidebar to open the configuration wizard. Navigate between steps using the Back and Next buttons, or click the numbered breadcrumb bar at the top. All values are preserved as you move back and forth. If you navigate away from the wizard with unsaved changes, the app will warn you before discarding them.
Enter a short machine-readable name for the dataset. Names must start
with a letter and contain only letters, numbers, and underscores (e.g.
customer_accounts). This name is used as the config
filename and passed directly to
dqcheckr::run_dq_check().
An optional free-text description can also be added.
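The naming rule above can be expressed as a single regular expression. The following is a sketch of such a check, not the app's own validation code; the function name is hypothetical.

```r
# Hypothetical helper: TRUE if `name` starts with a letter and contains
# only letters, digits, and underscores.
is_valid_dataset_name <- function(name) {
  is.character(name) && length(name) == 1L && !is.na(name) &&
    grepl("^[A-Za-z][A-Za-z0-9_]*$", name)
}

is_valid_dataset_name("customer_accounts")  # valid
is_valid_dataset_name("2026_deliveries")    # invalid: starts with a digit
is_valid_dataset_name("customer-accounts")  # invalid: contains a hyphen
```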
Choose how files are identified for each run:
Folder scan (recommended) — point to a folder; the app picks the two most recently modified files automatically on each run. Use this for delivery processes that drop files into a fixed directory.
A Preview most recent file button shows the names and sizes of the two most recent files in the selected folder: “Current: 20260530.csv (2.4 MB) | Previous: 20260523.csv (2.3 MB)”.
Explicit file paths — list a current file and optionally a previous file by path. Useful when files are versioned by name. If no previous file is given, comparison checks (CP series) are skipped.
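To illustrate the folder-scan option, the sketch below picks the two most recently modified files in a folder using base R. It is an approximation of the behaviour described above, not the app's implementation; the function name and the optional pattern argument are assumptions.

```r
# Hypothetical helper: return the newest file as `current` and the
# second newest as `previous`, based on modification time.
pick_recent_files <- function(folder, pattern = NULL) {
  paths <- list.files(folder, pattern = pattern, full.names = TRUE)
  paths <- paths[!dir.exists(paths)]                  # keep files, drop subfolders
  info  <- file.info(paths)
  paths <- paths[order(info$mtime, decreasing = TRUE)]
  list(
    current  = if (length(paths) >= 1) paths[1] else NA_character_,
    previous = if (length(paths) >= 2) paths[2] else NA_character_
  )
}
```

With only one file in the folder, `previous` is NA, which corresponds to the case where comparison checks are skipped.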
This step identifies the file’s format and column layout. The top of the screen shows a raw text preview of the first 50 lines of the file — the full file is never loaded.
Auto-detection: when a file is loaded, the app uses
readr to detect the delimiter, encoding, quote character,
and whether the first row is a header. Results are shown as an editable
confirmation panel — you always make the final call, nothing is silently
committed.
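For intuition about what delimiter detection involves, here is a deliberately simple base-R heuristic that reads only the first 50 lines and looks for a candidate delimiter that occurs the same number of times on every line. This is not readr's algorithm and ignores quoting; the function name and candidate list are assumptions.

```r
# Hypothetical heuristic: guess the delimiter from the first lines of a file.
guess_delimiter <- function(path, n_lines = 50,
                            candidates = c(",", ";", "\t", "|")) {
  lines <- readLines(path, n = n_lines, warn = FALSE)  # the full file is not read
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) return(NA_character_)
  # Score each candidate by its per-line count, but only if that count
  # is non-zero and identical on every line.
  score <- vapply(candidates, function(d) {
    counts <- lengths(regmatches(lines, gregexpr(d, lines, fixed = TRUE)))
    if (min(counts) > 0 && length(unique(counts)) == 1) counts[1] else 0L
  }, integer(1))
  if (all(score == 0)) NA_character_ else candidates[which.max(score)]
}
```

A heuristic like this can be wrong, for example when quoted fields contain the delimiter, which is one reason an editable confirmation step is useful.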
Confirm or adjust the detected delimiter, encoding, quote character, and header setting.
A parsed preview updates live as you change these settings. If there is no header row, a column-naming panel appears alongside the preview where you can enter names for each column. Names must be valid R identifiers; a suggestion is offered if you enter something invalid.
When FWF is selected, a visual ruler activates above the text preview.
readr::fwf_empty().
Below the ruler, a table shows the resulting column definitions (start position, width, name, type). Edit column names and types here. A validation badge confirms whether the column widths account for the full record length.
If the file has header rows to skip before the data begins, set Header rows to skip. Column names are pre-populated from the skipped header row if one is present.
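The fixed-width column definitions and the record-length validation described above can be sketched in base R as follows. The column names, positions, and the sample record are invented for illustration, and check_fwf_layout is a hypothetical helper rather than part of the app.

```r
# Illustrative column definitions: start position and width per column.
fwf_cols <- data.frame(
  name  = c("account_id", "postcode", "balance"),
  start = c(1, 11, 15),
  width = c(10, 4, 9),
  stringsAsFactors = FALSE
)

# Hypothetical check: are the columns contiguous, and do they cover the record?
check_fwf_layout <- function(cols, record_length) {
  ends <- cols$start + cols$width - 1
  contiguous <- all(cols$start[-1] == ends[-nrow(cols)] + 1)
  list(
    contiguous    = contiguous,
    covers_record = contiguous && cols$start[1] == 1 && max(ends) == record_length
  )
}

record <- paste0("ACC0000001", "2000", "000123.45")  # one 23-character sample record
layout <- check_fwf_layout(fwf_cols, nchar(record))

# Split the record into fields using the start and end positions.
fields <- substring(record, fwf_cols$start, fwf_cols$start + fwf_cols$width - 1)
```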
Review every detected column. For each column you can:
- Override the inferred type, forcing a column to
character even if it looks numeric. Use this for
postcodes, phone numbers, account codes, BSB numbers, or any identifier
that happens to contain only digits. Click the inferred type to open the
override dropdown.

Each column is shown as a collapsible card. Expand a column to add optional per-column validation rules.
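The reason for forcing identifier columns to character is easiest to see with a small example: when digits-only identifiers are read as numbers, leading zeros are lost. The data below is invented, and base R's read.csv stands in for whatever reader the app uses.

```r
csv_text <- "account_id,postcode,balance\n0012345,0800,10.5\n0098765,2000,7.25\n"

# Default type inference treats digit-only columns as numbers.
inferred <- read.csv(text = csv_text)

# Forcing the identifier columns to character keeps them intact.
forced <- read.csv(
  text = csv_text,
  colClasses = c(account_id = "character", postcode = "character")
)

inferred$postcode  # read as integers, so "0800" loses its leading zero
forced$postcode    # kept as the original strings
```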
Standard rules (always visible):
| Rule | Applies to | Check |
|---|---|---|
| Allowed values | character columns | QC-09: flag any value not in the list |
| Min value | numeric columns | QC-10: flag values below the minimum |
| Max value | numeric columns | QC-10: flag values above the maximum |
Advanced rules (click Advanced ▼ to reveal):
| Rule | Description |
|---|---|
| Regex pattern | Flag values that do not match the pattern (QC-13). Click Test against sample to verify the pattern against the actual file before saving. |
| Max missing rate | Override the dataset-level threshold for this column only |
| Max non-numeric rate | Override for numeric columns only |
| Max missing rate change | Override the comparison threshold for this column |
| Max mean shift | Override the mean shift threshold for this column |
A regex syntax error disables Next; a pattern that fails against sample values shows a warning but does not block you from proceeding.
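The per-column rules above amount to simple vector checks. The sketch below shows one way to express them in base R; the function names are hypothetical, missing values are skipped by assumption, and this is not the code behind QC-09, QC-10, or QC-13.

```r
# Each helper returns the positions of offending values (NA is skipped here).
check_allowed <- function(x, allowed) which(!is.na(x) & !(x %in% allowed))
check_range   <- function(x, lower = -Inf, upper = Inf) {
  which(!is.na(x) & (x < lower | x > upper))
}
check_pattern <- function(x, pattern) which(!is.na(x) & !grepl(pattern, x))

# A pattern with a syntax error cannot be compiled, so test it up front.
is_valid_regex <- function(pattern) {
  tryCatch({ suppressWarnings(grepl(pattern, "")); TRUE },
           error = function(e) FALSE)
}

status   <- c("active", "closed", "pending", NA)
balance  <- c(10, -5, 2000)
postcode <- c("0800", "80O0")   # second value contains the letter O

bad_status   <- check_allowed(status, c("active", "closed"))
bad_balance  <- check_range(balance, lower = 0, upper = 1000)
bad_postcode <- check_pattern(postcode, "^[0-9]{4}$")
```

Separating "the pattern does not compile" from "the pattern compiles but some values fail" mirrors the distinction between a blocking error and a warning.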
Override the global default thresholds for this dataset only. Each field is pre-filled with the current global default. Only fields whose value differs from the global default are written to the dataset config; fields left at their default are omitted so that a later change to the global config is automatically inherited.
| Threshold | Default | What it controls |
|---|---|---|
| Max missing rate | 0.05 | Flag a column if > 5 % of values are blank |
| Max non-numeric rate | 0.01 | Flag a numeric column if > 1 % of values cannot be parsed |
| Min row count | 0 (off) | Fail if the delivery has fewer rows than this |
| Max row count change | 10 % | Warn if row count changes by > 10 % vs previous delivery |
| Max mean shift | 20 % | Warn if a numeric column mean shifts by > 20 % |
| Max missing change | 2 pp | Warn if missing rate changes by > 2 percentage points |
| Max non-numeric change | 1 pp | Warn if non-numeric rate changes by > 1 percentage point |
| Type inference threshold | 0.90 | A column is typed numeric if ≥ 90 % of values parse as numbers |
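To make the thresholds in the table concrete, the following sketch computes the underlying rates on a small invented column and compares them with the defaults. The exact definitions used by dqcheckr (for example, whether blanks count towards the non-numeric rate) are not stated here, so the formulas below are assumptions.

```r
# Share of blank values: NA or empty/whitespace-only strings.
missing_rate <- function(x) {
  blank <- is.na(x)
  if (is.character(x)) blank <- blank | !nzchar(trimws(x))
  mean(blank)
}

# Share of non-blank values that do not parse as numbers (assumed definition).
non_numeric_rate <- function(x) {
  x <- x[!is.na(x) & nzchar(trimws(x))]
  if (length(x) == 0) return(0)
  mean(is.na(suppressWarnings(as.numeric(x))))
}

# Relative change between deliveries, e.g. for row counts or column means.
relative_change <- function(current, previous) abs(current - previous) / abs(previous)

values <- c("10", "12", "", "n/a", "15", NA, "11", "9", "14", "13")

missing_rate(values) > 0.05          # 2 of 10 values are blank
non_numeric_rate(values) > 0.01      # 1 of 8 non-blank values is not a number
relative_change(1150, 1000) > 0.10   # row count changed by 15 %
relative_change(55, 50) > 0.20       # mean shifted by 10 %, under the threshold
abs(0.04 - 0.01) * 100 > 2           # missing rate moved by 3 percentage points
```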
Schema change flags control whether warnings are raised when columns are added, dropped, change type, or change order between deliveries.
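The four kinds of schema change mentioned above can be detected by comparing two name-to-type mappings. This is a sketch with invented column names; schema_changes is a hypothetical helper, not the SC-series implementation.

```r
# previous/current: named character vectors mapping column name to type.
schema_changes <- function(previous, current) {
  common <- intersect(names(previous), names(current))  # in previous order
  list(
    added        = setdiff(names(current), names(previous)),
    dropped      = setdiff(names(previous), names(current)),
    type_changed = common[previous[common] != current[common]],
    # Shared columns appear in a different relative order.
    reordered    = !identical(common, intersect(names(current), names(previous)))
  )
}

previous <- c(id = "character", amount = "numeric", region = "character")
current  <- c(id = "character", region = "character",
              amount = "character", channel = "character")

changes <- schema_changes(previous, current)
```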
Point to a plain R file that defines a custom_checks(df)
function. The app validates the file immediately: it checks that the
file exists, parses without syntax errors, and defines the expected
function. A green badge confirms a valid file; a red badge shows the
specific problem.
Leave this field blank to skip custom checks. See
vignette("dqcheckr", package = "dqcheckr") for the custom
checks function signature and return value.
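The three validation steps described above (file exists, parses, defines the function) can be approximated in base R as shown below. The helper name and messages are hypothetical, and the sketch says nothing about what custom_checks(df) should return; see the vignette for that.

```r
# Hypothetical validator: returns "ok" or a short description of the problem.
validate_custom_checks_file <- function(path) {
  if (!file.exists(path)) return("file not found")
  parsed <- tryCatch(parse(file = path), error = function(e) e)
  if (inherits(parsed, "error")) {
    return(paste("syntax error:", conditionMessage(parsed)))
  }
  env <- new.env()
  # Note: this runs the file's top-level code in a scratch environment.
  res <- tryCatch({ eval(parsed, envir = env); NULL }, error = function(e) e)
  if (inherits(res, "error")) {
    return(paste("error while evaluating:", conditionMessage(res)))
  }
  if (!exists("custom_checks", envir = env, mode = "function", inherits = FALSE)) {
    return("custom_checks() is not defined")
  }
  "ok"
}
```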
A summary of all settings is shown alongside a YAML preview of the configuration that will be written. Click Save config ✓ to write the file. A success notification confirms the path. The app then navigates to the dataset panel for the saved dataset.
For analysts who hand-edit YAML: the app preserves
any keys you have added to the YAML file outside the wizard. On the next
edit, those keys appear in the Step 8 preview under
# preserved from original file and are written back
unchanged. The wizard never silently drops hand-added config keys.
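One way to picture the key-preservation behaviour: treat the config as a named list, let the wizard own a known set of keys, and carry everything else over untouched. The key names below are invented for illustration and do not reflect the actual dqcheckr config schema.

```r
# Hypothetical merge: wizard-managed keys come from the wizard,
# all other keys are copied from the original file unchanged.
merge_preserving_extra <- function(original, wizard, wizard_keys) {
  extra <- original[setdiff(names(original), wizard_keys)]
  c(wizard, extra)
}

wizard_keys <- c("name", "format", "rules")            # illustrative only
original <- list(name = "customer_accounts", format = "csv",
                 owner = "data-team")                  # `owner` was hand-added
wizard   <- list(name = "customer_accounts", format = "csv",
                 rules = list(max_missing_rate = 0.02))

merged <- merge_preserving_extra(original, wizard, wizard_keys)
```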
Click ▶ Run in the sidebar (or Run check ▶ from a dataset panel).
The check runs in a background process so the UI stays responsive. Progress is streamed to the log area in real time. When complete:
Status: ✓ PASS [Open report ↗] [View log]
0 failures 0 warnings 22 passed
Report: reports/customer_accounts_20260531_143022.html
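The summary above combines a status with a report path. The sketch below derives the status from the failure and warning counts, following the badge meanings, and builds a path in the same shape as the example. The dataset-plus-timestamp naming pattern is inferred from that single example and is an assumption; both function names are hypothetical.

```r
# Status rule from the badge table: any failure wins, then any warning.
run_status <- function(failures, warnings) {
  if (failures > 0) "FAIL" else if (warnings > 0) "WARN" else "PASS"
}

# Assumed naming pattern: <dir>/<dataset>_<YYYYMMDD>_<HHMMSS>.html
report_path <- function(dataset, when = Sys.time(), dir = "reports") {
  file.path(dir, sprintf("%s_%s.html", dataset, format(when, "%Y%m%d_%H%M%S")))
}

run_status(failures = 0, warnings = 0)
report_path("customer_accounts",
            when = as.POSIXct("2026-05-31 14:30:22", tz = "UTC"))
```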
Click ⏱ History in the sidebar to see all past runs across all datasets.
The table can be filtered by dataset name, status, or date using the filter row at the top of each column. Click Load more to page through older runs.
To open a report: click the Open link in the Report column to open that run’s HTML report in a new browser tab.
To compare two deliveries, select two runs of the same dataset in the table, then click the compare button.
The drift report opens in a new tab, showing column-by-column changes between the two snapshots. The button is disabled with a tooltip if the selected rows are from different datasets.
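A column-by-column drift comparison can be pictured as a join of two per-column summaries. The sketch below uses an invented snapshot layout (column, missing_rate, mean) and is not the implementation or signature of compare_snapshots().

```r
# Hypothetical comparison of two per-column summaries.
compare_column_stats <- function(previous, current) {
  m <- merge(previous, current, by = "column", suffixes = c("_prev", "_curr"))
  m$missing_change_pp <- 100 * (m$missing_rate_curr - m$missing_rate_prev)
  m$mean_shift <- (m$mean_curr - m$mean_prev) / abs(m$mean_prev)
  m
}

previous <- data.frame(column = c("balance", "age"),
                       missing_rate = c(0.01, 0.00),
                       mean = c(100, 40))
current  <- data.frame(column = c("balance", "age"),
                       missing_rate = c(0.04, 0.00),
                       mean = c(130, 40))

drift <- compare_column_stats(previous, current)
```

Columns present in only one snapshot drop out of this inner join; a fuller comparison would report them separately as added or dropped.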
Click ⚙ Global Config to set defaults that apply to all datasets unless overridden at the dataset level (Step 6).
Infrastructure paths — set the snapshot database path and report output directory. The snapshot database is created automatically on first run if it does not exist. If either path’s parent directory does not exist, a red validation message is shown — create the directory on disk first, then save.
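The parent-directory validation described above corresponds to a short check in base R. This is a sketch with a hypothetical function name and message, shown here against temporary paths.

```r
# Hypothetical check: the file or folder may not exist yet,
# but the directory that will contain it must.
validate_parent_dir <- function(path) {
  parent <- dirname(path)
  if (dir.exists(parent)) "ok"
  else sprintf("parent directory does not exist: %s", parent)
}

ok_path  <- file.path(tempdir(), "snapshots.sqlite")
bad_path <- file.path(tempdir(), "no_such_folder", "snapshots.sqlite")

validate_parent_dir(ok_path)
validate_parent_dir(bad_path)
```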
Default rule thresholds — the same thresholds described in Step 6 above, applied globally. Dataset-level overrides take precedence.
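The precedence rule, together with the practice of storing only values that differ from the defaults, can be sketched with named lists. The key names below are illustrative and not the actual config keys.

```r
global_defaults <- list(max_missing_rate = 0.05,
                        max_non_numeric_rate = 0.01,
                        min_row_count = 0)

# Keep only the values that differ from the current global defaults.
dataset_overrides <- function(values, defaults) {
  changed <- vapply(names(values),
                    function(k) !identical(values[[k]], defaults[[k]]),
                    logical(1))
  values[changed]
}

wizard_values <- list(max_missing_rate = 0.02,
                      max_non_numeric_rate = 0.01,
                      min_row_count = 0)
overrides <- dataset_overrides(wizard_values, global_defaults)

# Later, a global default changes; the dataset inherits it automatically
# because that key was never written to the dataset config.
new_global <- modifyList(global_defaults, list(max_non_numeric_rate = 0.005))
effective  <- modifyList(new_global, overrides)
```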
Click Save global config to write changes.
See vignette("dqcheckr", package = "dqcheckr") for a
full description of every quality check (QC-01 to QC-14, SC-01/02, CP-01
to CP-08), per-column configuration options, custom checks, snapshot
database schema, and the compare_snapshots() function used
for drift reports.