Auditing an R package you have just received

library(checkhelper)

This vignette is the canonical end-to-end walkthrough: a colleague hands you an R package and asks “is this CRAN-ready?”. The goal is to surface every CRAN-blocking issue with the smallest possible number of R CMD check runs, then apply the safe automatic fixes.

For audits that have their own pipeline (full CRAN environment, file-system snapshots), see the companion vignette vignette("pre-submission-gates", package = "checkhelper").

TL;DR - the audit script

pkg <- "/path/to/the/package"

# 1. Run R CMD check ONCE and reuse it everywhere it's needed.
chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

# 2. Static audits (no extra check needed).
audit_tags(pkg)            # exported funs without @return / internals without @noRd
audit_ascii(pkg)           # non-ASCII characters in R/, tests/, vignettes/, man/, DESCRIPTION, NAMESPACE
audit_dataset_doc(pkg)     # datasets in data/ without a roxygen block
audit_citation(pkg)        # old-style personList() / citEntry() in inst/CITATION
audit_dontrun(pkg)         # \dontrun{} blocks in man/*.Rd
audit_description(pkg)     # unquoted package names in DESCRIPTION's Description field
audit_downloads(pkg)       # network / download call sites to review for offline-safe guards

# 3. Audits that need the check output - pass `chk` to skip a 2nd run.
audit_globals(pkg, checks = chk)

# 4. Apply the safe fixes.
fix_globals(pkg, checks = chk, write = TRUE)

# Preview before applying: fix_ascii() returns invisibly, so capture
# it to see which files would change.
preview <- fix_ascii(pkg, dry_run = TRUE)
preview[preview$changed, ]
fix_ascii(pkg, dry_run = FALSE)        # then apply

fix_dataset_doc("my_data", pkg = pkg,
                description = "Description of my_data",
                source = "Internal")        # one call per undocumented dataset

audit_globals() and fix_globals() parse the notes field of an rcmdcheck::rcmdcheck() result to extract the "no visible binding for global variable" and "no visible global function definition" notes. By default each call runs its own check, which is slow on a real package.

Both functions accept a checks = argument. When supplied, they skip the rcmdcheck() call and parse the existing object. This lets you run the check once and reuse the result for the whole audit.

chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

audit_globals(pkg, checks = chk)
fix_globals(pkg,   checks = chk, write = TRUE)
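To make the parsing step concrete, here is a minimal base-R sketch of pulling flagged names out of note text. The note string is invented for illustration and extract_flagged() is a hypothetical helper, not checkhelper's implementation; in a real audit the text would come from the notes field of the check result.

```r
# Illustrative note text, shaped like the "possible problems" note.
# The function and variable names in it are made up.
note <- paste(
  "checking R code for possible problems ... NOTE",
  "my_fun: no visible binding for global variable \u2018x\u2019",
  "my_fun: no visible global function definition for \u2018median\u2019",
  "other_fun: no visible binding for global variable \u2018y\u2019",
  "other_fun: no visible binding for global variable \u2018x\u2019",
  sep = "\n"
)

# Hypothetical helper: return the unique quoted names that follow `what`.
extract_flagged <- function(note, what) {
  lines <- strsplit(note, "\n", fixed = TRUE)[[1]]
  # The quotes may be curly or straight depending on the locale.
  lines <- gsub("\u2018|\u2019", "'", lines)
  pattern <- paste0(what, " '([^']+)'")
  matches <- regmatches(lines, regexec(pattern, lines))
  hits <- vapply(
    matches,
    function(m) if (length(m) == 2L) m[2L] else NA_character_,
    character(1)
  )
  unique(hits[!is.na(hits)])
}

flagged_variables <- extract_flagged(note, "no visible binding for global variable")
flagged_functions <- extract_flagged(note, "no visible global function definition for")
```

Because the helper only needs the note text, it works the same whether the check was run a moment ago or loaded from a saved object, which is the point of passing checks = around.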

The other audits do not need a check at all:

| Audit | Needs `R CMD check`? | Notes |
|---|---|---|
| `audit_tags()` | no | static via roxygen2 |
| `audit_ascii()` | no | line-by-line via `stringi::stri_enc_isascii()` |
| `audit_dataset_doc()` | no | inspects `data/` and `R/` |
| `audit_citation()` | no | static parse of `inst/CITATION` |
| `audit_description()` | no | tokenises the `Description` field of `DESCRIPTION` |
| `audit_dontrun()` | no | line-by-line scan of `man/*.Rd` |
| `audit_downloads()` | no | AST walk of `R/`, `tests/`, `vignettes/`, `inst/` |
| `audit_globals()` | yes (reusable) | accepts `checks =` |
| `audit_userspace()` | yes (own pipeline) | takes file-system snapshots, runs separately |
| `audit_check()` | yes | this is the check itself, with the CRAN environment |

Per-issue cheatsheet

Globals (`no visible binding`)

audit_globals() returns a 3-element list of the names that R CMD check flagged:

globalVariables - undeclared variables that need a utils::globalVariables() declaration.
functions - external functions that need an @importFrom line.
operators - NSE tokens, data.table / rlang pronouns (:=, .SD, .N, .data, !!, …) that also need an @importFrom rather than a globalVariables() entry.

fix_globals(write = TRUE) writes the globalVariables set into R/globals.R, merging it with whatever names that file already declares: the freshly detected names are added to the existing ones and the result is deduplicated. The operators section is printed to stdout so that you can wire each one into a roxygen @importFrom block by hand:

audit_globals(pkg, checks = chk)
fix_globals(pkg, checks = chk, write = TRUE)

When a token is exported by more than one candidate package (e.g. := is exported by both data.table and rlang), every candidate is listed and you pick one consciously - no silent guessing.

Without write = TRUE, fix_globals() writes nothing and only prints both blocks for you to copy-paste.
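As a rough picture of the end state, the merge described above amounts to a deduplicated union of names. The sketch below uses invented names and builds the declaration as text; the exact layout that checkhelper writes to R/globals.R may differ.

```r
# Names already declared in a hypothetical R/globals.R.
existing <- c("value", "group")
# Names freshly detected by the check (also hypothetical).
detected <- c("group", "n_obs")

# union() keeps the existing names first and drops duplicates.
merged <- union(existing, detected)

# Text of the declaration one could keep in R/globals.R.
globals_line <- sprintf(
  "utils::globalVariables(c(%s))",
  paste0('"', merged, '"', collapse = ", ")
)
cat(globals_line, "\n")

# Operators and pronouns are imported instead, for example in a
# package-level roxygen block (pick ONE source package per token):
#' @importFrom data.table :=
#' @importFrom rlang .data
NULL
```

Keeping variables in globalVariables() and operators in @importFrom mirrors the split that audit_globals() reports.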

Missing roxygen tags

audit_tags() flags exported functions without @return and documented internals without @noRd. Read-only - no automatic fix because adding accurate @return text needs a human:

audit_tags(pkg)
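For reference, this is a minimal sketch of the two shapes the audit looks for: an exported function that carries @return, and a documented internal that carries @noRd. Both functions are hypothetical and exist only to illustrate the tags.

```r
#' Add two numbers
#'
#' @param x,y Numeric vectors.
#' @return A numeric vector, the element-wise sum of `x` and `y`.
#' @export
add_numbers <- function(x, y) {
  x + y
}

#' Double a value (internal helper, no Rd file is generated)
#'
#' @param x A numeric vector.
#' @noRd
double_it <- function(x) {
  add_numbers(x, x)
}
```

The @return text describes the type and meaning of the result, which is the part a human has to write.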

Non-ASCII characters

audit_ascii() walks R/, tests/, vignettes/, man/, DESCRIPTION and NAMESPACE line-by-line and reports every line containing non-ASCII characters (columns: file, line, text, n_tokens). fix_ascii() then rewrites them using the parser AST, so each token is rewritten according to its context: string literals become \uXXXX escapes, while comments and roxygen blocks get a Latin-ASCII transliteration. It dry-runs by default:

audit_ascii(pkg)

# Always preview which files would change. fix_ascii() returns
# invisibly - capture the result to inspect per-file detail
# (path, changed, n_tokens, n_chars).
preview <- fix_ascii(pkg, dry_run = TRUE)
preview[preview$changed, ]

# Apply when you've reviewed the proposed rewrite.
fix_ascii(pkg, dry_run = FALSE)

Identifiers with non-ASCII characters are refused by default (renaming would be a breaking change).
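The two ideas in this section, detecting non-ASCII lines and escaping string content, can be sketched in base R. This is a simplified stand-in under stated assumptions: is_ascii() and escape_non_ascii() are hypothetical helpers, the escape only covers code points that fit in four hex digits, and nothing here is context-aware the way the AST-based fix_ascii() is described to be.

```r
# Three illustrative source lines: plain code, a string literal, a comment.
src <- c("x <- 1", "msg <- \"caf\u00e9\"", "# r\u00e9sum\u00e9 of the data")

# TRUE when every code point of a string is below 128.
is_ascii <- function(x) {
  vapply(x, function(s) all(utf8ToInt(s) < 128L), logical(1), USE.NAMES = FALSE)
}
non_ascii_lines <- which(!is_ascii(src))

# Replace each non-ASCII code point with a \uXXXX escape.
# Assumption: code points fit in four hex digits (Basic Multilingual Plane).
escape_non_ascii <- function(s) {
  cp <- utf8ToInt(s)
  chars <- intToUtf8(cp, multiple = TRUE)
  paste(ifelse(cp < 128L, chars, sprintf("\\u%04x", cp)), collapse = "")
}
escaped <- escape_non_ascii("caf\u00e9")
```

Escaping is only appropriate inside string literals; applying it to a comment would leave unreadable text, which is why comments are transliterated instead.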

Undocumented datasets

audit_dataset_doc() lists every data/*.rda without a matching roxygen block under R/. fix_dataset_doc() writes a documentation skeleton (one call per dataset, takes the dataset name):

audit_dataset_doc(pkg)

fix_dataset_doc("my_data",
                pkg = pkg,
                description = "Description of my_data",
                source = "Internal")

The skeleton is editable: you fill in the description / source / column-by-column comments by hand, then re-run devtools::document().
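To illustrate what "a matching roxygen block" means, the sketch below builds a throwaway directory with two datasets, documents only one of them, and computes the difference. It assumes the common roxygen convention of ending a dataset block with the quoted dataset name; the paths and names are invented, and the matching rule is a simplification rather than what audit_dataset_doc() actually does.

```r
# Build a throwaway package skeleton with two datasets, one documented.
pkg_dir <- file.path(tempdir(), "demo_pkg")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "R"), showWarnings = FALSE)

my_data <- data.frame(x = 1:3)
other_data <- data.frame(y = letters[1:3])
save(my_data, file = file.path(pkg_dir, "data", "my_data.rda"))
save(other_data, file = file.path(pkg_dir, "data", "other_data.rda"))

# A dataset block: roxygen comments followed by the quoted dataset name.
writeLines(
  c("#' Other data",
    "#'",
    "#' @format A data frame with 3 rows and 1 column.",
    "#' @source Internal",
    "\"other_data\""),
  file.path(pkg_dir, "R", "data.R")
)

# Dataset names come from data/*.rda; documented names from quoted lines in R/.
datasets <- tools::file_path_sans_ext(
  list.files(file.path(pkg_dir, "data"), pattern = "\\.rda$")
)
r_files <- list.files(file.path(pkg_dir, "R"), full.names = TRUE)
r_lines <- unlist(lapply(r_files, readLines))
documented <- gsub("\"", "", grep("^\"[^\"]+\"$", r_lines, value = TRUE), fixed = TRUE)
undocumented <- setdiff(datasets, documented)
```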

Old-style `inst/CITATION`

audit_citation() parses inst/CITATION statically (no eval()) and surfaces every call to personList(), as.personList() or citEntry(), which CRAN rejects on submission with the message "Package CITATION file contains call(s) to old-style ...". It returns a tibble with call, line and a one-line suggestion for the modern equivalent (c() on person() objects; bibentry() instead of citEntry()):

audit_citation(pkg)

Read-only - rewriting a CITATION file usually needs editorial judgment, so there is no automated fix_citation().
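A static scan of this kind can be sketched with parse() and utils::getParseData(), which expose function-call tokens without evaluating anything. The CITATION content below is invented, and this is one possible approach under that assumption, not necessarily how audit_citation() is implemented.

```r
# Illustrative contents of an old-style inst/CITATION, one element per line.
citation_lines <- c(
  'citHeader("To cite demo in publications use:")',
  'citEntry(entry = "Manual", title = "demo",',
  '         author = personList(as.person("A. Author")),',
  '         year = "2024")'
)

# Parse without evaluating, keeping source references for line numbers.
exprs <- parse(text = citation_lines, keep.source = TRUE)
tokens <- utils::getParseData(exprs)

# Keep only function-call tokens whose name is on the old-style list.
old_style <- c("personList", "as.personList", "citEntry")
is_hit <- tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text %in% old_style
hits <- tokens[is_hit, c("line1", "text")]
```

Each row of hits pairs an old-style call with the line it starts on, which is enough to drive a manual rewrite toward bibentry() and c() on person() objects.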

Unquoted package names in `Description`

The CRAN incoming pretest emits "Package names should be quoted in the Description field" when a package name (or any software name) appears in the Description field of DESCRIPTION without surrounding single quotes.

audit_description() reads the Description field, tokenises it, and surfaces every word that matches an installed package name yet is not wrapped in single quotes. The package’s own name is intentionally skipped, and so are compound forms like dplyr-style or httr2-based (a hyphen on either side disqualifies the token from being a standalone package reference). Returns a tibble with word, position and suggestion:

audit_description(pkg)

Read-only - the fix is editorial (decide whether each hit is a real package reference or a coincidental word, then wrap with single quotes).
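The tokenise-and-match idea can be sketched in a few lines of base R. Everything here is an assumption made for illustration: the Description text is invented, known_pkgs is a fixed vector standing in for the installed-package list, and position is taken to be a character offset.

```r
description <- "Wraps dplyr verbs and 'ggplot2' layers in a dplyr-style API."
known_pkgs <- c("dplyr", "ggplot2")  # stand-in for the installed-package names

# Locate word-like tokens (dots allowed inside, as in data.table).
m <- gregexpr("[[:alnum:]]+(\\.[[:alnum:]]+)*", description)[[1]]
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L
word <- substring(description, start, end)

# Look at the character on each side of every token.
before <- substring(description, start - 1L, start - 1L)
after <- substring(description, end + 1L, end + 1L)
quoted <- before == "'" & after == "'"
compound <- before == "-" | after == "-"

tokens <- data.frame(word = word, position = start, stringsAsFactors = FALSE)
hits <- tokens[word %in% known_pkgs & !quoted & !compound, ]
hits$suggestion <- sprintf("'%s'", hits$word)
```

In this example only the first dplyr is reported: 'ggplot2' is already quoted and dplyr-style is skipped as a compound form.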

Network / download calls

CRAN policy: package code that downloads files or hits the network at install time or at runtime must degrade gracefully when the network is unavailable (offline build farms, sandboxed CI, locked-down user environments). Common causes of rejection are downloads from inside .onLoad() or .onAttach(), and vignettes or examples that have no tryCatch() / skip_if_offline() / \dontrun{} guard.

audit_downloads() walks R/, tests/, vignettes/ and inst/, parses each file, and surfaces every call to a known download or HTTP function: download.file(), httr::GET(), httr2::req_perform(), curl::curl_download(), etc. The call site (file + line) is paired with a one-line suggestion. Detection is purely static, so a user-defined function that shadows a downloader name (download.file <- function(...) { ... }) does not trigger a false positive on the definition site - only call sites are flagged:

audit_downloads(pkg)

Read-only - the fix is editorial: decide for each call whether the right CRAN-safe pattern is tryCatch() (continue on offline), testthat::skip_if_offline() (skip the test), or \dontrun{} (drop the example from the test surface).
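For the tryCatch() option, a guard might look like the sketch below. safe_download() is a hypothetical wrapper, not part of checkhelper; its downloader argument is injectable only so that the failure path can be exercised without touching the network.

```r
# Hypothetical wrapper: returns TRUE on success, FALSE (with a message) on failure.
safe_download <- function(url, destfile, downloader = utils::download.file) {
  tryCatch(
    {
      downloader(url, destfile, quiet = TRUE)
      TRUE
    },
    error = function(e) {
      # Inform the user and carry on instead of aborting.
      message("Download skipped: ", conditionMessage(e))
      FALSE
    }
  )
}

# Simulate an offline machine with a downloader that always fails.
offline <- function(url, destfile, ...) {
  stop("cannot open URL '", url, "'")
}
ok <- safe_download("https://example.org/data.csv", tempfile(), downloader = offline)

# Simulate a working connection with a downloader that writes a small file.
online <- function(url, destfile, ...) {
  writeLines("a,b", destfile)
}
dest <- tempfile()
ok_online <- safe_download("https://example.org/data.csv", dest, downloader = online)
```

Callers can then branch on the return value, for example by falling back to a cached copy or returning early with an informative message.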

`\dontrun{}` blocks in examples

CRAN policy is that \dontrun{} should only wrap example code that genuinely cannot be executed (missing API key, missing system dependency, side effect on the user’s filespace). Otherwise prefer \donttest{}, which a plain R CMD check skips but R CMD check --run-donttest still exercises. Since R 4.0.0, R CMD check --as-cran also runs \donttest{} examples.

audit_dontrun() walks man/*.Rd line-by-line and surfaces every \dontrun{} opener (commented-out % \dontrun{ mentions are ignored), with the source Rd file, the documented topic, the line number and a one-line suggestion. Read-only - the call is your review checklist:

audit_dontrun(pkg)
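The line-by-line scan described above can be approximated as follows. The Rd content is invented, and the rule used here (a line counts as commented out when its first non-blank character is %) is a simplifying assumption rather than the exact logic of audit_dontrun().

```r
# Illustrative contents of man/fetch_data.Rd, one element per line.
rd_lines <- c(
  "\\name{fetch_data}",
  "\\examples{",
  "% \\dontrun{ an old example, commented out }",
  "\\dontrun{",
  "fetch_data(\"https://example.org\")",
  "}",
  "}"
)

# Lines that contain a \dontrun{ opener, minus the commented-out ones.
has_opener <- grepl("\\dontrun{", rd_lines, fixed = TRUE)
commented <- grepl("^[[:space:]]*%", rd_lines)
dontrun_lines <- which(has_opener & !commented)

# The documented topic, taken from the \name{} entry.
name_line <- grep("^\\\\name\\{", rd_lines, value = TRUE)
topic <- sub("^\\\\name\\{(.*)\\}$", "\\1", name_line)
```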

Minimal end-to-end on a fake package

create_example_pkg() builds a fake package that deliberately trips each audit. The two with_* flags below activate the non-ASCII and undocumented-dataset fixtures so every audit has something to surface:

pkg <- create_example_pkg(with_nonascii = TRUE,
                          with_undocumented_data = TRUE)

chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

audit_tags(pkg)              # @return / @noRd issues
audit_ascii(pkg)             # accents in comments / strings
audit_dataset_doc(pkg)       # data/demo_dataset.rda has no doc
audit_citation(pkg)          # old-style personList() / citEntry()
audit_dontrun(pkg)           # \dontrun{} blocks in examples
audit_description(pkg)       # unquoted package names in Description
audit_downloads(pkg)         # network call sites to review for offline-safe guards
audit_globals(pkg, checks = chk)

fix_globals(pkg, checks = chk, write = TRUE)
fix_ascii(pkg, dry_run = FALSE)
fix_dataset_doc("demo_dataset", pkg = pkg,
                description = "A small demo dataset",
                source = "Generated by create_example_pkg()")

After applying the fixes, re-run the check (the package state has changed, so a new rcmdcheck() is needed) and confirm 0 errors / 0 warnings / 0 notes.
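One way to make that confirmation explicit is a small helper that counts the entries in a check result. check_is_clean() is hypothetical; it relies on the errors, warnings and notes fields of the object returned by rcmdcheck::rcmdcheck(), and the sketch uses stand-in lists so that it runs without a real check.

```r
# Hypothetical helper: TRUE when a check result has no errors, warnings or notes.
check_is_clean <- function(chk) {
  counts <- c(
    errors = length(chk$errors),
    warnings = length(chk$warnings),
    notes = length(chk$notes)
  )
  message(sprintf(
    "%d errors / %d warnings / %d notes",
    counts[["errors"]], counts[["warnings"]], counts[["notes"]]
  ))
  all(counts == 0L)
}

# Stand-in objects with the same three fields as a check result.
clean <- list(errors = character(), warnings = character(), notes = character())
dirty <- list(
  errors = character(),
  warnings = character(),
  notes = "checking R code for possible problems ... NOTE"
)
clean_ok <- check_is_clean(clean)
dirty_ok <- check_is_clean(dirty)

# With a real package, after the fixes have been applied:
# chk_after <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")
# check_is_clean(chk_after)
```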

Next step: pre-submission gates

When the dev-time audits above are clean, run the heavier gates that have their own pipeline and cannot reuse chk:

audit_check() - R CMD check with the full CRAN incoming environment.
audit_userspace() - checks that tests / examples / vignettes leave no files behind.

Both are documented in vignette("pre-submission-gates", package = "checkhelper").