Rewrite non-ASCII characters inside a string of R source code

asciify_r_source(
  text,
  strategy = c("auto", "escape", "translit", "report"),
  identifiers = c("error", "warn", "skip")
)

Arguments

text

character(1), the full source of an R script.

strategy

one of:

  • "auto" (default): choose per token. Non-ASCII characters in string literals become \uXXXX escapes, so each literal keeps the same value and the file stays ASCII-only for CRAN. Non-ASCII characters in comments and roxygen blocks, where escapes would not be interpreted, are transliterated with the Latin-ASCII transform, which drops diacritics (for example, an accented e becomes a plain e).

  • "escape": use \uXXXX escapes in every non-identifier token, including comments.

  • "translit": transliterate to ASCII in every non-identifier token, including string literals (which changes their values).

  • "report": rewrite nothing and return the input unchanged. Combine with find_nonascii_tokens() for a dry run.
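The "auto" policy above can be sketched with base R alone. The helper names below (escape_nonascii(), translit_nonascii(), rewrite_token()) are hypothetical and not part of this package, and translit_nonascii() is a deliberately tiny lookup table standing in for the ICU Latin-ASCII transform (in R, stringi::stri_trans_general(x, "Latin-ASCII") is the usual way to get the real one). Token types are assumed to be the names utils::getParseData() reports (STR_CONST, COMMENT).

```r
# Hypothetical helpers sketching the "auto" policy; not this package's code.

# Replace every non-ASCII code point with an R escape sequence:
# \uXXXX inside the Basic Multilingual Plane, \UXXXXXXXX above it.
escape_nonascii <- function(x) {
  cps <- utf8ToInt(enc2utf8(x))
  pieces <- vapply(cps, function(cp) {
    if (cp < 128L) {
      intToUtf8(cp)
    } else if (cp <= 0xFFFFL) {
      sprintf("\\u%04x", cp)
    } else {
      sprintf("\\U%08x", cp)
    }
  }, character(1))
  paste(pieces, collapse = "")
}

# Tiny stand-in for ICU's "Latin-ASCII" transform: map a few accented
# lowercase letters to their base letter, then turn anything still
# non-ASCII into "?".
translit_nonascii <- function(x) {
  from <- "\u00e0\u00e2\u00e4\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f4\u00f6\u00f9\u00fb\u00fc"
  to   <- "aaaceeeeiioouuu"
  out <- chartr(from, to, enc2utf8(x))
  gsub("[^\\x01-\\x7F]", "?", out, perl = TRUE)
}

# Per-token dispatch on getParseData() token types: escape where R will
# interpret the escape (string literals), transliterate where it will not
# (comments, roxygen "#'" lines included), leave everything else alone.
rewrite_token <- function(text, token) {
  switch(token,
    STR_CONST = escape_nonascii(text),
    COMMENT   = translit_nonascii(text),
    text
  )
}
```

Because string literals are escaped rather than transliterated, parsing a rewritten literal gives back the original value; transliteration offers no such guarantee, which is why the sketch reserves it for comments.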

identifiers

what to do when a non-ASCII identifier (variable or function name, formal argument, slot, ...) is found:

  • "error" (default): stop with an error. Renaming an identifier changes the API surface, so it cannot be automated safely.

  • "warn": warn and leave the token unchanged.

  • "skip": silently leave the token unchanged.
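A minimal sketch of how the identifiers policy could be applied, assuming the tokens come from utils::getParseData(), which labels identifier tokens SYMBOL, SYMBOL_FUNCTION_CALL, SYMBOL_FORMALS, SYMBOL_SUB, SYMBOL_PACKAGE and SLOT. check_identifiers() is a hypothetical helper, not this package's API; in keeping with the policy described here, none of the three options renames anything.

```r
# Hypothetical sketch of the `identifiers` policy. `tokens` is assumed to
# look like utils::getParseData() output (columns `token` and `text`).
identifier_tokens <- c("SYMBOL", "SYMBOL_FUNCTION_CALL", "SYMBOL_FORMALS",
                       "SYMBOL_SUB", "SYMBOL_PACKAGE", "SLOT")

# TRUE for each string containing at least one code point above 127.
is_nonascii <- function(x) {
  vapply(enc2utf8(x), function(s) any(utf8ToInt(s) > 127L), logical(1),
         USE.NAMES = FALSE)
}

check_identifiers <- function(tokens, identifiers = c("error", "warn", "skip")) {
  identifiers <- match.arg(identifiers)
  bad <- tokens$token %in% identifier_tokens & is_nonascii(tokens$text)
  if (any(bad) && identifiers != "skip") {
    msg <- paste("non-ASCII identifier(s):",
                 paste(unique(tokens$text[bad]), collapse = ", "))
    if (identifiers == "error") {
      stop(msg, call. = FALSE)
    } else {
      warning(msg, call. = FALSE)
    }
  }
  # Identifiers are never rewritten: the tokens come back untouched.
  invisible(tokens)
}
```

With a source string src, the tokens could be obtained as utils::getParseData(parse(text = src, keep.source = TRUE)) and passed to check_identifiers(tokens, "warn").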

Value

character(1), the rewritten source code. If no non-ASCII tokens are found, or if strategy = "report", the input is returned unchanged.
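Because the function takes and returns a single string, applying it to a script on disk means joining the file's lines first and writing the result back. rewrite_r_file() below is a hypothetical wrapper, not part of this package; it assumes the script is UTF-8 encoded and accepts any character(1)-to-character(1) function, such as asciify_r_source.

```r
# Hypothetical helper: apply a character(1) -> character(1) rewriter to a
# UTF-8 script on disk. Returns TRUE (invisibly) if the file was changed.
rewrite_r_file <- function(path, rewrite) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  src <- paste(lines, collapse = "\n")
  out <- rewrite(src)
  if (identical(out, src)) {
    return(invisible(FALSE))  # nothing to rewrite, leave the file alone
  }
  writeLines(enc2utf8(out), path, useBytes = TRUE)
  invisible(TRUE)
}
```

For example, rewrite_r_file("R/utils.R", asciify_r_source), where the path is purely illustrative. Since the file goes through readLines() and writeLines(), a rewritten file ends with exactly one newline.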

Details

This function never rewrites identifiers, whatever the value of identifiers: that argument only controls whether a non-ASCII identifier raises an error, a warning, or nothing. CRAN's policy forbids non-ASCII identifiers, but renaming them automatically is unsafe because it would silently change the user's exported API. Use find_nonascii_tokens() to list them so they can be renamed by hand.
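To see which tokens (identifiers included) carry non-ASCII characters, the parse data R already records is enough. list_nonascii_tokens() below is a hypothetical sketch, not find_nonascii_tokens() itself, whose output columns are not documented here; it keeps only terminal tokens whose text contains a code point above 127.

```r
# Hypothetical sketch of a non-ASCII token scan built on R's own parse data;
# it is not find_nonascii_tokens() itself.
list_nonascii_tokens <- function(text) {
  pd <- utils::getParseData(parse(text = text, keep.source = TRUE))
  if (is.null(pd)) {
    return(data.frame(line1 = integer(0), col1 = integer(0),
                      token = character(0), text = character(0)))
  }
  # Terminal rows are the actual tokens (symbols, literals, comments, ...).
  pd <- pd[pd$terminal, c("line1", "col1", "token", "text")]
  hit <- vapply(enc2utf8(pd$text), function(s) any(utf8ToInt(s) > 127L),
                logical(1), USE.NAMES = FALSE)
  out <- pd[hit, , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

Identifier rows show up with token types such as SYMBOL or SYMBOL_FUNCTION_CALL, which is where a manual rename would start.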

Raw strings (the form introduced in R 4.0.0, e.g. r"(...)" or R"---(...)---") are detected. By default they are still treated like any other STR_CONST token and escaped; pass strategy = "translit" to keep them raw at the cost of losing the accents.
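Raw strings need their own detection because R does not interpret escape sequences inside them: a \uXXXX written between raw delimiters stays six literal characters. The hypothetical is_raw_string() below recognises the raw form from a STR_CONST token's text, assuming that text keeps its r or R prefix exactly as written in the source.

```r
# Hypothetical helper: does a STR_CONST token use the R >= 4.0.0 raw form,
# i.e. r"(...)", R'[...]', r"--{...}--" and so on?
is_raw_string <- function(tok) {
  grepl("^[rR][\"']-*[\\[({]", tok, perl = TRUE)
}

# Inside a raw string, escape sequences are not interpreted:
nchar(r"(\u00e9)")   # 6: a backslash, "u", and four hex digits
nchar("\u00e9")      # 1: the single character e with acute accent
```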

Examples

src <- '
# accent in a comment: ete
x <- "deja vu"
'
cat(asciify_r_source(src))
#> 
#> # accent in a comment: ete
#> x <- "deja vu"