R/asciify_helpers.R
asciify_r_source.Rd
Rewrite non-ASCII characters inside a string of R source code
character(1), the full source of an R script.
one of:
"auto" (default): per-token policy. Non-ASCII characters inside string
literals are rewritten as \uXXXX escapes, so the strings remain
semantically equivalent and CRAN-safe. Comments and roxygen blocks, where
escapes would not be interpreted, get Latin-ASCII transliteration instead
(diacritics are dropped, e.g. an accented e becomes plain e).
"escape": force \uXXXX escapes on every non-identifier token.
"translit": force Latin-ASCII transliteration on every non-identifier token.
"report": rewrite nothing and return the input unchanged. Useful
together with find_nonascii_tokens() for a dry run.
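As a rough illustration of what the \uXXXX escaping amounts to for a single string, here is a minimal base R sketch. The helper name escape_non_ascii() is hypothetical and is not part of this package; the package's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: rewrite every non-ASCII
# code point of one string as \uXXXX (or \UXXXXXXXX beyond the BMP).
escape_non_ascii <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  code_points <- utf8ToInt(enc2utf8(x))
  pieces <- vapply(code_points, function(cp) {
    if (cp < 128L) {
      intToUtf8(cp)              # ASCII is kept as-is
    } else if (cp <= 0xFFFFL) {
      sprintf("\\u%04x", cp)     # 4 hex digits
    } else {
      sprintf("\\U%08x", cp)     # 8 hex digits
    }
  }, character(1))
  paste(pieces, collapse = "")
}

# The input is itself written with escapes so this snippet stays pure ASCII.
original <- "d\u00e9j\u00e0 vu"
escaped  <- escape_non_ascii(original)

# Parsing the escaped text as a string literal gives back the original value,
# which is why escaping is the safe choice inside string literals.
round_trip <- eval(parse(text = paste0('"', escaped, '"')))
```

The escaped text contains only ASCII characters, yet R reads it back as the same string. The same rewrite inside a comment would just leave literal backslash sequences in the text, which is the motivation for transliterating comments instead.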
what to do when a non-ASCII identifier (variable, function name, formal, slot...) is found:
"error" (default): stop. Renaming an identifier changes the API
surface and is not safe to automate.
"warn": warn and leave the token unchanged.
"skip": silently leave the token unchanged.
character(1), the rewritten source code. The original is returned unchanged if no non-ASCII tokens are found.
This function never rewrites identifiers, whatever the value of
identifiers: with "warn" or "skip" they are left in place rather than
renamed. CRAN's policy is to forbid non-ASCII identifiers, but rewriting
them automatically is unsafe (it would silently rename the user's exported
API). Use find_nonascii_tokens() to surface them.
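To make the token distinction concrete, the following sketch lists the non-ASCII tokens of a small snippet using utils::getParseData(). It is only an approximation of the kind of information find_nonascii_tokens() might surface, not its implementation; the helper has_non_ascii() is hypothetical, and a UTF-8 session is assumed.

```r
# The source is built from \u escapes so this snippet itself stays pure ASCII.
# Assumes a UTF-8 session.
src <- "x <- \"d\u00e9j\u00e0 vu\" # \u00e9t\u00e9"

# Terminal tokens carry a type such as SYMBOL, STR_CONST or COMMENT.
tokens <- utils::getParseData(parse(text = src, keep.source = TRUE))
tokens <- tokens[tokens$terminal, c("token", "text")]

# Hypothetical helper: TRUE when a token's text has a code point above 127.
has_non_ascii <- function(txt) {
  vapply(txt, function(t) any(utf8ToInt(t) > 127L), logical(1),
         USE.NAMES = FALSE)
}

non_ascii_tokens <- tokens[has_non_ascii(tokens$text), ]
print(non_ascii_tokens)
```

Here only the string literal and the comment are flagged. A flagged SYMBOL token would be an identifier, which is the case that is reported rather than rewritten.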
Strings declared with the R 4.0 raw form (r"(...)", R"---(...)---")
are detected. By default they are still treated like a regular STR_CONST
(escaped); pass strategy = "translit" if you want to keep them raw and
lose the accents instead.
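The trade-off comes from how R reads raw literals, which the following base R illustration shows (it does not call this package, and the variable names are only for the example): a \uXXXX sequence is interpreted in a regular literal but kept verbatim in a raw one.

```r
# Requires R >= 4.0 for the raw string syntax.
regular <- "\u00e9"      # escape is interpreted: a single accented character
raw     <- r"(\u00e9)"   # raw literal: backslash, u, 0, 0, e, 9

n_regular <- nchar(regular)          # 1
n_raw     <- nchar(raw)              # 6
same      <- identical(regular, raw) # FALSE
```

An escape written inside raw delimiters therefore would not produce the accented character, so a raw literal can keep either its raw form or its accents, but not both.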
src <- '
# accent dans un commentaire: ete
x <- "deja vu"
'
cat(asciify_r_source(src))
#>
#> # accent dans un commentaire: ete
#> x <- "deja vu"