Parses text with base::parse() and utils::getParseData() and returns the token rows whose source text is not pure ASCII. Used as the building block of asciify_r_source() and asciify_pkg().

find_nonascii_tokens(text)

Arguments

text

character(1), R source code (one element, possibly with embedded newlines).

Value

A data.frame: the rows of the utils::getParseData() output whose text column contains at least one non-ASCII byte. An extra logical column, is_identifier, flags symbol-like tokens that should not be auto-rewritten.

Details

Compared with a hand-rolled regex (e.g. the one used by dreamRs/prefixer), working on parser tokens reports each non-ASCII occurrence exactly once and attributes it to its context (string literal, comment, identifier, numeric literal, etc.). It also avoids false matches on lookalike characters that appear inside larger tokens.
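The token-based approach can be sketched roughly as follows. This is an illustrative re-implementation, not the package's actual code: the name sketch_nonascii_tokens() and the set of token types treated as identifiers are assumptions made for the example, and it assumes a UTF-8 session so that the parser accepts the non-ASCII input.

```r
# Hypothetical sketch of a token-based non-ASCII scan (not the package's code).
sketch_nonascii_tokens <- function(text) {
  stopifnot(is.character(text), length(text) == 1L)

  # keep.source = TRUE is required for getParseData() to have anything to return
  exprs <- parse(text = text, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  if (is.null(pd) || nrow(pd) == 0L) {
    return(data.frame())
  }

  # Terminal tokens only, so each piece of source text appears in one row
  pd <- pd[pd$terminal, , drop = FALSE]

  # A token is non-ASCII if any of its bytes is above 0x7F
  has_nonascii <- vapply(
    pd$text,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  hits <- pd[has_nonascii, , drop = FALSE]

  # Assumption: these parser token types count as "symbol-like"
  symbol_tokens <- c("SYMBOL", "SYMBOL_FUNCTION_CALL", "SYMBOL_FORMALS",
                     "SYMBOL_SUB", "SYMBOL_PACKAGE", "SLOT")
  hits$is_identifier <- hits$token %in% symbol_tokens
  hits
}

# \u escapes keep this example file itself pure ASCII
src <- "label <- \"caf\u00e9\"  # na\u00efve note\ntotal <- 1 + 2"
hits <- sketch_nonascii_tokens(src)
print(hits[, c("line1", "col1", "token", "text", "is_identifier")])
```

For this input the string literal and the comment should be the only rows reported, each as a single token with its line and column, which is the information a rewrite step needs to decide how to escape it. Note that getParseData() abbreviates the text of very long string constants, so a production version would likely need utils::getParseText() for those.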

See also

asciify_r_source() to apply the rewrite, find_nonascii_files() to scan a whole directory.