fansi - ANSI Control Sequence Aware String Functions
Counterparts to R string manipulation functions that account for the
effects of ANSI text formatting control sequences.
Formatting Strings with Control Sequences
Many terminals will recognize special sequences of characters in strings
and change display behavior as a result. For example, on my terminal the
sequences "\033[3?m" and "\033[4?m", where "?" is a digit in 1-7,
change the foreground and background colors of text respectively.
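Here is a minimal sketch of how such sequences can be written in R. The strings are illustrative and not taken from the original README. It assumes a terminal that interprets SGR sequences; other displays will show the raw escapes.

```r
# "\033" is the ESC character. "\033[31m" selects a red foreground,
# "\033[44m" a blue background, and "\033[0m" resets all attributes.
red.fg  <- "\033[31mred text\033[0m"
blue.bg <- "\033[44mblue background\033[0m"

# writeLines() passes the characters through unchanged, so a capable
# terminal renders the colors; print() would show the escapes instead.
writeLines(c(red.fg, blue.bg))

# Base R counts the escape characters like any other: 17 characters
# here, even though only the 8 in "red text" are displayed.
nchar(red.fg)
```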
This type of sequence is called an ANSI CSI SGR control sequence. Most
*nix terminals support them, and newer versions of Windows and RStudio
consoles do too. You can check whether your display supports them by
running term_cap_test().
Whether the fansi functions behave as expected depends on many
factors, including how your particular display handles Control
Sequences. See ?fansi for details, particularly if you are getting
unexpected results.
Manipulation of Formatted Strings
ANSI control characters and sequences (Control Sequences hereafter)
break the relationship between byte/character position in a string and
display position. For example, to extract the “ANS” part of our colored
“FANSI”, we would need to carefully compute the character positions:
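Here is a minimal sketch of that manual approach in base R. It assumes a colored “FANSI” built as below: white text, with a different background behind each letter.

```r
# Hypothetical colored "FANSI": "\033[37m" sets a white foreground and
# each "\033[4?m" gives the following letter a different background.
fansi <- paste0(
  "\033[37m",
  "\033[41mF", "\033[42mA", "\033[43mN", "\033[44mS", "\033[45mI",
  "\033[0m"
)
nchar(fansi)  # 39 characters, only 5 of them displayed

# Each escape before the letters is 5 characters long, so the
# "\033[42m" before "A" starts at position 12 and "S" is at 29.
ans.base <- substr(fansi, 12, 29)

# Removing the SGR sequences confirms that the right letters were kept.
gsub("\033\\[[0-9;]*m", "", ans.base)  # "ANS"
```

Note that ans.base no longer contains the leading "\033[37m" and has no closing reset.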
With fansi we can select directly based on display position:
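Here is a sketch of the same extraction with fansi, using the same hypothetical string. substr_ctl takes display positions, so “ANS” is simply characters 2 through 4.

```r
library(fansi)

# Same hypothetical colored "FANSI" as in the base R sketch.
fansi <- paste0(
  "\033[37m",
  "\033[41mF", "\033[42mA", "\033[43mN", "\033[44mS", "\033[45mI",
  "\033[0m"
)
nchar_ctl(fansi)  # 5: control sequences are not counted

# Positions refer to displayed characters.
ans <- substr_ctl(fansi, 2, 4)

# The result re-emits the formatting active at position 2, including
# the white foreground, so the letters keep their intended colors.
has_ctl(ans)    # TRUE
strip_ctl(ans)  # "ANS"
```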
If you look closely you’ll notice that the text color for the substr
version is wrong, because the naïve string extraction loses the
initial "\033[37m" that sets the foreground color. Additionally, the
color from the last letter bleeds out into the next line.
fansi Functions
fansi provides counterparts to the following string functions:
substr (and substr<-)
strsplit
strtrim
strwrap
nchar / nzchar
trimws
These are drop-in replacements, named after their base counterparts with
a _ctl suffix (e.g. substr_ctl, strwrap_ctl), that behave (almost)
identically to the base functions except for their Control Sequence
awareness. There are also utility functions such as strip_ctl to remove
Control Sequences and has_ctl to detect whether strings contain them.
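Here is a short tour of these functions on a hypothetical string with a red phrase in the middle. The comments describe what each call should return. The exact placement of the escapes in the results may vary between fansi versions.

```r
library(fansi)

# Hypothetical input: a red phrase surrounded by plain text and padding.
x <- "  The \033[31mquick brown\033[0m fox  "

nchar(x)       # 32: base R counts the escape characters
nchar_ctl(x)   # 23: only displayed characters are counted
nzchar_ctl("\033[31m\033[0m")  # FALSE: escapes alone display nothing

# Trimming, whitespace removal, and splitting keep the red styling on
# the characters that survive.
strtrim_ctl(x, 8)                 # first 8 display columns: "  The qu"
trimws_ctl(x)                     # leading/trailing spaces removed
strsplit_ctl(trimws_ctl(x), " ")  # one element per word
strwrap_ctl(x, 10)                # wrapped on display width

# Utilities for detecting and removing Control Sequences.
has_ctl(x)     # TRUE
strip_ctl(x)   # "  The quick brown fox  "
```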
Much of fansi is written in C, so the fansi functions should be only
slightly slower than the corresponding base functions, with the
exception of strwrap_ctl, which is much faster.
Operations involving type = "width" will be slower still. We have
prioritized convenience and safety over raw speed in the C code, but
unless your code is primarily engaged in string manipulation fansi
should be fast enough to avoid attention in benchmarking traces.
Width Based Substrings
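fansi also includes improved versions of some of those functions, such
as substr2_ctl, which allows for width based substrings. To illustrate,
let’s create an emoji string made up of two wide characters, and a
colorful background made up of 1-wide characters. When we inject the
2-wide emoji into the 1-wide background, their widths are accounted for,
as shown by the result remaining rectangular.
fansi width calculations use heuristics to account for graphemes,
including combining emoji.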
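Here is a minimal sketch of width-based operations. Emoji widths can depend on the R version and on fansi's heuristics, so the sketch uses the CJK characters U+4E2D and U+6587 instead, which are two columns wide. The background string and the positions are illustrative assumptions.

```r
library(fansi)

# Two 2-wide characters in yellow.
wide <- "\033[33m\u4e2d\u6587\033[0m"
nchar_ctl(wide)                  # 2 characters
nchar_ctl(wide, type = "width")  # 4 display columns

# Columns 3-4 hold the second character.
substr2_ctl(wide, 3, 4, type = "width")

# A 6-column background of 1-wide characters on blue.
bg <- "\033[44m......\033[0m"

# Replace columns 3-4 with one 2-wide character; the total display
# width is unchanged, so stacked lines stay rectangular.
substr2_ctl(bg, 3, 4, type = "width") <- "\u4e2d"
nchar_ctl(bg, type = "width")    # 6
```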
HTML Translation
You can translate ANSI CSI SGR formatted strings into their HTML
counterparts with to_html.
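Here is a small sketch of to_html on a hypothetical string. The exact HTML and CSS colors it generates may vary across fansi versions, so the comments only describe the general shape of the output. The sketch also assumes that to_html leaves characters such as < and & as they are. For that reason, fansi's html_esc is applied first whenever the text itself might contain them.

```r
library(fansi)

# Hypothetical input mixing a foreground and a background color.
x <- "\033[31mred\033[0m and \033[42mgreen background\033[0m"

# SGR sequences become <span> elements with inline styles.
html <- to_html(x)
html

# Escape HTML special characters before translating, so that "<" in
# the text is not mistaken for markup.
to_html(html_esc("\033[1m1 < 2\033[0m"))
```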
Rmarkdown
It is possible to set knitr hooks such that R output that contains
ANSI CSI SGR is automatically converted to the HTML formatted equivalent
and displayed as intended. See the vignette for details.
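Here is a minimal sketch of such a setup. It assumes fansi's set_knit_hooks function and an R Markdown document rendered with knitr. The call goes in an early chunk. The sketch also assumes that set_knit_hooks writes out the CSS that the converted output relies on. That chunk is therefore given the options results = "asis" and echo = FALSE. The vignette remains the reference for the full set of options.

```r
# Inside an early chunk of an R Markdown document, e.g. one opened with
# {r, results = "asis", echo = FALSE}: install the fansi output hooks
# and keep what the call returns in case the old hooks are needed.
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)

# In later chunks, output containing SGR sequences is then rendered
# as formatted HTML instead of showing raw escapes.
cat("\033[31mThis text should render in red.\033[0m\n")
```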
Installation
This package is available on CRAN:
install.packages('fansi')
It has no runtime dependencies.
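For the development version use
remotes::install_github('brodieg/fansi@development').
There is no guarantee that development versions are stable or even
working. The master branch typically mirrors CRAN and should be stable.
Acknowledgments
R Core for developing and maintaining such a wonderful language.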
CRAN maintainers, for patiently shepherding packages onto CRAN and
maintaining the repository, and Uwe Ligges in particular for
maintaining Winbuilder.
Gábor Csárdi for getting me
started on the journey into ANSI control sequences, and for many of the
ideas on how to process them.