minty
minty (Minimal type guesser) is a package with the type
inferencing and parsing tools (the so-called 1e parsing engine)
extracted from readr (with permission, see this issue
tidyverse/readr#1517).
Since July 2021, readr no longer uses these tools internally for
parsing text files. Instead, vroom is used by default, unless the
first-edition parsing engine is explicitly requested (see the
explanation on editions).
readr’s 1e type inferencing and parsing tools are used by various R
packages, e.g. readODS and surveytoolbox, for parsing in-memory
objects, but those packages do not use readr’s main functions
(e.g. readr::read_delim()). As explained in the README of
readr, this 1e code will eventually be removed from readr.
minty aims to provide a set of minimal, long-term, and compatible
type inferencing and parsing tools for those packages. You might
consider minty a 1.5e parsing engine.
Installation
You can install the development version of minty like so:
if (!requireNamespace("pak", quietly = TRUE)) {
  install.packages("pak")
}
pak::pak("git::https://codeberg.org/chainsawriot/minty")
Example
A character-only data.frame
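The examples in this README use a character-only data.frame named text_only, but its definition did not survive in this text. The sketch below is an assumed reconstruction: the values are inferred from the outputs printed further down, not taken from the original definition.

```r
# Assumed reconstruction of text_only: every column is character,
# with values inferred from the outputs printed in this README
text_only <- data.frame(
  maybe_age  = c("17", "18", "019"),
  maybe_male = c("true", "false", "true"),
  maybe_name = c("AA", "BB", "CC"),
  some_na    = c("NA", "Not good", "Bad"),
  dob        = c("2019/07/21", "2019/08/31", "2019/10/01"),
  stringsAsFactors = FALSE  # keep strings as character on R < 4.0
)

str(text_only)
```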
## built-in function type.convert():
## no type inferencing except for numbers
str(type.convert(text_only, as.is = TRUE))
#> 'data.frame': 3 obs. of 5 variables:
#> $ maybe_age : int 17 18 19
#> $ maybe_male: chr "true" "false" "true"
#> $ maybe_name: chr "AA" "BB" "CC"
#> $ some_na : chr NA "Not good" "Bad"
#> $ dob : chr "2019/07/21" "2019/08/31" "2019/10/01"
Inferencing the column types
library(minty)
data <- type_convert(text_only)
data
#> maybe_age maybe_male maybe_name some_na dob
#> 1 17 TRUE AA <NA> 2019-07-21
#> 2 18 FALSE BB Not good 2019-08-31
#> 3 019 TRUE CC Bad 2019-10-01
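In the output above, maybe_age stays character ("019"), maybe_male becomes logical and dob becomes a Date. A minimal sketch, assuming type_convert() guesses each column the same way parse_guess() does, reproduces those choices column by column. readr (and hence its 1e engine) parses numbers with leading zeros as text rather than as numbers, which is why the single value "019" keeps the whole column as character.

```r
library(minty)

# Apply the guesser to each column's raw values separately
age  <- parse_guess(c("17", "18", "019"))
male <- parse_guess(c("true", "false", "true"))
dob  <- parse_guess(c("2019/07/21", "2019/08/31", "2019/10/01"))

# The leading zero in "019" keeps the guess at character
class(age)
# "true"/"false" are recognized as logical values
class(male)
# "YYYY/MM/DD" strings are guessed as dates
class(dob)

# Without the leading zero, the same values are guessed as numbers
class(parse_guess(c("17", "18", "19")))
```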
Type guesser
res <- parse_guess(c("2019-07-21", "2019-08-31", "2019-10-01", "IDK"), na = "IDK")
res
#> [1] "2019-07-21" "2019-08-31" "2019-10-01" NA
str(res)
#> Date[1:4], format: "2019-07-21" "2019-08-31" "2019-10-01" NA
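Besides na, parse_guess() takes the trim_ws and guess_max arguments mentioned in the notes on differences from readr: whitespace is trimmed before guessing, and guess_max defaults to NA. A minimal sketch, assuming those argument names as given in this README and assuming that NA means "use all values" (as in readr):

```r
library(minty)

# Padded numbers: with trim_ws = TRUE, whitespace is removed before
# guessing, so these should be guessed as numbers
padded <- parse_guess(c(" 1", "2 ", " 3 "), trim_ws = TRUE)
class(padded)

x <- c("1", "2", "a")

# Default guess_max = NA (assumed: all values): the non-numeric "a"
# takes part in guessing, so the whole vector stays character
all_values <- parse_guess(x)
class(all_values)

# A small guess_max inspects only a subset of values; which values are
# inspected is an implementation detail, so the guessed type may differ
subset_guess <- parse_guess(x, guess_max = 2)
length(subset_guess)
```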
Differences: readr vs minty
Unlike readr and vroom, minty is mainly intended for
non-interactive use. Therefore, minty emits fewer messages and
warnings than readr and vroom.
data <- minty::type_convert(text_only)
data
#> maybe_age maybe_male maybe_name some_na dob
#> 1 17 TRUE AA <NA> 2019-07-21
#> 2 18 FALSE BB Not good 2019-08-31
#> 3 019 TRUE CC Bad 2019-10-01
data <- readr::type_convert(text_only)
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> maybe_age = col_character(),
#> maybe_male = col_logical(),
#> maybe_name = col_character(),
#> some_na = col_character(),
#> dob = col_date(format = "")
#> )
data
#> maybe_age maybe_male maybe_name some_na dob
#> 1 17 TRUE AA <NA> 2019-07-21
#> 2 18 FALSE BB Not good 2019-08-31
#> 3 019 TRUE CC Bad 2019-10-01
If you like those messages, set the verbose option to TRUE (it
defaults to FALSE). To keep this package as minimal as possible,
these optional messages are printed with base R (not cli).
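A minimal sketch of opting in to the messages, assuming verbose is an argument of minty::type_convert() as stated above. text_only is rebuilt with assumed values (consistent with the outputs printed in this section) so the block runs on its own; the exact wording of the messages is not shown here.

```r
library(minty)

# text_only rebuilt so this block is self-contained (assumed values)
text_only <- data.frame(
  maybe_age  = c("17", "18", "019"),
  maybe_male = c("true", "false", "true"),
  maybe_name = c("AA", "BB", "CC"),
  some_na    = c("NA", "Not good", "Bad"),
  dob        = c("2019/07/21", "2019/08/31", "2019/10/01"),
  stringsAsFactors = FALSE
)

# Default (verbose = FALSE): no column-specification messages
quiet <- minty::type_convert(text_only)

# Opt in to the messages; they are emitted with base R, not cli
loud <- minty::type_convert(text_only, verbose = TRUE)
```

Only the printed messages should differ; the converted data.frame is the same either way.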
At the moment, minty does not use the problems mechanism by default.
Some features from vroom have been ported to minty, but not to readr.
guess_max is available for parse_guess() and type_convert(), defaulting to NA (same as readr).
For parse_guess() and type_convert(), whitespace trimming (trim_ws) is applied before type guessing (the expected behavior of vroom::vroom() / readr::read_delim()).
Similar packages
For parsing ambiguous date(time)
Guess column types of a text file
Acknowledgements
Thanks to:
readr