Standardize Date Variables in LimeSurvey Data
Arguments
- .data
tibble, data frame with LimeSurvey data.
- date_cols
character, names of date columns to be standardized.
- as_na
character, values to be coerced to NA (default to c("", "N", "Y")).
- ...
Arguments passed on to lubridate::parse_date_time:
  - orders
  a character vector of date-time formats. Each order string is a series of formatting characters as listed in base::strptime() but might not include the "%" prefix. For example, "ymd" will match all the possible dates in year, month, day order. Formatting orders might include arbitrary separators. These are discarded. See details for the implemented formats. If multiple order strings are supplied, they are applied in turn for parse_date_time2() and fast_strptime(). For parse_date_time() the order of applied formats is determined by the select_formats parameter.
  - tz
  a character string that specifies the time zone with which to parse the dates.
  - truncated
  integer, number of formats that can be missing. The most common type of irregularity in date-time data is the truncation due to rounding or unavailability of the time stamp. If the truncated parameter is non-zero, parse_date_time() also checks for truncated formats. For example, if the format order is "ymdHMS" and truncated = 3, parse_date_time() will correctly parse incomplete date-times like 2012-06-01 12:23, 2012-06-01 12 and 2012-06-01. NOTE: The ymd() family of functions is based on base::strptime(), which currently fails to parse %Y-%m formats.
  - quiet
  logical. If TRUE, progress messages are not printed, the "No formats found" error is suppressed, and the function simply returns a vector of NAs. This mirrors the behavior of the base R functions base::strptime() and base::as.POSIXct().
  - locale
  locale to be used, see locales. On Linux systems you can use system("locale -a") to list all the installed locales.
  - select_formats
  a function to select actual formats for parsing from a set of formats which matched a training subset of x. It receives a named integer vector and returns a character vector of selected formats. Names of the input vector are formats (not orders) that matched the training set. Numeric values are the number of dates (in the training set) that matched the corresponding format. You should use this argument if the default selection method fails to select the formats in the right order. By default the formats with the most formatting tokens (%) are selected, and %Y counts as 2.5 tokens (so that it has priority over %y%m). See examples.
  - exact
  logical. If TRUE, the orders parameter is interpreted as an exact base::strptime() format and no training or guessing is performed (i.e. the train and drop parameters are ignored).
  - train
  logical, default TRUE. Whether to train formats on a subset of the input vector. As a result the supplied orders are sorted according to performance on this training set, which commonly results in increased performance. Please note that even when train = FALSE (and exact = FALSE), guessing of the actual formats is still performed on the training set (a pseudo-random subset of the original input vector). This might result in an "All formats failed to parse" error. See notes below.
  - drop
  logical, default FALSE. Whether to drop formats that didn't match on the training set. If FALSE, formats that were unmatched on the training set are tried as a last resort at the end of the parsing queue. Applies only when train = TRUE. Setting this parameter to TRUE might slightly speed up parsing in situations involving many formats. Prior to v1.7.0 this parameter was implicitly TRUE, which resulted in occasional surprising behavior when rare patterns were not present in the training set.
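
The following is a minimal sketch of how the arguments above might fit together, not the package's actual implementation. The helper name standardize_dates_sketch, the sample data frame, and its submitdate column are all hypothetical and exist only for illustration. The sketch assumes lubridate is installed: it first coerces the as_na values to NA, then forwards the remaining arguments (here orders, truncated and tz) to lubridate::parse_date_time().

```r
library(lubridate)

# Hypothetical helper for illustration only; the real function may differ.
standardize_dates_sketch <- function(.data, date_cols,
                                     as_na = c("", "N", "Y"), ...) {
  for (col in date_cols) {
    values <- as.character(.data[[col]])
    # Coerce placeholder values to NA before parsing
    values[values %in% as_na] <- NA_character_
    # Remaining arguments are passed on to lubridate::parse_date_time()
    .data[[col]] <- parse_date_time(values, ...)
  }
  .data
}

# Invented sample data: a full timestamp, a truncated one, and two placeholders
survey <- data.frame(
  id = 1:4,
  submitdate = c("2012-06-01 12:23:45", "2012-06-01 12:23", "N", ""),
  stringsAsFactors = FALSE
)

result <- standardize_dates_sketch(
  survey,
  date_cols = "submitdate",
  orders = "ymdHMS",
  truncated = 3,  # also accept date-times missing up to three trailing parts
  tz = "UTC"
)
```

With truncated = 3, the second value (which lacks seconds) should still parse, while the "N" and "" entries become NA because they match as_na.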