The goal of {fakir} is to provide fake datasets that can be used to teach R.

The full documentation is available on this {pkgdown} site: https://thinkr-open.github.io/fakir/

Characteristics

This package is designed for teaching data wrangling and data visualisation:

  • Some datasets follow the tidy-data principles, others don’t.
  • Some missing values are set for numeric and categorical variables.
  • Some variable values are correlated.

These datasets are suitable for introducing the {tidyverse} and for providing examples of its main functions.
Supported languages are, for now, French and US English.
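
The characteristics listed above can be checked directly on a generated dataset. The following is a minimal sketch, not part of the original documentation: it uses fake_ticket_client() as called in the Examples section below, and the variable names clients, na_per_column and date_points_cor are illustrative only.

```r
library(fakir)

# Generate the split dataset and keep the clients table
clients <- fake_ticket_client(vol = 100, split = TRUE)$clients

# Count missing values per column (numeric and categorical alike)
na_per_column <- colSums(is.na(clients))
na_per_column[na_per_column > 0]

# Correlation between entry date and fidelity points,
# ignoring rows where either value is missing
date_points_cor <- cor(
  as.numeric(clients$entry_date),
  clients$fidelity_points,
  use = "complete.obs"
)
date_points_cor
```

Counting NA values first is a useful habit in teaching, since it shows which columns need explicit handling before any summary or plot.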

Examples

Fake support ticket database:

  • All tickets
  • Separate tickets and client databases
library(fakir)
# split = TRUE returns a list of two tables: clients and tickets
tickets_db <- fake_ticket_client(vol = 100, split = TRUE)
tickets_db
#> $clients
#> # A tibble: 200 x 14
#>    num_client first last  job     age region id_dpt departement cb_provider
#>  * <chr>      <chr> <chr> <chr> <dbl> <chr>  <chr>  <chr>       <chr>      
#>  1 1          Solo… Hean… Civi…    53 Pays … 72     <NA>        Diners Clu…
#>  2 2          Karma Will… Scie…    81 Nord-… 62     Pas-de-Cal… VISA 13 di…
#>  3 3          Press Kulas Anim…    NA Poito… 17     <NA>        <NA>       
#>  4 4          Laken McDe… <NA>     NA Centre 36     Indre       <NA>       
#>  5 5          Sydn… Jask… Hort…    30 Prove… 13     Bouches-du… <NA>       
#>  6 6          Clay… Runo… Comm…    NA Midi-… 31     Haute-Garo… Diners Clu…
#>  7 7          Robe… Purd… Fina…    60 Aquit… 40     <NA>        <NA>       
#>  8 8          Dr.   Rona… Astr…    30 Midi-… 46     Lot         <NA>       
#>  9 9          Miss  Alon… Occu…    18 Champ… 08     Ardennes    Diners Clu…
#> 10 10         Vern… Ondr… Clin…    19 Franc… 70     Haute-Saône <NA>       
#> # … with 190 more rows, and 5 more variables: name <chr>,
#> #   entry_date <dttm>, fidelity_points <dbl>, priority_encoded <dbl>,
#> #   priority <fct>
#> 
#> $tickets
#> # A tibble: 100 x 10
#>    ref   num_client  year month   day timestamp  supported type  state
#>    <chr> <chr>      <dbl> <dbl> <int> <date>     <chr>     <chr> <fct>
#>  1 DOSS… 1           2013     1    22 2013-01-22 Non       Inst… Term…
#>  2 DOSS… 22          2016    11    14 2016-11-14 Non       Inst… Atte…
#>  3 DOSS… 9           2016    12    19 2016-12-19 Non       Inst… Term…
#>  4 DOSS… 8           2017     1     2 2017-01-02 Non       Box   Atte…
#>  5 DOSS… 30          2017     1    19 2017-01-19 Oui       Inst… Inte…
#>  6 DOSS… 10          2017     2     1 2017-02-01 Oui       Inst… Atte…
#>  7 DOSS… 37          2017     3     1 2017-03-01 Non       Ligne Atte…
#>  8 DOSS… 37          2017     4    21 2017-04-21 Non       Box   Atte…
#>  9 DOSS… 24          2017     4    28 2017-04-28 Non       <NA>  En c…
#> 10 DOSS… 12          2017     5    15 2017-05-15 Non       Inst… Atte…
#> # … with 90 more rows, and 1 more variable: source_call <fct>
library(ggplot2)
# Fidelity points against client entry date, with a smoothed trend line
ggplot(tickets_db$clients) +
  aes(entry_date, fidelity_points) +
  geom_point() +
  geom_smooth()
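
The listing above uses split = TRUE. For the "All tickets" case, a minimal sketch, assuming that split = FALSE returns ticket and client information joined in a single table (tickets_all is an illustrative name):

```r
library(fakir)

# One table with one row per ticket, client columns included
tickets_all <- fake_ticket_client(vol = 100, split = FALSE)

# Inspect the dimensions and column names of the joined table
dim(tickets_all)
names(tickets_all)
```

The joined form is convenient for plotting straight away, while the split form is better suited to practising joins.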

Fake questionnaire on means of transport / goal

  • All answers
  • Separate individuals and their answers
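
No code is shown here for the questionnaire data. The following is a minimal sketch under a stated assumption: that the installed version of {fakir} exports fake_survey_answers() with n and split arguments. The function name may differ between releases, so check the package reference before relying on it. The names answers and answers_split are illustrative.

```r
library(fakir)

# Assumption: fake_survey_answers() is exported by the installed version
# All answers in a single table
answers <- fake_survey_answers(n = 10)
answers

# Individuals and their answers as separate tables
answers_split <- fake_survey_answers(n = 10, split = TRUE)
names(answers_split)
```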

Prior work

This package is heavily inspired by {charlatan}.

Scott Chamberlain (2017). charlatan: Make Fake Data. R package version 0.1.0. https://CRAN.R-project.org/package=charlatan

Contribute

You can contribute to {fakir} in two ways:

Translate

You can add other locales by providing:

  • a new vector in “R/utils”
  • a new locale in “R/fake_client” and “R/fake_transport”

New dataset

Feel free to create new dataset generators.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.