Tibbles

Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (e.g. converting character vectors to factors).

library(tibble)

Creating

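tibble() is a nice way to create data frames. It encapsulates best practices for data frames: for example, it never converts character vectors to factors, and it only recycles values of length 1.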
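
As a minimal sketch of these practices (the column names x, y, z and `a b` are only illustrative), the following creates a small tibble and checks that character input stays character and that column names are left untouched:

```r
library(tibble)

# Columns are evaluated in order, so z can refer to x.
# The character vector y stays character; it is not turned into a factor.
df <- tibble(x = 1:3, y = letters[1:3], z = x * 2)
class(df$y)
#> [1] "character"

# Column names are kept as given, even when they are not syntactic.
names(tibble(`a b` = 1))
#> [1] "a b"
```

Backticks are only needed here because `a b` contains a space; the name itself is stored unchanged.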

Coercion

To complement tibble(), tibble provides as_tibble() to coerce objects into tibbles. Generally, as_tibble() methods are much simpler than as.data.frame() methods. What as.data.frame() does is more involved: its approach is similar to do.call(cbind, lapply(x, data.frame)), i.e. it coerces each component to a data frame and then uses cbind() to bind them all together.
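
For illustration, a short sketch of coercing a named list and a base data frame with as_tibble() (the inputs are made up for this example):

```r
library(tibble)

# A named list of equal-length vectors becomes one column per element.
l <- list(a = 1:3, b = letters[1:3])
tb <- as_tibble(l)
class(tb)
#> [1] "tbl_df"     "tbl"        "data.frame"

# An existing data frame is converted without changing its columns.
df <- data.frame(x = 1:2, y = c("u", "v"))
tb2 <- as_tibble(df)
dim(tb2)
#> [1] 2 2
```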

as_tibble() has been written with an eye for performance:

l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters

timing <- bench::mark(
  as_tibble(l),
  as.data.frame(l),
  check = FALSE
)

timing
#> # A tibble: 2 x 14
#>   expression       min         mean         median      max         `itr/sec`
#>   <chr>            <bench_tm>  <bench_tm>   <bench_tm>  <bench_tm>      <dbl>
#> 1 as_tibble(l)     0.000287696 0.0006251376 0.000327178 0.004508219     1600.
#> 2 as.data.frame(l) 0.000791522 0.0016640039 0.001098172 0.007652914      601.
#> # … with 8 more variables: mem_alloc <bnch_byt>, n_gc <dbl>, n_itr <int>,
#> #   total_time <bench_tm>, result <list>, memory <list>, time <list>, gc <list>

The speed of as.data.frame() is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.

Tibbles vs data frames

There are three key differences between tibbles and data frames: printing, subsetting, and recycling rules.

Printing

When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen. It also prints an abbreviated description of the column type, and uses font styles and color for highlighting:

tibble(x = -5:1000)
#> # A tibble: 1,006 x 1
#>        x
#>    <int>
#>  1    -5
#>  2    -4
#>  3    -3
#>  4    -2
#>  5    -1
#>  6     0
#>  7     1
#>  8     2
#>  9     3
#> 10     4
#> # … with 996 more rows

You can control the default appearance with options such as tibble.print_max, tibble.print_min, and tibble.width.
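
A sketch of what this can look like, assuming the tibble.print_max, tibble.print_min, and tibble.width options and the n argument of print() as documented by the tibble package; the specific values 20, 5, and 3 are arbitrary:

```r
library(tibble)

# If a tibble has more than 20 rows, print only the first 5;
# a width of Inf prints all columns regardless of screen width.
old <- options(tibble.print_max = 20, tibble.print_min = 5, tibble.width = Inf)

big <- tibble(x = 1:100)
print(big)

# Restore the previous settings.
options(old)

# For a one-off override, pass n (and optionally width) to print() directly.
print(big, n = 3)
```

Setting options changes the behaviour for the whole session, so the print() arguments are usually the safer choice inside functions or scripts.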

Subsetting

Tibbles are quite strict about subsetting. [ always returns another tibble. Contrast this with a data frame: sometimes [ returns a data frame and sometimes it just returns a vector:

df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])
#> [1] "data.frame"
class(df1[, 1])
#> [1] "integer"

df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1:2])
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df2[, 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

To extract a single column use [[ or $:

class(df2[[1]])
#> [1] "integer"
class(df2$x)
#> [1] "integer"

Tibbles are also stricter with $. Tibbles never do partial matching, and will issue a warning and return NULL if the column does not exist:

df <- data.frame(abc = 1)
df$a
#> [1] 1

df2 <- tibble(abc = 1)
df2$a
#> Warning: Unknown or uninitialised column: `a`.
#> NULL

As of version 1.4.1, tibbles no longer ignore the drop argument:

data.frame(a = 1:3)[, "a", drop = TRUE]
#> [1] 1 2 3
tibble(a = 1:3)[, "a", drop = TRUE]
#> [1] 1 2 3

Recycling

When constructing a tibble, only values of length 1 are recycled. The first column with a length other than 1 determines the number of rows in the tibble, and conflicts lead to an error. This also extends to tibbles with zero rows, which is sometimes important for programming:

tibble(a = 1, b = 1:3)
#> # A tibble: 3 x 2
#>       a     b
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     3
tibble(a = 1:3, b = 1)
#> # A tibble: 3 x 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2     1
#> 3     3     1
tibble(a = 1:3, c = 1:2)
#> Error: Tibble columns must have compatible sizes.
#> * Size 3: Existing data.
#> * Size 2: Column `c`.
#>  Only values of size one are recycled.
tibble(a = 1, b = integer())
#> # A tibble: 0 x 2
#> # … with 2 variables: a <dbl>, b <int>
tibble(a = integer(), b = 1)
#> # A tibble: 0 x 2
#> # … with 2 variables: a <int>, b <dbl>