Writing SQL with dbplyr

This vignette discusses why you might use dbplyr instead of writing SQL yourself, and what to do when dbplyr’s built-in translations can’t create the SQL that you need.

library(dplyr)
library(dbplyr)

mf <- memdb_frame(x = 1, y = 2)

Why use dbplyr?

One simple nicety of dplyr is that it will automatically generate subqueries if you want to use a freshly created variable in mutate():

mf %>% 
  mutate(
    a = y * x, 
    b = a ^ 2,
  ) %>% 
  show_query()
#> <SQL>
#> SELECT `x`, `y`, `a`, POWER(`a`, 2.0) AS `b`
#> FROM (SELECT `x`, `y`, `y` * `x` AS `a`
#> FROM `dbplyr_002`)

In general, it’s much easier to work iteratively in dbplyr. You can easily give intermediate queries names, and reuse them in multiple places. Or if you have a common operation that you want to do to many queries, you can easily wrap it up in a function. It’s also easy to chain count() to the end of any query to check the results are about what you expect.

What happens when dbplyr fails?

dbplyr aims to translate the most common R functions to their SQL equivalents, allowing you to ignore the vagaries of the SQL dialect that you’re working with, so you can focus on the data analysis problem at hand. But different backends have different capabilities, and sometimes there are SQL functions that don’t have exact equivalents in R. In those cases, you’ll need to write SQL code directly. This section shows you how you can do so.

Prefix functions

Any function that dbplyr doesn’t know about will be left as is:

mf %>% 
  mutate(z = foofify(x, y)) %>% 
  show_query()
#> <SQL>
#> SELECT `x`, `y`, foofify(`x`, `y`) AS `z`
#> FROM `dbplyr_002`

Because SQL functions are general case insensitive, I recommend using upper case when you’re using SQL functions in R code. That makes it easier to spot that you’re doing something unusual:

mf %>% 
  mutate(z = FOOFIFY(x, y)) %>% 
  show_query()
#> <SQL>
#> SELECT `x`, `y`, FOOFIFY(`x`, `y`) AS `z`
#> FROM `dbplyr_002`

Infix functions

As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like LIKE which does a limited form of pattern matching:

mf %>% 
  filter(x %LIKE% "%foo%") %>% 
  show_query()
#> <SQL>
#> SELECT *
#> FROM `dbplyr_002`
#> WHERE (`x` LIKE '%foo%')

Or use || for string concatenation (note that backends should translate paste() and paste0() for you):

mf %>% 
  transmute(z = x %||% y) %>% 
  show_query()
#> <SQL>
#> SELECT `x` || `y` AS `z`
#> FROM `dbplyr_002`

Special forms

SQL functions tend to have a greater variety of syntax than R. That means there are a number of expressions that can’t be translated directly from R code. To insert these in your own queries, you can use literal SQL inside sql():

mf %>% 
  transmute(factorial = sql("x!")) %>% 
  show_query()
#> <SQL>
#> SELECT x! AS `factorial`
#> FROM `dbplyr_002`

mf %>% 
  transmute(factorial = sql("CAST(x AS FLOAT)")) %>% 
  show_query()
#> <SQL>
#> SELECT CAST(x AS FLOAT) AS `factorial`
#> FROM `dbplyr_002`

Note that you can use sql() at any depth inside the expression:

mf %>% 
  filter(x == sql("ANY VALUES(1, 2, 3)")) %>% 
  show_query()
#> <SQL>
#> SELECT *
#> FROM `dbplyr_002`
#> WHERE (`x` = ANY VALUES(1, 2, 3))