| Title: | Query 'R' Data Frames with 'SQL' |
|---|---|
| Description: | Use 'SQL' 'SELECT' statements to query 'R' data frames. |
| Authors: | Ian Cook [aut, cre], Cloudera [cph] |
| Maintainer: | Ian Cook <[email protected]> |
| License: | Apache License 2.0 |
| Version: | 0.2.4.9000 |
| Built: | 2026-05-17 05:40:49 UTC |
| Source: | https://github.com/ianmcook/tidyquery |
query takes a SQL SELECT statement and uses it to
query an R data frame
query(data, sql)query(data, sql)
data |
a data frame or data frame-like object (optional) |
sql |
a character string containing a SQL |
If the data argument is not specified, then the FROM
clause of the SQL statement determines which data frame to query.
The names of data frames and columns are case-sensitive (like in R). Keywords and function names are not case-sensitive (like in SQL).
In addition to R data frames and tibbles (tbl_df objects), this
function can query:
dtplyr_step objects created with dtplyr, a data.table backend for dplyr
Apache Arrow Table, RecordBatch, Dataset, and arrow_dplyr_query objects created with arrow
tbl_sql objects created with dbplyr, but this depends on which database and which backend package (if any) you are using, so results may vary
This function is subject to the current limitations of the queryparser package.
An object of the same class as data.
library(dplyr) iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species") query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species") iris %>% filter(Petal.Length > 4) %>% query("SELECT Species, MAX(Sepal.Length) AS max_sep_len GROUP BY Species") %>% arrange(desc(max_sep_len)) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" query(query)library(dplyr) iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species") query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species") iris %>% filter(Petal.Length > 4) %>% query("SELECT Species, MAX(Sepal.Length) AS max_sep_len GROUP BY Species") %>% arrange(desc(max_sep_len)) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" query(query)
show_dplyr takes a SQL SELECT statement and prints
equivalent dplyr code
show_dplyr(data, sql)show_dplyr(data, sql)
data |
a data frame or data frame-like object (optional) |
sql |
a character string containing a SQL |
For more details, see query. Instead of running the
dplyr code like query does, show_dplyr prints the dplyr code.
In function calls in the printed code, long lists of arguments may be
truncated and appended with ....
library(dplyr) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" show_dplyr(query)library(dplyr) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" show_dplyr(query)