Title: | Query 'R' Data Frames with 'SQL' |
---|---|
Description: | Use 'SQL' 'SELECT' statements to query 'R' data frames. |
Authors: | Ian Cook [aut, cre], Cloudera [cph] |
Maintainer: | Ian Cook <[email protected]> |
License: | Apache License 2.0 |
Version: | 0.2.4.9000 |
Built: | 2024-11-04 03:48:40 UTC |
Source: | https://github.com/ianmcook/tidyquery |
query
takes a SQL SELECT
statement and uses it to
query an R data frame
query(data, sql)
query(data, sql)
data |
a data frame or data frame-like object (optional) |
sql |
a character string containing a SQL |
If the data
argument is not specified, then the FROM
clause of the SQL statement determines which data frame to query.
The names of data frames and columns are case-sensitive (like in R). Keywords and function names are not case-sensitive (like in SQL).
In addition to R data frames and tibbles (tbl_df
objects), this
function can query:
dtplyr_step
objects created with dtplyr, a data.table backend for dplyr
Apache Arrow Table
, RecordBatch
, Dataset
, and arrow_dplyr_query
objects created with arrow
tbl_sql
objects created with dbplyr, but this depends on which database and which backend package (if any) you are using, so results may vary
This function is subject to the current limitations of the queryparser package.
An object of the same class as data
.
library(dplyr) iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species") query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species") iris %>% filter(Petal.Length > 4) %>% query("SELECT Species, MAX(Sepal.Length) AS max_sep_len GROUP BY Species") %>% arrange(desc(max_sep_len)) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" query(query)
library(dplyr) iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species") query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species") iris %>% filter(Petal.Length > 4) %>% query("SELECT Species, MAX(Sepal.Length) AS max_sep_len GROUP BY Species") %>% arrange(desc(max_sep_len)) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" query(query)
show_dplyr
takes a SQL SELECT
statement and prints
equivalent dplyr code
show_dplyr(data, sql)
show_dplyr(data, sql)
data |
a data frame or data frame-like object (optional) |
sql |
a character string containing a SQL |
For more details, see query
. Instead of running the
dplyr code like query
does, show_dplyr
prints the dplyr code.
In function calls in the printed code, long lists of arguments may be
truncated and appended with ...
.
library(dplyr) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" show_dplyr(query)
library(dplyr) library(nycflights13) query <- "SELECT origin, dest, COUNT(flight) AS num_flts, round(AVG(distance)) AS dist, round(AVG(arr_delay)) AS avg_delay FROM flights WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 5000 ORDER BY num_flts DESC, avg_delay DESC LIMIT 100;" show_dplyr(query)