You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/05/31 20:49:00 UTC
[jira] [Created] (ARROW-16701) [R] Can we execute SQL in a dply pipeline?
Jonathan Keane created ARROW-16701:
--------------------------------------
Summary: [R] Can we execute SQL in a dply pipeline?
Key: ARROW-16701
URL: https://issues.apache.org/jira/browse/ARROW-16701
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Jonathan Keane
Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap those and allow someone to insert arbitrary SQL into a dplyr query?
Something like:
{code:r}
sql <- function(data, sql) {
tbl <- to_duckdb(data)
res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)
duckdb::duckdb_fetch_record_batch(res)
}
ds %>%
filter(year > 2020) %>%
sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
compute()
{code}
This won't work totally, but is vaguely what we're looking for.
One part that we need to think about is how to deal with the {{from}} clause, a few possibilities:
* bis does this by making you "name" the table before doing sql so you can FROM explicitly
* though maybe you could get away with FROM . like it is a magrittr thing and sub that
* empty string, and we add it in based on the lazy_tbl object
Possibly related prior art: https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't perfect IMO, and I think this is more geared towards package developers than end users?)
--
This message was sent by Atlassian Jira
(v8.20.7#820007)