You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/05/31 20:49:00 UTC

[jira] [Created] (ARROW-16701) [R] Can we execute SQL in a dply pipeline?

Jonathan Keane created ARROW-16701:
--------------------------------------

             Summary: [R] Can we execute SQL in a dply pipeline?
                 Key: ARROW-16701
                 URL: https://issues.apache.org/jira/browse/ARROW-16701
             Project: Apache Arrow
          Issue Type: New Feature
          Components: R
            Reporter: Jonathan Keane


Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap those and allow someone to insert arbitrary SQL into a dplyr query?

Something like:

{code:r}
sql <- function(data, sql) {
   tbl <- to_duckdb(data)
   res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)

  duckdb::duckdb_fetch_record_batch(res)
}

ds %>%
  filter(year > 2020) %>% 
  sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
  compute()
{code}

This won't work totally, but is vaguely what we're looking for.

One part that we need to think about is how to deal with the {{from}} clause, a few possibilities:

* bis does this by making you "name" the table before doing sql so you can FROM explicitly
* though maybe you could get away with FROM . like it is a magrittr thing and sub that
* empty string, and we add it in based on the lazy_tbl object

Possibly related prior art: https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't perfect IMO, and I think this is more geared towards package developers than end users?)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)