Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/09/03 17:58:00 UTC
[jira] [Updated] (ARROW-13107) [R] [C++] Implement SQL-alike distinct() for dplyr queries
[ https://issues.apache.org/jira/browse/ARROW-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson updated ARROW-13107:
------------------------------------
Affects Version/s: (was: 4.0.1)
> [R] [C++] Implement SQL-alike distinct() for dplyr queries
> ----------------------------------------------------------
>
> Key: ARROW-13107
> URL: https://issues.apache.org/jira/browse/ARROW-13107
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, R
> Reporter: Mauricio 'Pachá' Vargas Sepúlveda
> Priority: Major
>
> Hi,
> It would be useful to be able to obtain a data frame with the unique combinations of the selected columns directly from a dataset query, for example:
> {code:r}
> open_dataset("sitc-rev2/parquet/",
>              partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
>   select(Year, `Reporter ISO`) %>%
>   filter(Year >= 1988 & Year <= 1994) %>%
>   distinct() %>%
>   collect()
> {code}
> However, with the current development version of the arrow package (installed from GitHub), the last expression fails with this error:
> {code:r}
> Error in UseMethod("distinct") :
> no applicable method for 'distinct' applied to an object of class "arrow_dplyr_query"
> {code}
> Calling collect() first and then distinct() on the resulting data frame works:
> {code:r}
> reporters_1 <- open_dataset("sitc-rev2/parquet/",
>                             partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
>   select(Year, `Reporter ISO`) %>%
>   filter(Year >= 1988 & Year <= 1994) %>%
>   collect() %>%
>   distinct()
> {code}
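> For reference, a minimal self-contained sketch of the behaviour being requested, using a small in-memory Arrow table instead of the sitc-rev2 dataset. The data frame `trade` and its values are made up purely for illustration; only the column names come from the query above. It uses the collect-then-distinct workaround, so the result should match what distinct() would be expected to return if it were supported on the query itself.
> {code:r}
> library(arrow)
> library(dplyr)
>
> # Hypothetical stand-in data (not taken from the real dataset)
> trade <- data.frame(
>   Year = c(1988L, 1988L, 1990L, 1996L),
>   `Reporter ISO` = c("chl", "chl", "per", "chl"),
>   check.names = FALSE
> )
>
> # Same pipeline as above, with collect() placed before distinct()
> reporters <- Table$create(trade) %>%
>   select(Year, `Reporter ISO`) %>%
>   filter(Year >= 1988 & Year <= 1994) %>%
>   collect() %>%
>   distinct()
>
> # The 1996 row is filtered out and the duplicated (1988, chl) row collapses,
> # leaving two unique (Year, Reporter ISO) combinations
> stopifnot(nrow(reporters) == 2)
> {code}
> The drawback of this workaround is that every filtered row is pulled into R memory before deduplication, which is what a native distinct() on the query would avoid.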
--
This message was sent by Atlassian Jira
(v8.3.4#803005)