Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/09/03 17:58:00 UTC

[jira] [Updated] (ARROW-13107) [R] [C++] Implement SQL-alike distinct() for dplyr queries

     [ https://issues.apache.org/jira/browse/ARROW-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-13107:
------------------------------------
    Affects Version/s:     (was: 4.0.1)

> [R] [C++] Implement SQL-alike distinct() for dplyr queries
> ----------------------------------------------------------
>
>                 Key: ARROW-13107
>                 URL: https://issues.apache.org/jira/browse/ARROW-13107
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, R
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>
> Hi
> It would be useful to be able to call distinct() on a dplyr query against an Arrow dataset and obtain a data frame with the unique combinations of the selected columns, for example:
> {code:r}
> open_dataset("sitc-rev2/parquet/",
>              partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
>   select(Year, `Reporter ISO`) %>%
>   filter(Year >= 1988 & Year <= 1994) %>% 
>   distinct() %>% 
>   collect()
> {code}
> However, in the current development version of the Arrow package (installed from GitHub), the distinct() step fails with this error:
> {code:r}
> Error in UseMethod("distinct") : 
>   no applicable method for 'distinct' applied to an object of class "arrow_dplyr_query"
> {code}
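> The message comes from S3 dispatch: distinct() is a generic, and no method is available for the "arrow_dplyr_query" class. A minimal sketch of the same failure, assuming only dplyr is installed. The class "hypothetical_query" and the toy method are made up for illustration and are not how Arrow implements its queries:
> {code:r}
> library(dplyr)
>
> # Hypothetical stand-in for a lazy query object: a list wrapping a data frame.
> q <- structure(
>   list(data = data.frame(Year = c(1988, 1988, 1989))),
>   class = "hypothetical_query"
> )
>
> # No distinct() method exists for this class, so dispatch fails;
> # capture the error message instead of stopping.
> msg <- tryCatch(distinct(q), error = function(e) conditionMessage(e))
> print(msg)
>
> # Defining a method for the class makes the same call succeed.
> # This toy method simply delegates to the wrapped data frame.
> distinct.hypothetical_query <- function(.data, ..., .keep_all = FALSE) {
>   distinct(.data$data, ..., .keep_all = .keep_all)
> }
> res <- distinct(q)
> print(res)
> {code}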
> Collecting the data first and then calling distinct() on the resulting data frame works:
> {code:r}
> reporters_1 <- open_dataset("sitc-rev2/parquet/",
>              partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
>   select(Year, `Reporter ISO`) %>%
>   filter(Year >= 1988 & Year <= 1994) %>% 
>   collect() %>% 
>   distinct()
> {code}
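> For reference, the result expected from distinct() on the query is the one SELECT DISTINCT would give: one row per unique combination of the selected columns. A small sketch on in-memory data, assuming only dplyr. The values in "trade" are made up and are not taken from the dataset:
> {code:r}
> library(dplyr)
>
> # Hypothetical toy data standing in for the two selected columns.
> trade <- data.frame(
>   Year = c(1988L, 1988L, 1989L, 1989L, 1989L),
>   `Reporter ISO` = c("chl", "chl", "chl", "per", "per"),
>   check.names = FALSE  # keep the space in the column name
> )
>
> # Same pipeline as above, minus open_dataset() and collect().
> reporters <- trade %>%
>   select(Year, `Reporter ISO`) %>%
>   filter(Year >= 1988 & Year <= 1994) %>%
>   distinct()
>
> # Duplicated (Year, Reporter ISO) pairs are collapsed to a single row each.
> print(reporters)
> {code}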


