Posted to jira@arrow.apache.org by "Mauricio 'Pachá' Vargas Sepúlveda (Jira)" <ji...@apache.org> on 2021/06/25 17:11:00 UTC

[jira] [Updated] (ARROW-13188) [R] [C++] Implement substr/str_sub for dplyr queries

     [ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mauricio 'Pachá' Vargas Sepúlveda updated ARROW-13188:
------------------------------------------------------
    Summary: [R] [C++] Implement substr/str_sub for dplyr queries  (was: [R] [C++] Implement SQL-alike distinct() for dplyr queries)

> [R] [C++] Implement substr/str_sub for dplyr queries
> ----------------------------------------------------
>
>                 Key: ARROW-13188
>                 URL: https://issues.apache.org/jira/browse/ARROW-13188
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>    Affects Versions: 4.0.1
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Minor
>
> It would be highly desirable to be able to use (base) substr and/or (stringr) str_sub in dplyr queries, for example:
> {code:r}
> library(arrow)
> library(dplyr)
> library(stringr)
> # get animal products, year 2019
> open_dataset(
>   "../cepii-datasets-arrow/parquet/baci_hs92",
>   partitioning = c("year", "reporter_iso")
> ) %>% 
>   filter(
>     year == 2019,
>     str_sub(product_code, 1, 2) == "01"
>   ) %>% 
>   collect()
> # Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01"
> # Call collect() first to pull data into R.
> {code}
> Of course, this needs to be implemented, but, as with ARROW-13107, it would make integration with dplyr easier.
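
Until such support exists, the error message's own advice (call collect() first) can be followed as a workaround. The sketch below is only an illustration under stated assumptions: the toy data frame and the temporary directory are hypothetical stand-ins for the BACI dataset, which is not reproduced here. The year filter is still evaluated by Arrow, and only the str_sub() filter runs in R after the data has been collected.
{code:r}
library(arrow)
library(dplyr)
library(stringr)

# Hypothetical stand-in for the real dataset: a tiny table written to a
# temporary directory, partitioned by year (hive-style directories).
toy <- data.frame(
  year = c(2018L, 2019L, 2019L, 2019L),
  product_code = c("010121", "010121", "020110", "010310"),
  stringsAsFactors = FALSE
)
toy_dir <- tempfile("toy_dataset")
write_dataset(toy, toy_dir, format = "parquet", partitioning = "year")

result <- open_dataset(toy_dir) %>%
  filter(year == 2019) %>%                      # evaluated by Arrow
  collect() %>%                                 # pull the reduced data into R
  filter(str_sub(product_code, 1, 2) == "01")   # evaluated by dplyr in R
{code}
The drawback is that every row for the selected year is loaded into memory before the product-code filter applies, which is the cost that native substr/str_sub support would avoid.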


