Posted to jira@arrow.apache.org by "Pal (Jira)" <ji...@apache.org> on 2020/10/14 06:13:00 UTC

[jira] [Updated] (ARROW-10305) [R] Error: Filter expression not supported for Arrow Datasets (substr, grepl, str_detect)

     [ https://issues.apache.org/jira/browse/ARROW-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pal updated ARROW-10305:
------------------------
    Description: 
Hi,

Some expressions, such as substr(), grepl(), or str_detect(), are not supported when filtering a dataset opened with open_dataset(). For example, the code below:

 

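{{library(dplyr)
 library(arrow)
 data <- data.frame(a = c("a", "a2", "a3"))
 # Create the target directory before writing
 dir.create("Test_filter", showWarnings = FALSE)
 write_parquet(data, "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# Filter on a substr() expression; this is the call that fails
data_flt <- ds %>%
 filter(substr(a, 1, 1) == "a")}}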

gives this error:

 

{{Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
 Call collect() first to pull data into R.}}
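
As the error message suggests, one workaround is to call collect() before filter(). The following is only a minimal sketch, assuming the selected column fits in memory; the Test_filter directory and the column a are taken from the example above:

```r
library(dplyr)
library(arrow)

# Recreate the example dataset from the report
dir.create("Test_filter", showWarnings = FALSE)
data <- data.frame(a = c("a", "a2", "a3"))
write_parquet(data, "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# Keep only the needed column, pull it into R, then filter with base R's substr()
data_flt <- ds %>%
  select(a) %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")
```

Here substr() is evaluated in R rather than by Arrow, so the whole column is read before any rows are dropped.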

These expressions would be very helpful, if not necessary, for filtering a very large dataset before collecting it. Is there anything that can be done to implement this feature?

Thank you.

  was:
Hi,

Some expressions, such as substr(), grepl(), or str_detect(), are not supported when filtering a dataset opened with open_dataset(). For example, the code below:

 

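```library(dplyr)
 library(arrow)
 data <- data.frame(a = c("a", "a2", "a3"))
 # Create the target directory before writing
 dir.create("Test_filter", showWarnings = FALSE)
 write_parquet(data, "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# Filter on a substr() expression; this is the call that fails
data_flt <- ds %>%
 filter(substr(a, 1, 1) == "a")```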

gives this error:

 

{{Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
 Call collect() first to pull data into R.}}

These expressions would be very helpful, if not necessary, for filtering a very large dataset before collecting it. Is there anything that can be done to implement this feature?

Thank you.


> [R] Error: Filter expression not supported for Arrow Datasets (substr, grepl, str_detect)
> -----------------------------------------------------------------------------------------
>
>                 Key: ARROW-10305
>                 URL: https://issues.apache.org/jira/browse/ARROW-10305
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 1.0.1
>            Reporter: Pal
>            Priority: Major
>
> Hi,
> Some expressions, such as substr(), grepl(), or str_detect(), are not supported when filtering a dataset opened with open_dataset(). For example, the code below:
>  
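> {{library(dplyr)
>  library(arrow)
>  data <- data.frame(a = c("a", "a2", "a3"))
>  # Create the target directory before writing
>  dir.create("Test_filter", showWarnings = FALSE)
>  write_parquet(data, "Test_filter/data.parquet")
> ds <- open_dataset("Test_filter/")
> # Filter on a substr() expression; this is the call that fails
> data_flt <- ds %>%
>  filter(substr(a, 1, 1) == "a")}}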
> gives this error:
>  
> {{Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
>  Call collect() first to pull data into R.}}
> These expressions would be very helpful, if not necessary, for filtering a very large dataset before collecting it. Is there anything that can be done to implement this feature?
> Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)