Posted to jira@arrow.apache.org by "Mauricio 'Pachá' Vargas Sepúlveda (Jira)" <ji...@apache.org> on 2021/06/25 17:11:00 UTC
[jira] [Created] (ARROW-13188) [R] [C++] Implement SQL-alike distinct() for dplyr queries
Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13188:
---------------------------------------------------------
Summary: [R] [C++] Implement SQL-alike distinct() for dplyr queries
Key: ARROW-13188
URL: https://issues.apache.org/jira/browse/ARROW-13188
Project: Apache Arrow
Issue Type: Bug
Components: C++, R
Affects Versions: 4.0.1
Reporter: Mauricio 'Pachá' Vargas Sepúlveda
It would be highly desirable to be able to use (base) substr() and/or (stringr) str_sub() in dplyr queries, for example:
{code:r}
library(arrow)
library(dplyr)
library(stringr)
# get animal products, year 2019
open_dataset(
  "../cepii-datasets-arrow/parquet/baci_hs92",
  partitioning = c("year", "reporter_iso")
) %>%
  filter(
    year == 2019,
    str_sub(product_code, 1, 2) == "01"
  ) %>%
  collect()
#> Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01"
#> Call collect() first to pull data into R.
{code}
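Following the error message's advice, the query can be split into two parts. Arrow evaluates the year condition, and the str_sub() condition runs in R after collect(). The sketch below is not the proposed implementation. It stands in for the BACI files with a small hypothetical data frame (toy values, not from this report) that is written to a temporary directory with write_dataset(). It assumes the default hive-style partition directories, which open_dataset() then detects on its own.

{code:r}
library(arrow)
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the BACI dataset (values are made up)
toy <- data.frame(
  year = c(2019L, 2019L, 2018L),
  reporter_iso = c("chl", "arg", "chl"),
  product_code = c("010121", "020110", "010129"),
  stringsAsFactors = FALSE
)

# Assumed layout: hive-style directories such as year=2019/reporter_iso=chl
path <- file.path(tempdir(), "baci_toy")
write_dataset(toy, path, partitioning = c("year", "reporter_iso"))

result <- open_dataset(path) %>%
  filter(year == 2019) %>%                     # supported: evaluated by Arrow
  collect() %>%                                # pull the matching rows into R
  filter(str_sub(product_code, 1, 2) == "01")  # evaluated by dplyr in R
{code}

The trade-off is memory. Every row for the selected year is loaded into R before the substring filter runs, and native str_sub() support in Arrow queries would avoid that.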
Of course, this requires implementing the function on the Arrow side, but, like ARROW-13107, it would make integration with dplyr easier.