Posted to jira@arrow.apache.org by "Apache Arrow JIRA Bot (Jira)" <ji...@apache.org> on 2022/10/18 17:52:00 UTC

[jira] [Assigned] (ARROW-14855) [R] build_expr() should check that non-expression inputs have vec_size() == 1L

     [ https://issues.apache.org/jira/browse/ARROW-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Arrow JIRA Bot reassigned ARROW-14855:
---------------------------------------------

    Assignee:     (was: Dragoș Moldovan-Grünfeld)

> [R] build_expr() should check that non-expression inputs have vec_size() == 1L
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-14855
>                 URL: https://issues.apache.org/jira/browse/ARROW-14855
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> I’m trying to make code like this raise an error instead of running, since row order isn’t guaranteed in Arrow but is in R:
> {code:R}
> # remotes::install_github("apache/arrow/r#11690")
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> record_batch(a = c("something1", "something2")) %>% 
>   mutate(df_col = data.frame(a, b = c("other1", "other2")))
> #> InMemoryDataset (query)
> #> a: string
> #> df_col: struct<a: string, b: list<item: string>> ({a=a, b=...})
> #> 
> #> See $.data for the source Arrow object
> tibble(a = c("something1", "something2")) %>% 
>   mutate(df_col = data.frame(a, b = c("other1", "other2"))) %>% 
>   arrow:::arrow_dplyr_query()
> #> InMemoryDataset (query)
> #> a: string
> #> df_col: struct<a: string, b: string>
> #> 
> #> See $.data for the source Arrow object
> {code}
> This shows up elsewhere too with a confusing error: 
> {code:R}
> record_batch(a = 1:2) %>% mutate(a + 3:4)
> #> Error: NotImplemented: Function add_checked has no kernel matching input types (array[int32], scalar[list<item: int32>])
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/expression.cc:340  call.function->DispatchBest(&descrs)
> {code}
> I think we need slightly different rules from the ones {{Scalar$create()}} uses when interpreting user expressions: we want to error on values whose {{vctrs::vec_size() != 1}} rather than wrap them in {{list()}}, which changes the type the user specified.
> Relevant section of {{build_expr()}}: <https://github.com/apache/arrow/blob/4b1135ccfd3075a175667c38dc6326865288caf6/r/R/expression.R#L204-L209> 
> Relevant section of {{Scalar$create()}}: <https://github.com/apache/arrow/blob/4b1135ccfd3075a175667c38dc6326865288caf6/r/R/scalar.R#L75-L83>
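A minimal sketch of the check proposed above, written in R to match the examples in the issue. `check_literal_size()` is a hypothetical name, not an arrow function. The sketch assumes arrow Expression objects can be recognized with `inherits(x, "Expression")` and that literal inputs are plain R vectors or data frames. It uses `vctrs::vec_size()` when vctrs is installed and otherwise falls back to base `NROW()`, which gives the same answer for vectors and data frames.

```r
# Hypothetical helper, NOT part of the arrow package: a sketch of the
# size-1 check proposed for build_expr(). Expression inputs pass through;
# any other (literal) input must have size 1, otherwise we raise an error
# instead of wrapping it in list() and silently changing its type.
check_literal_size <- function(x, arg = paste(deparse(substitute(x)), collapse = " ")) {
  # Assumption: arrow Expressions carry the "Expression" class.
  if (inherits(x, "Expression")) {
    return(invisible(x))
  }
  # vctrs::vec_size() is nrow() for data frames and length() for vectors;
  # base NROW() agrees for those cases, so use it if vctrs is absent.
  size <- if (requireNamespace("vctrs", quietly = TRUE)) {
    vctrs::vec_size(x)
  } else {
    NROW(x)
  }
  if (size != 1L) {
    stop(
      sprintf("`%s` must have size 1, not size %d.", arg, size),
      call. = FALSE
    )
  }
  invisible(x)
}
```

Called directly, `check_literal_size(3)` returns its input, while `check_literal_size(3:4)` stops with the message "`3:4` must have size 1, not size 2." The design choice mirrors the issue: a clear error up front, rather than the list-wrapping that leads to the `scalar[list<item: int32>]` kernel-dispatch error shown earlier.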



--
This message was sent by Atlassian Jira
(v8.20.10#820010)