Posted to jira@arrow.apache.org by "Apache Arrow JIRA Bot (Jira)" <ji...@apache.org> on 2022/10/18 17:52:00 UTC

[jira] [Commented] (ARROW-14855) [R] build_expr() should check that non-expression inputs have vec_size() == 1L

    [ https://issues.apache.org/jira/browse/ARROW-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619718#comment-17619718 ] 

Apache Arrow JIRA Bot commented on ARROW-14855:
-----------------------------------------------

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per [project policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment]. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [R] build_expr() should check that non-expression inputs have vec_size() == 1L
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-14855
>                 URL: https://issues.apache.org/jira/browse/ARROW-14855
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dewey Dunnington
>            Assignee: Dragoș Moldovan-Grünfeld
>            Priority: Major
>
> I would like this to raise an error so that code like the following does not work (row order is guaranteed in R but not in Arrow):
> {code:R}
> # remotes::install_github("apache/arrow/r#11690")
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> record_batch(a = c("something1", "something2")) %>% 
>   mutate(df_col = data.frame(a, b = c("other1", "other2")))
> #> InMemoryDataset (query)
> #> a: string
> #> df_col: struct<a: string, b: list<item: string>> ({a=a, b=...})
> #> 
> #> See $.data for the source Arrow object
> tibble(a = c("something1", "something2")) %>% 
>   mutate(df_col = data.frame(a, b = c("other1", "other2"))) %>% 
>   arrow:::arrow_dplyr_query()
> #> InMemoryDataset (query)
> #> a: string
> #> df_col: struct<a: string, b: string>
> #> 
> #> See $.data for the source Arrow object
> {code}
> The same problem also shows up elsewhere, with a confusing error:
> {code:R}
> record_batch(a = 1:2) %>% mutate(a + 3:4)
> #> Error: NotImplemented: Function add_checked has no kernel matching input types (array[int32], scalar[list<item: int32>])
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/expression.cc:340  call.function->DispatchBest(&descrs)
> {code}
> I think we need slightly different rules than {{Scalar$create()}} uses when interpreting user expressions: we want to raise an error for values with {{vctrs::vec_size() != 1}} rather than wrap them in {{list()}} (which changes the type that the user specified).
> Relevant section of {{build_expr()}}: <https://github.com/apache/arrow/blob/4b1135ccfd3075a175667c38dc6326865288caf6/r/R/expression.R#L204-L209> 
> Relevant section of {{Scalar$create()}}: <https://github.com/apache/arrow/blob/4b1135ccfd3075a175667c38dc6326865288caf6/r/R/scalar.R#L75-L83>
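>
> A minimal sketch of the kind of size check described above. The helper name {{check_scalar_input()}} and its error message are hypothetical and not part of arrow; this only illustrates the rule and says nothing about how {{build_expr()}} would actually implement it:
> {code:R}
> # Hypothetical helper (not part of arrow): reject non-expression inputs
> # whose vctrs size is not 1 instead of wrapping them in list().
> check_scalar_input <- function(x, arg = "x") {
>   size <- vctrs::vec_size(x)
>   if (size != 1L) {
>     stop(
>       sprintf("`%s` must have size 1, not size %d", arg, size),
>       call. = FALSE
>     )
>   }
>   invisible(x)
> }
>
> check_scalar_input(3L)      # size 1: passes silently
> check_scalar_input("text")  # size 1: passes silently
> # check_scalar_input(3:4)   # size 2: would raise an error
> {code}
> Because {{vctrs::vec_size()}} counts rows for data frames, a two-row {{data.frame()}} like the one in the first example would be rejected by such a check as well.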



--
This message was sent by Atlassian Jira
(v8.20.10#820010)