Posted to issues@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/11/24 17:38:00 UTC

[jira] [Created] (ARROW-14855) [R] build_expr() should check that non-expression inputs have vec_size() == 1L

Dewey Dunnington created ARROW-14855:
----------------------------------------

             Summary: [R] build_expr() should check that non-expression inputs have vec_size() == 1L
                 Key: ARROW-14855
                 URL: https://issues.apache.org/jira/browse/ARROW-14855
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Dewey Dunnington


I’d like {{build_expr()}} to raise an error on code like this instead of letting it run, since row order is guaranteed in R but not in Arrow:

{code:R}
# remotes::install_github("apache/arrow/r#11690")
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

record_batch(a = c("something1", "something2")) %>% 
  mutate(df_col = data.frame(a, b = c("other1", "other2")))
#> InMemoryDataset (query)
#> a: string
#> df_col: struct<a: string, b: list<item: string>> ({a=a, b=...})
#> 
#> See $.data for the source Arrow object

tibble(a = c("something1", "something2")) %>% 
  mutate(df_col = data.frame(a, b = c("other1", "other2"))) %>% 
  arrow:::arrow_dplyr_query()
#> InMemoryDataset (query)
#> a: string
#> df_col: struct<a: string, b: string>
#> 
#> See $.data for the source Arrow object
{code}


This shows up elsewhere too with a confusing error: 

{code:R}
record_batch(a = 1:2) %>% mutate(a + 3:4)
#> Error: NotImplemented: Function add_checked has no kernel matching input types (array[int32], scalar[list<item: int32>])
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/expression.cc:340  call.function->DispatchBest(&descrs)
{code}


I think we need slightly different rules than {{Scalar$create()}} uses when interpreting user expressions: we want to error on values with {{vctrs::vec_size() != 1}} rather than wrap them in {{list()}} (which changes the type that the user specified).
Relevant section of {{build_expr()}}: <https://github.com/apache/arrow/blob/4b1135ccfd3075a175667c38dc6326865288caf6/r/R/expression.R#L204-L209> 
Relevant section of {{Scalar$create()}}: <https://github.com/apache/arrow/blob/4b1135ccfd3075a175667c38dc6326865288caf6/r/R/scalar.R#L75-L83>
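
A rough sketch of the kind of check I have in mind, not the actual patch. Both function names below are hypothetical and not part of arrow, and {{approx_vec_size()}} is only a base-R stand-in for {{vctrs::vec_size()}} (rows for data frames, length otherwise) so that the snippet runs without extra packages:

{code:R}
# Hypothetical helper, not part of the arrow package.
# Base-R approximation of vctrs::vec_size(): number of rows for a
# data frame, length() for everything else.
approx_vec_size <- function(x) {
  if (is.data.frame(x)) nrow(x) else length(x)
}

# Hypothetical check: error on non-scalar input instead of silently
# wrapping it in list() and changing its type.
assert_scalar_input <- function(x, arg = "x") {
  size <- approx_vec_size(x)
  if (size != 1L) {
    stop(
      sprintf(
        "`%s` must have size 1 to be used in an Arrow expression, not size %d",
        arg, size
      ),
      call. = FALSE
    )
  }
  invisible(x)
}

assert_scalar_input(3L)        # passes: size 1
try(assert_scalar_input(3:4))  # errors: size 2
# Errors as well: a two-row data frame has size 2
try(assert_scalar_input(data.frame(b = c("other1", "other2"))))
{code}

Something like this would presumably run on each non-expression argument in {{build_expr()}} before it is handed to {{Scalar$create()}}, so both examples above would fail early with a clearer message.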



