Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/11/03 12:35:00 UTC

[jira] [Commented] (ARROW-14071) [R] Try to arrow_eval user-defined functions

    [ https://issues.apache.org/jira/browse/ARROW-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438024#comment-17438024 ] 

Dewey Dunnington commented on ARROW-14071:
------------------------------------------

Reprex: 

{code}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

nchar2 <- function(x) {
  nchar(x)
}

RecordBatch$create(my_string = "1234") %>%
  mutate(nchar(my_string), nchar2(my_string)) %>%
  collect()
#> Warning: Expression nchar2(my_string) not supported in Arrow; pulling data into
#> R
#> # A tibble: 1 × 3
#>   my_string `nchar(my_string)` `nchar2(my_string)`
#>   <chr>                  <int>               <int>
#> 1 1234                       4
{code}

 

I'm not sure if this works with the rlang data mask, but you could do this by setting `environment(fun)` to a new environment that inherits from the original `environment(fun)`. You probably don't want the data mask anyway, because field references could then shadow the function's internal variable names. (With apologies if you've done this already and I missed it.)

{code}
masked_function <- function(fun, env) {
  # probably want to (shallow) copy `env` because we'd need to modify it
  # and it's passed by reference
  env2 <- new.env(parent = environment(fun))
  for (name in names(env)) {
    env2[[name]] <- env[[name]]
  }

  environment(fun) <- env2
  fun
}

some_var <- 45
my_function <- function() {
  some_var + 5
}

my_function()
#> [1] 50

masked_function(my_function, as.environment(list(some_var = 1)))()
#> [1] 6
{code}
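
To connect this back to the issue, here is a rough sketch of the same trick applied to a function shaped like `isShortString()` from the description. It uses base R only. `fake_nse_funcs` and its logging `nchar()` are hypothetical stand-ins invented for illustration, not the real `nse_funcs` environment or anything arrow exports. The helper is repeated so the block runs on its own.

```r
# Helper from the comment above, repeated so this block is self-contained.
masked_function <- function(fun, env) {
  env2 <- new.env(parent = environment(fun))
  for (name in names(env)) {
    env2[[name]] <- env[[name]]
  }
  environment(fun) <- env2
  fun
}

# Hypothetical stand-in for a translation table (NOT the real nse_funcs):
# this `nchar` records that it was called, then defers to base R.
calls <- character()
fake_nse_funcs <- as.environment(list(
  nchar = function(x) {
    calls <<- c(calls, "nchar")
    base::nchar(x)
  }
))

# User-defined function, as in the issue description.
isShortString <- function(x) nchar(x) < 10

# Inside the masked copy, `nchar` resolves to the stand-in binding first;
# anything not overridden is still found via the original environment.
masked <- masked_function(isShortString, fake_nse_funcs)
result <- masked(c("short", "a much longer string"))
```

Because `environment(fun) <- env2` modifies only the local copy of the function, the user's original `isShortString()` keeps calling base `nchar()`; only the returned copy sees the overrides.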

> [R] Try to arrow_eval user-defined functions
> --------------------------------------------
>
>                 Key: ARROW-14071
>                 URL: https://issues.apache.org/jira/browse/ARROW-14071
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Dewey Dunnington
>            Priority: Major
>             Fix For: 7.0.0
>
>
> The first test passes but the second one fails, even though they're equivalent. The user's function isn't being evaluated in the nse_funcs environment.
> {code}
>   expect_dplyr_equal(
>     input %>%
>       select(-fct) %>%
>       filter(nchar(padded_strings) < 10) %>%
>       collect(),
>     tbl
>   )
>   isShortString <- function(x) nchar(x) < 10
>   expect_dplyr_equal(
>     input %>%
>       select(-fct) %>%
>       filter(isShortString(padded_strings)) %>%
>       collect(),
>     tbl
>   )
> {code}


