Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/11/03 12:35:00 UTC
[jira] [Commented] (ARROW-14071) [R] Try to arrow_eval user-defined
functions
[ https://issues.apache.org/jira/browse/ARROW-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438024#comment-17438024 ]
Dewey Dunnington commented on ARROW-14071:
------------------------------------------
Reprex:
{code}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

nchar2 <- function(x) {
  nchar(x)
}

RecordBatch$create(my_string = "1234") %>%
  mutate(nchar(my_string), nchar2(my_string)) %>%
  collect()
#> Warning: Expression nchar2(my_string) not supported in Arrow; pulling data into
#> R
#> # A tibble: 1 × 3
#> my_string `nchar(my_string)` `nchar2(my_string)`
#> <chr> <int> <int>
#> 1 1234 4
{code}
I'm not sure whether this works with the rlang data mask, but you could do this by setting `environment(fun)` to a new environment whose parent is the original `environment(fun)`. You probably don't want the data mask anyway, because field references could then interfere with the function's internal variable names. (With apologies if you've already done this and I missed it):
{code}
masked_function <- function(fun, env) {
  # probably want to (shallow) copy `env` because we'd need to modify it
  # and it's passed by reference
  env2 <- new.env(parent = environment(fun))
  for (name in names(env)) {
    env2[[name]] <- env[[name]]
  }

  environment(fun) <- env2
  fun
}

some_var <- 45
my_function <- function() {
  some_var + 5
}

my_function()
#> [1] 50

masked_function(my_function, as.environment(list(some_var = 1)))()
#> [1] 6
{code}
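To connect this back to the issue, here is a rough base-R sketch of the same trick applied to a function binding rather than a variable. The `translations` environment and its `nchar` entry are hypothetical stand-ins for the real nse_funcs (they only count calls and defer to `base::nchar`), and `call_log` exists purely for illustration. None of this is the arrow package's actual implementation.

{code}
# Same helper as above: rebind `fun` to a child of its original
# environment that holds copies of the bindings in `env`.
masked_function <- function(fun, env) {
  env2 <- new.env(parent = environment(fun))
  for (name in names(env)) {
    env2[[name]] <- env[[name]]
  }
  environment(fun) <- env2
  fun
}

# Hypothetical call counter, only used to see which nchar() ran.
call_log <- new.env()
call_log$n <- 0L

# Hypothetical stand-in for an Arrow translation of nchar().
translations <- new.env()
translations$nchar <- function(x) {
  call_log$n <- call_log$n + 1L
  base::nchar(x)
}

# User-defined function, as in the issue description.
isShortString <- function(x) nchar(x) < 10

masked <- masked_function(isShortString, translations)

# The original function still resolves nchar() to base R.
plain_result <- isShortString("1234")
calls_after_plain <- call_log$n

# The masked copy resolves nchar() to the stand-in instead.
masked_result <- masked("1234")
calls_after_masked <- call_log$n
{code}

Because the bindings sit in the function's enclosing environment rather than in a data mask, the argument `x` and any other locals inside `isShortString` still take precedence over them. The original `isShortString` is also left untouched, since `environment(fun) <- env2` only modifies the copy inside `masked_function`.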
> [R] Try to arrow_eval user-defined functions
> --------------------------------------------
>
> Key: ARROW-14071
> URL: https://issues.apache.org/jira/browse/ARROW-14071
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Assignee: Dewey Dunnington
> Priority: Major
> Fix For: 7.0.0
>
>
> The first test passes but the second one fails, even though they're equivalent. The user's function isn't being evaluated in the nse_funcs environment.
> {code}
> expect_dplyr_equal(
>   input %>%
>     select(-fct) %>%
>     filter(nchar(padded_strings) < 10) %>%
>     collect(),
>   tbl
> )
> isShortString <- function(x) nchar(x) < 10
> expect_dplyr_equal(
>   input %>%
>     select(-fct) %>%
>     filter(isShortString(padded_strings)) %>%
>     collect(),
>   tbl
> )
> {code}