Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/15 13:08:25 UTC

[GitHub] [arrow] paleolimbot commented on pull request #13789: ARROW-14071: [R] Try to arrow_eval user-defined functions

paleolimbot commented on PR #13789:
URL: https://github.com/apache/arrow/pull/13789#issuecomment-1214992856

   This is very cool! It's the most important type of user-defined function because it can be translated entirely into Arrow kernels, so it runs in parallel. A lot of applications will benefit from this!
   
   Have you considered adding a registration step? If you do, you may be able to simplify some of this. The dream, of course, is to not require pre-registration at all, which will require an approach much like the one you've sketched out here (i.e., preprocessing the expression).
   
   <details>
   
   ``` r
   library(dplyr, warn.conflicts = FALSE)
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   
   register_user_binding <- function(name, f, env = rlang::caller_env()) {
     # copy the bindings environment because we don't want to set the parent
     # of the one-and-only official bindings environment
     bindings_env <- as.environment(as.list(arrow:::nse_funcs))
     parent.env(bindings_env) <- env
     environment(f) <- bindings_env
     
     # register for use in Arrow (non-agg)
     arrow:::register_binding(name, f, update_cache = TRUE)
     
     # in case this is a recursive function
     arrow:::register_binding(name, f, bindings_env)
     
     # return f invisibly so that the user can also call it directly (most
     # Arrow bindings accept regular R input as well)
     invisible(f)
   }
   
   nchar2 <- register_user_binding("nchar2", function(x) {
     1 + nchar(x)
   })
   
   record_batch(my_string = "1234") %>%
     mutate(
       var1 = nchar(my_string),
       var2 = nchar2(my_string)) %>%
     collect()
   #> # A tibble: 1 × 3
   #>   my_string  var1  var2
   #>   <chr>     <int> <dbl>
   #> 1 1234          4     5
   ```
   
   <sup>Created on 2022-08-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   </details>
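   
   For comparison, the preprocessing approach mentioned above (no pre-registration) could start by walking the expression and collecting the calls that resolve to user-defined closures. The following is only a rough base-R sketch under stated assumptions: `find_user_functions()` and its `known` argument are hypothetical names, not part of arrow, and "user-defined" is approximated as "a closure whose environment is not a package namespace".
   
   ```r
   # Hypothetical helper (not part of arrow): return the names of calls in
   # `expr` that resolve to closures defined outside of any package namespace.
   # `known` lists names that already have a binding and should be skipped.
   find_user_functions <- function(expr, env = parent.frame(), known = character()) {
     found <- character()
   
     walk <- function(x) {
       if (!is.call(x)) {
         return(invisible(NULL))
       }
   
       head <- x[[1]]
       if (is.symbol(head)) {
         name <- as.character(head)
         if (!(name %in% known) &&
             exists(name, envir = env, mode = "function", inherits = TRUE)) {
           f <- get(name, envir = env, mode = "function")
           # skip primitives (e.g., `+`) and functions that live in a package
           if (!is.primitive(f) && !isNamespace(environment(f))) {
             found <<- union(found, name)
           }
         }
       } else {
         # the function position can itself be a call, e.g. pkg::fun(x)
         walk(head)
       }
   
       # recurse into the arguments, skipping empty ones such as in x[1, ]
       for (i in seq_along(x)[-1]) {
         if (!identical(x[[i]], quote(expr = ))) {
           walk(x[[i]])
         }
       }
       invisible(NULL)
     }
   
     walk(expr)
     found
   }
   
   nchar2 <- function(x) {
     1 + nchar(x)
   }
   
   find_user_functions(quote(nchar(my_string) + nchar2(my_string)))
   #> [1] "nchar2"
   ```
   
   The names found this way would still need to be evaluated against the Arrow bindings (as `register_user_binding()` does above by swapping the function's environment); this sketch only covers the discovery step.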

