Posted to issues@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/07/20 18:28:00 UTC

[jira] [Created] (ARROW-17148) [R] Improve evaluation of R functions from C++

Dewey Dunnington created ARROW-17148:
----------------------------------------

             Summary: [R] Improve evaluation of R functions from C++
                 Key: ARROW-17148
                 URL: https://issues.apache.org/jira/browse/ARROW-17148
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Dewey Dunnington


There are currently a few places where we call R code from C++. After ARROW-16444 and ARROW-16703 there will be more, including cases where the overhead of calling into R might be greater than the time it takes to actually evaluate the function, or where the function is called in a tight loop.

The current approach uses {{cpp11::function}}. This is totally fine and safe, but it generates some ugly backtraces on error and is potentially slower than the lean-and-mean approach of purrr (whose entire job is to call R functions in a loop and which has been heavily optimized). The purrr approach is to construct the {{call()}} and calling environment in advance and then just run {{Rf_eval(call, env)}} in the loop. This is both faster (fewer R API calls) and generates better backtraces (e.g., {{Error in fun(arg1, arg2)}} instead of {{Error in (function(a, b) { ...the whole content of the function ... })(every, deparsed, argument)}}).
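
As a rough sketch of what the purrr-style approach could look like (this is not existing arrow code; the entry point {{call_in_loop}} and the symbol names {{fun}} and {{arg}} are made up for illustration, and it assumes the caller passes in a fresh environment created on the R side):

```cpp
#include <Rinternals.h>

// Hypothetical .Call() entry point; all names here are illustrative only.
//   fun: an R function taking one argument
//   x:   a list whose elements are passed to fun one at a time
//   env: a fresh environment (e.g. from new.env()) used to hold the bindings
extern "C" SEXP call_in_loop(SEXP fun, SEXP x, SEXP env) {
  R_xlen_t n = Rf_xlength(x);

  // Symbols are interned by R and do not need to be protected.
  SEXP fun_sym = Rf_install("fun");
  SEXP arg_sym = Rf_install("arg");

  // Bind the function once. The call below refers to it by name, so an
  // error is reported as "Error in fun(arg)" instead of deparsing the
  // whole function body and every argument.
  Rf_defineVar(fun_sym, fun, env);

  // Build the call `fun(arg)` once, outside the loop.
  SEXP call = PROTECT(Rf_lang2(fun_sym, arg_sym));
  SEXP out = PROTECT(Rf_allocVector(VECSXP, n));

  for (R_xlen_t i = 0; i < n; i++) {
    // Per iteration: rebind the argument, then evaluate the prebuilt call.
    Rf_defineVar(arg_sym, VECTOR_ELT(x, i), env);
    SET_VECTOR_ELT(out, i, Rf_eval(call, env));
  }

  UNPROTECT(2);
  return out;
}
```

Note that a bare {{Rf_eval()}} will longjmp past C++ destructors if the R function errors, so in real C++ code the evaluation would presumably need to go through something like {{cpp11::unwind_protect()}}. Whether that wrapper eats most of the savings is part of what would need to be measured.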

Before optimizing that heavily we should of course benchmark to see exactly how much that matters!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)