Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/09/04 13:21:00 UTC

[jira] [Created] (ARROW-17609) [R] Streamline some C++ calls

Neal Richardson created ARROW-17609:
---------------------------------------

             Summary: [R] Streamline some C++ calls
                 Key: ARROW-17609
                 URL: https://issues.apache.org/jira/browse/ARROW-17609
             Project: Apache Arrow
          Issue Type: New Feature
          Components: R
            Reporter: Neal Richardson
            Assignee: Neal Richardson


While looking at profiling data from TPC-H queries for ARROW-17462, I saw some added overhead from the extra expression type calculation. It isn't a lot (tens of milliseconds), but it is enough to trigger benchmark regressions on small data. It's not a huge deal, but I found a few places where we could avoid unnecessary work:

* Memoize the Expression$type calculation
* Defer Expression$schema determination, which calls UnifySchema on the schemas of the expression's arguments; most expressions never need it (ARROW-13186)
* Set the Expression$scalar type at creation so we don't have to query it afterward
* Eliminate the .fields() R function and move its logic into the Schema constructor; .fields() creates a bunch of Field R6 objects that are immediately dropped
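The first three items share one idea: compute an expression's type or schema at most once, and only when it is actually requested. Below is a minimal, language-agnostic sketch of that pattern in Python using functools.cached_property. It assumes a hypothetical Expression class whose _resolve_type() method stands in for the expensive C++ type query. This is not the arrow R package's actual R6 implementation, and the real change may look different.

```python
from functools import cached_property


class Expression:
    """Hypothetical stand-in for an Expression object (not the arrow API)."""

    def __init__(self, name, args=(), type_=None, schema=None):
        self.name = name
        self.args = list(args)
        self._type = type_      # scalars/fields can pass their type at creation
        self._schema = schema
        self.type_calls = 0     # counts how often the expensive query runs

    @cached_property
    def type(self):
        # Memoized: the result is stored on the instance after first access.
        if self._type is not None:
            return self._type   # known at creation, so no query needed
        self.type_calls += 1
        return self._resolve_type()

    def _resolve_type(self):
        # Hypothetical placeholder for an expensive round trip to C++.
        arg_types = [a.type for a in self.args]
        return arg_types[0] if arg_types else "null"

    @cached_property
    def schema(self):
        # Deferred: argument schemas are only unified if someone asks.
        if self._schema is not None:
            return self._schema
        unified = {}
        for a in self.args:
            unified.update(a.schema)
        return unified
```

For example, Expression("add", [field, scalar]) runs the type query once no matter how often .type is read. It never builds a schema unless .schema is accessed.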



--
This message was sent by Atlassian Jira
(v8.20.10#820010)