
Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/10/10 20:12:00 UTC

[jira] [Commented] (ARROW-17609) [R] Streamline some C++ calls

    [ https://issues.apache.org/jira/browse/ARROW-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615296#comment-17615296 ] 

Neal Richardson commented on ARROW-17609:
-----------------------------------------

To clarify: it seems that the cost is in instantiating the R6 objects, not in the calls to C++ themselves. Memoizing, deferring, etc. would still help in these cases, because each one avoids going to C++ to create a new R6 object.

> [R] Streamline some C++ calls
> -----------------------------
>
>                 Key: ARROW-17609
>                 URL: https://issues.apache.org/jira/browse/ARROW-17609
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>
> When looking at profiling data of TPC-H queries on ARROW-17462, there was some added overhead (not a ton: tens of ms, but enough to trigger benchmark regressions on small data) from the extra expression type calculation. It's not a huge deal, but I saw a few places where we could avoid doing unnecessary work:
> * Memoize Expression$type calculation
> * Defer Expression$schema determination (calls UnifySchema on expression args' schemas)--most expressions don't ever need it (ARROW-13186)
> * Set Expression$scalar type at creation so we don't have to query it
> * Eliminate the .fields() R function and move its logic into the Schema constructor--it creates a bunch of Field R6 objects that are immediately dropped
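
The first three items in the list above follow a common pattern: compute a value at most once, and only when something asks for it. Below is a minimal sketch of that pattern, written in Python for brevity rather than R. The Expression class, its expensive_calls counter, and the type-promotion rule are all hypothetical stand-ins, not the arrow package's actual code. In R6 the same idea would presumably be an active binding backed by a private cache field.

```python
class Expression:
    """Hypothetical expression node; NOT the arrow R package's Expression."""

    def __init__(self, name, args=(), scalar_type=None):
        self.name = name
        self.args = tuple(args)
        # Counts how often the expensive path runs (illustration only).
        self.expensive_calls = 0
        # Scalars know their type at creation, so nothing is queried later.
        self._type = scalar_type

    @property
    def type(self):
        # Deferred: nothing is computed until the type is first requested.
        # Memoized: later requests reuse the cached value.
        if self._type is None:
            self._type = self._compute_type()
        return self._type

    def _compute_type(self):
        # Stand-in for the costly step (a round trip that builds a new
        # wrapper object). The promotion rule here is made up.
        self.expensive_calls += 1
        arg_types = [arg.type for arg in self.args]
        return "double" if "double" in arg_types else "int64"


x = Expression("x", scalar_type="int64")
y = Expression("y", scalar_type="double")
total = Expression("add", [x, y])

calls_before_access = total.expensive_calls  # still 0: work was deferred
first = total.type                           # triggers the single computation
second = total.type                          # served from the cache
```

In this sketch, building an expression costs nothing up front, and expressions whose type is never requested never pay for it. That is the saving the comment describes.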


