Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2020/09/30 11:39:00 UTC

[jira] [Created] (ARROW-10141) [Rust][Arrow] Improve performance of filter kernel

Andrew Lamb created ARROW-10141:
-----------------------------------

             Summary: [Rust][Arrow] Improve performance of filter kernel
                 Key: ARROW-10141
                 URL: https://issues.apache.org/jira/browse/ARROW-10141
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Andrew Lamb


As [~jorgecarleitao] noted here: 
https://github.com/apache/arrow/pull/8303#issuecomment-701328143

The performance of the filter kernel (and likely others) could be improved by avoiding the creation of intermediate copies. The code currently:

# creates Vec<Option<T>> through an iteration
# copies Vec<Option<T>> to the two buffers (when from_opt_vec is called)

It may be more efficient to create the buffers during the iteration, so that we avoid the copy (Vec -> buffers). In other words, the code in from_opt_vec could be "injected" into the filter execution, where the MutableBuffer and the offsets and values buffers are created before the loop, and new elements are written directly to them.
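
A rough sketch of the difference, using only the Rust standard library. Everything here is an assumption made for illustration: SimpleArray, filter_two_step and filter_direct are hypothetical names, plain Vecs stand in for Arrow's MutableBuffer, the element type is fixed to i32, and this from_opt_vec only mirrors the role of the real function, not its code.

```rust
/// Hypothetical stand-in for a primitive array: a values buffer plus a
/// validity bitmap (one bit per slot, least significant bit first).
#[derive(Debug, PartialEq)]
struct SimpleArray {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

/// Plays the role of from_opt_vec: copies a Vec<Option<i32>> into two buffers.
fn from_opt_vec(data: Vec<Option<i32>>) -> SimpleArray {
    let len = data.len();
    let mut values = Vec::with_capacity(len);
    let mut validity = vec![0u8; (len + 7) / 8];
    for (i, item) in data.into_iter().enumerate() {
        match item {
            Some(v) => {
                values.push(v);
                validity[i / 8] |= 1u8 << (i % 8);
            }
            // Null slot: write a placeholder value and leave the bit unset.
            None => values.push(0),
        }
    }
    SimpleArray { values, validity, len }
}

/// Current shape: build an intermediate Vec<Option<i32>>, then copy it.
fn filter_two_step(input: &[Option<i32>], mask: &[bool]) -> SimpleArray {
    let selected: Vec<Option<i32>> = input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect();
    from_opt_vec(selected)
}

/// Proposed shape: create the buffers before the loop and write into them
/// directly, so no intermediate Vec<Option<i32>> is allocated.
fn filter_direct(input: &[Option<i32>], mask: &[bool]) -> SimpleArray {
    assert_eq!(input.len(), mask.len());
    let out_len = mask.iter().filter(|keep| **keep).count();
    let mut values = Vec::with_capacity(out_len);
    let mut validity = vec![0u8; (out_len + 7) / 8];
    let mut out = 0;
    for (item, keep) in input.iter().zip(mask) {
        if !*keep {
            continue;
        }
        match item {
            Some(v) => {
                values.push(*v);
                validity[out / 8] |= 1u8 << (out % 8);
            }
            None => values.push(0),
        }
        out += 1;
    }
    SimpleArray { values, validity, len: out }
}
```

Both functions produce the same buffers for the same input and mask; the second one skips the intermediate allocation and the extra pass over it. Whether this is faster in the real kernel would need to be confirmed with benchmarks.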

As a side note, this is why he proposed ARROW-10030 (https://github.com/apache/arrow/pull/8211): IMO there is some boilerplate copy-pasting to

* initialize buffers
* iterate
* create ArrayData from buffers

which will continue to grow as we add more kernels, and whose pattern seems to be a FromIter of fixed size.
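
One way to read the "initialize buffers, iterate, create from buffers" pattern is as an implementation of the standard FromIterator trait. The sketch below is an assumption-laden illustration, not the design proposed in ARROW-10030: SketchArray is a hypothetical type, plain Vecs stand in for Arrow buffers, the element type is fixed to i32, and the iterator's size_hint is used only as a capacity hint.

```rust
use std::iter::FromIterator;

/// Hypothetical stand-in for an array backed by a values buffer and a
/// validity bitmap (one bit per slot, least significant bit first).
#[derive(Debug, PartialEq)]
struct SketchArray {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl FromIterator<Option<i32>> for SketchArray {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let iter = iter.into_iter();

        // 1. Initialize buffers, sized from the iterator's lower bound.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity((lower + 7) / 8);

        // 2. Iterate, writing each element straight into the buffers.
        let mut len = 0;
        for item in iter {
            if len % 8 == 0 {
                // Start a new bitmap byte every eight slots.
                validity.push(0u8);
            }
            match item {
                Some(v) => {
                    values.push(v);
                    validity[len / 8] |= 1u8 << (len % 8);
                }
                None => values.push(0),
            }
            len += 1;
        }

        // 3. Create the array from the buffers.
        SketchArray { values, validity, len }
    }
}
```

With such an impl, a kernel could end in a single collect() call, for example collecting an iterator of Option<i32> into a SketchArray, instead of repeating the three steps by hand.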


