Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2020/09/30 11:39:00 UTC
[jira] [Created] (ARROW-10141) [Rust][Arrow] Improve performance of filter kernel
Andrew Lamb created ARROW-10141:
-----------------------------------
Summary: [Rust][Arrow] Improve performance of filter kernel
Key: ARROW-10141
URL: https://issues.apache.org/jira/browse/ARROW-10141
Project: Apache Arrow
Issue Type: Improvement
Reporter: Andrew Lamb
As [~jorgecarleitao] noted here:
https://github.com/apache/arrow/pull/8303#issuecomment-701328143
The performance of the filter kernel (and likely others) could be improved by avoiding the creation of intermediate copies. The code currently:
# creates Vec<Option<T>> through an iteration
# copies Vec<Option<T>> to the two buffers (when from_opt_vec is called)
It may be more efficient to create the buffers during the iteration, so that we avoid the copy (Vec -> buffers). In other words, the code in from_opt_vec could be "injected" into the filter execution: the MutableBuffers for offsets and values are created before the loop, and new elements are written directly to them.
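To make the difference concrete, here is a minimal sketch of the two shapes. It does not use the arrow crate: SketchArray, filter_two_pass and filter_single_pass are hypothetical names, plain Vecs stand in for MutableBuffer, and a validity bitmap stands in for the null buffer, so treat it as an illustration of the idea rather than the kernel's actual code.

```rust
/// Simplified stand-in for a primitive array: a values buffer plus a validity
/// bitmap (bit i set means slot i is non-null). These are NOT the arrow types.
#[derive(Debug, PartialEq)]
struct SketchArray {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

/// Current shape: collect a Vec<Option<i32>>, then copy it into the buffers.
fn filter_two_pass(input: &[Option<i32>], mask: &[bool]) -> SketchArray {
    // Step 1: intermediate Vec<Option<T>> built through an iteration.
    let selected: Vec<Option<i32>> = input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect();

    // Step 2: copy the intermediate Vec into the two buffers.
    let mut values = Vec::with_capacity(selected.len());
    let mut validity = vec![0u8; (selected.len() + 7) / 8];
    for (i, v) in selected.iter().enumerate() {
        match v {
            Some(x) => {
                values.push(*x);
                validity[i / 8] |= 1u8 << (i % 8);
            }
            None => values.push(0), // placeholder value for a null slot
        }
    }
    SketchArray { values, validity, len: selected.len() }
}

/// Proposed shape: create the buffers before the loop and write into them
/// directly, so no intermediate Vec<Option<T>> is allocated.
fn filter_single_pass(input: &[Option<i32>], mask: &[bool]) -> SketchArray {
    // Counting the selected slots up front lets us size the buffers once.
    let len = input.iter().zip(mask).filter(|(_, keep)| **keep).count();
    let mut values = Vec::with_capacity(len);
    let mut validity = vec![0u8; (len + 7) / 8];

    let mut out = 0; // index of the next output slot
    for (v, keep) in input.iter().zip(mask) {
        if !*keep {
            continue;
        }
        match v {
            Some(x) => {
                values.push(*x);
                validity[out / 8] |= 1u8 << (out % 8);
            }
            None => values.push(0), // placeholder value for a null slot
        }
        out += 1;
    }
    SketchArray { values, validity, len }
}
```

Both functions produce the same result; the second only skips the Vec<Option<i32>> allocation and the extra pass over it. Whether that is a measurable win in the real kernel would need to be confirmed with the benchmarks.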
(As a side note, this is why he proposed ARROW-10030, https://github.com/apache/arrow/pull/8211: IMO there is some boilerplate copy-pasting to
* initialize buffers
* iterate
* create ArrayData from buffers
which will continue to grow as we add more kernels, and whose pattern seems to be a FromIter of fixed size.)
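As a rough illustration of that three-step pattern factored into one place, here is a sketch of a helper that builds the buffers from an iterator of known length. SketchData and from_iter_with_len are hypothetical names invented for this example; this is not the API proposed in ARROW-10030, and plain Vecs stand in for the arrow buffers and ArrayData.

```rust
/// Hypothetical stand-in for ArrayData: a values buffer plus a validity bitmap.
#[derive(Debug, PartialEq)]
struct SketchData<T> {
    values: Vec<T>,
    validity: Vec<u8>,
    len: usize,
}

/// Hypothetical helper holding the boilerplate shared by kernels.
/// Assumes `iter` yields exactly `len` items.
fn from_iter_with_len<T, I>(len: usize, iter: I) -> SketchData<T>
where
    T: Copy + Default,
    I: Iterator<Item = Option<T>>,
{
    // 1. initialize buffers
    let mut values = Vec::with_capacity(len);
    let mut validity = vec![0u8; (len + 7) / 8];

    // 2. iterate, writing each element straight into the buffers
    for (i, item) in iter.take(len).enumerate() {
        if item.is_some() {
            validity[i / 8] |= 1u8 << (i % 8);
        }
        values.push(item.unwrap_or_default()); // nulls get a default placeholder
    }
    assert_eq!(values.len(), len, "iterator yielded fewer items than `len`");

    // 3. create the array data from the buffers
    SketchData { values, validity, len }
}

/// A kernel then only supplies the output length and the iterator.
fn filter_with_helper(input: &[Option<i32>], mask: &[bool]) -> SketchData<i32> {
    let len = input.iter().zip(mask).filter(|(_, keep)| **keep).count();
    let kept = input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v);
    from_iter_with_len(len, kept)
}
```

With a helper of this shape, each new kernel would only need to describe its iteration, and the buffer handling would live in one function instead of being copied around.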