Posted to jira@arrow.apache.org by "Daniël Heres (Jira)" <ji...@apache.org> on 2021/01/20 20:13:00 UTC

[jira] [Created] (ARROW-11331) [Rust][DataFusion] Improve performance of Array.slice

Daniël Heres created ARROW-11331:
------------------------------------

             Summary: [Rust][DataFusion] Improve performance of Array.slice
                 Key: ARROW-11331
                 URL: https://issues.apache.org/jira/browse/ARROW-11331
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Daniël Heres
         Attachments: 105164296-42515780-5b15-11eb-87f0-a042c4287514.png

Since https://github.com/apache/arrow/pull/9271, DataFusion uses Array.slice to pass data into the accumulators, rather than paying the overhead of building new arrays (possibly with only a few rows each).

However, slice currently seems quite inefficient: it accounts for about 1/6 of the instructions in hash aggregates and performs some allocations under the hood instead of the promised "zero copy". That is much more than, for example, take, which copies / shuffles the entire array based on indices.
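
To illustrate what "zero copy" slicing is expected to mean here, the following is a minimal sketch using only the Rust standard library. Int32View and its methods are hypothetical names made up for this illustration; this is not the arrow crate's Array or ArrayData API, and it does not show where the current allocations come from. The assumption is that a slice only needs to share the underlying buffer and adjust an offset and a length.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a primitive array; NOT the arrow crate's types.
#[derive(Debug, Clone)]
struct Int32View {
    values: Arc<[i32]>, // shared, immutable value buffer
    offset: usize,      // start of this view within the buffer
    len: usize,         // number of elements visible through this view
}

impl Int32View {
    // Builds a view over the whole vector (the values are moved into the shared buffer once).
    fn from_vec(v: Vec<i32>) -> Self {
        let len = v.len();
        Self { values: Arc::from(v), offset: 0, len }
    }

    // Zero-copy slice: only bumps the reference count and adjusts offset/len.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    // Borrow the elements covered by this view.
    fn as_slice(&self) -> &[i32] {
        &self.values[self.offset..self.offset + self.len]
    }
}

fn main() {
    let array = Int32View::from_vec((0..10).collect());
    let window = array.slice(2, 4);

    // The slice sees rows 2..6 of the original data.
    assert_eq!(window.as_slice(), &[2, 3, 4, 5]);
    // Both views point at the same buffer, so no values were copied.
    assert!(Arc::ptr_eq(&array.values, &window.values));

    // An accumulator could then consume the window directly.
    let sum: i32 = window.as_slice().iter().sum();
    println!("sum of window = {}", sum);
}
```

In this sketch the per-slice cost is one atomic reference-count increment and two integer fields, independent of the slice length, which is the behaviour the "zero copy" promise suggests.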

[~jorgecarleitao]
{quote}Yes, slicing is suboptimal atm. Also, IMO it should not be the Array to implement that method, but each implementation individually. I haven't touched that part yet, though.
{quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)