Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/06/18 17:30:00 UTC
[jira] [Updated] (ARROW-13121) [C++][Compute] Extract preallocation logic to a public function
[ https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman updated ARROW-13121:
---------------------------------
Summary: [C++][Compute] Extract preallocation logic to a public function (was: [C++][Compute] Extract preallocation logic to a method of kernels)
> [C++][Compute] Extract preallocation logic to a public function
> ---------------------------------------------------------------
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ben Kietzman
> Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other buffers based on simple flags on each Kernel. This is inflexible, and we leave a lot of performance on the table in cases where we could preallocate but the behavior can't be expressed with the available flags. For example, in the case of {{binary_string_join_element_wise}}, it would be possible to preallocate all buffers (even the character buffer) and write output into slices.
> Having this as a public function would enable us to unit test it directly (currently Executors are only tested indirectly, by calling compute::Functions) and to reuse it, for example to correctly preallocate a small temporary for pipelined execution.
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for example
> // kernels which support contiguous output preallocation will preallocate a single Datum
> // (and can then output into slices of that Datum).
> Result<std::vector<Datum>> Kernel::prepare_output(
>     const Kernel*,
>     KernelContext*,
>     const std::vector<ExecBatch>& inputs);
> {code}
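
To make the "preallocate everything, then write into slices" idea concrete, here is a minimal sketch of an element-wise string join that uses only the C++ standard library. It is not Arrow code: Batch, StringOutput, PrepareOutput, JoinInto and JoinAll are hypothetical names invented for illustration, and the layout (an offsets buffer plus one character buffer) only imitates a string array. The point it shows is that every output length is known from the inputs, so a size pass can allocate both buffers once and each batch can then fill its own slice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-in for an input batch (not an Arrow type):
// two string columns of equal length.
struct Batch {
  std::vector<std::string> left;
  std::vector<std::string> right;
};

// Hypothetical contiguous output: offsets has (rows + 1) entries and
// data holds the characters of all rows back to back.
struct StringOutput {
  std::vector<int32_t> offsets;
  std::vector<char> data;
};

// Size pass: each output row is left + sep + right, so all offsets and the
// total character count can be computed before any character is written.
StringOutput PrepareOutput(const std::vector<Batch>& batches,
                           const std::string& sep) {
  std::size_t rows = 0;
  for (const Batch& b : batches) rows += b.left.size();

  StringOutput out;
  out.offsets.reserve(rows + 1);
  out.offsets.push_back(0);
  int32_t total = 0;
  for (const Batch& b : batches) {
    for (std::size_t i = 0; i < b.left.size(); ++i) {
      total += static_cast<int32_t>(b.left[i].size() + sep.size() +
                                    b.right[i].size());
      out.offsets.push_back(total);
    }
  }
  out.data.resize(static_cast<std::size_t>(total));  // single allocation
  return out;
}

// Write pass: fills only the rows [row_start, row_start + batch length) of the
// preallocated output. Offsets are already final, so slices are independent.
void JoinInto(const Batch& b, const std::string& sep, std::size_t row_start,
              StringOutput* out) {
  for (std::size_t i = 0; i < b.left.size(); ++i) {
    auto dest = out->data.begin() + out->offsets[row_start + i];
    for (const std::string* piece : {&b.left[i], &sep, &b.right[i]}) {
      dest = std::copy(piece->begin(), piece->end(), dest);
    }
  }
}

// Driver: one preallocation for all batches, then one slice per batch.
StringOutput JoinAll(const std::vector<Batch>& batches, const std::string& sep) {
  StringOutput out = PrepareOutput(batches, sep);
  std::size_t row = 0;
  for (const Batch& b : batches) {
    JoinInto(b, sep, row, &out);
    row += b.left.size();
  }
  return out;
}
```

Computing the offsets in the size pass, rather than while writing, is a design choice of this sketch: it lets each batch be written without depending on the batch before it. Null handling and 32-bit offset overflow are deliberately left out.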
--
This message was sent by Atlassian Jira
(v8.3.4#803005)