Posted to jira@arrow.apache.org by "Jin Shang (Jira)" <ji...@apache.org> on 2022/10/28 03:46:00 UTC

[jira] [Created] (ARROW-18185) [C++][Compute] Support KEEP_NULL option for compute::Filter

Jin Shang created ARROW-18185:
---------------------------------

             Summary: [C++][Compute] Support KEEP_NULL option for compute::Filter
                 Key: ARROW-18185
                 URL: https://issues.apache.org/jira/browse/ARROW-18185
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Jin Shang


The current Filter implementation always drops the filtered-out values. In some use cases, the output array is required to have the same size as the input array. So I added a new option, FilterOptions::KEEP_NULL, where the filtered-out values are kept as nulls.

For example, with input [1, 2, 3] and filter [true, false, true], the current implementation outputs [1, 3], and with the new option it outputs [1, null, 3].
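
The difference between the two behaviors can be sketched in plain C++ without Arrow. This is only an illustration of the semantics, not the proposed patch: `NullableInts`, `FilterDrop` and `FilterKeepNull` are hypothetical names, `std::nullopt` stands in for an Arrow null, and the filter is assumed to contain no nulls and to have the same length as the input.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable Arrow array:
// std::nullopt plays the role of a null slot.
using NullableInts = std::vector<std::optional<int>>;

// Current behavior: slots whose filter value is false are dropped,
// so the output can be shorter than the input.
NullableInts FilterDrop(const NullableInts& values,
                        const std::vector<bool>& mask) {
  NullableInts out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) out.push_back(values[i]);
  }
  return out;
}

// Proposed behavior: slots whose filter value is false become null,
// so the output always has the same length as the input.
NullableInts FilterKeepNull(const NullableInts& values,
                            const std::vector<bool>& mask) {
  NullableInts out(values.size());  // every slot starts as null
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) out[i] = values[i];
  }
  return out;
}
```

Calling both helpers with the values {1, 2, 3} and the mask {true, false, true} reproduces the two outputs described above.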

This option is simpler to implement, since we only need to construct a new validity bitmap and can reuse the input buffers and child arrays. The exception is dense union arrays, which don't have validity bitmaps.
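
One way to picture the validity bitmap construction is as a bitwise AND of the input validity bitmap with the filter's value bits. The sketch below is written under several assumptions and is not taken from the actual patch: `KeepNullValidity` and `GetBit` are hypothetical helpers, bitmaps are packed least-significant-bit first as in the Arrow format, both bitmaps have the same byte length and zero offset, and the filter itself has no nulls.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a slot is valid in the output only if it was valid
// in the input AND it was selected by the filter. The value buffers are
// left untouched, so they could be shared with the input.
std::vector<std::uint8_t> KeepNullValidity(
    const std::vector<std::uint8_t>& input_validity,
    const std::vector<std::uint8_t>& filter_bits) {
  std::vector<std::uint8_t> out(input_validity.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = input_validity[i] & filter_bits[i];
  }
  return out;
}

// Hypothetical helper: read bit i of a packed, LSB-first bitmap.
bool GetBit(const std::vector<std::uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}
```

With three valid input slots (bits 111) and the filter [true, false, true] (bits 101, LSB first), the resulting bitmap marks slots 0 and 2 as valid and slot 1 as null.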

According to the benchmark results, filtering with FilterOptions::KEEP_NULL is also faster in most cases. The exception is when the selection percentage is extremely small, where it is cheaper to copy over the selected values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)