Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/28 03:46:23 UTC
[GitHub] [arrow] js8544 opened a new pull request, #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
js8544 opened a new pull request, #14535:
URL: https://github.com/apache/arrow/pull/14535
The current Filter implementation always drops the filtered-out values. Some use cases require the output array to have the same length as the input array, so I added a new option, FilterOptions::KEEP_NULL, under which the filtered-out values are kept as nulls.
For example, with input [1, 2, 3] and filter [true, false, true], the current implementation outputs [1, 3], while the new option outputs [1, null, 3].
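To make the difference concrete, here is a minimal standalone sketch of the two behaviors. It does not use Arrow at all: `FilterSketch` and the `FilteredValueBehavior` enum are hypothetical names chosen for illustration, and `std::optional<int>` stands in for a nullable array slot. It also assumes the filter itself contains no nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of the two behaviors; this is NOT Arrow's API.
enum class FilteredValueBehavior { DROP, KEEP_NULL };

// Applies a boolean mask to a vector of nullable ints.
// DROP omits filtered-out slots; KEEP_NULL replaces them with nulls,
// so the output has the same length as the input.
std::vector<std::optional<int>> FilterSketch(
    const std::vector<std::optional<int>>& values,
    const std::vector<bool>& mask,
    FilteredValueBehavior behavior) {
  assert(values.size() == mask.size());
  std::vector<std::optional<int>> out;
  out.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (mask[i]) {
      out.push_back(values[i]);  // selected: copy the slot, null or not
    } else if (behavior == FilteredValueBehavior::KEEP_NULL) {
      out.push_back(std::nullopt);  // filtered out: keep the slot as null
    }
    // filtered out under DROP: emit nothing
  }
  return out;
}
```

Calling this with values {1, 2, 3} and mask {true, false, true} gives a two-element result under DROP and a three-element result with an empty middle slot under KEEP_NULL, matching the example above.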
This option is simpler to implement, since we only need to construct a new validity bitmap and can reuse the input buffers and child arrays. The exception is dense union arrays, which don't have validity bitmaps.
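The bitmap construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `KeepNullValidity` and `BitIsSet` are hypothetical helpers, both bitmaps are assumed to be packed least-significant-bit first with zero offset and equal length, and the filter is assumed to contain no nulls. Under those assumptions the new validity bitmap is just the bitwise AND of the input validity and the filter bits, and the value buffer is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a slot stays valid only if it was valid in the
// input AND selected by the filter. Both bitmaps are assumed to be
// LSB-first packed, zero-offset, and of equal byte length.
std::vector<std::uint8_t> KeepNullValidity(
    const std::vector<std::uint8_t>& input_validity,
    const std::vector<std::uint8_t>& filter_bits) {
  assert(input_validity.size() == filter_bits.size());
  std::vector<std::uint8_t> out(input_validity.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = static_cast<std::uint8_t>(input_validity[i] & filter_bits[i]);
  }
  return out;
}

// Hypothetical helper: reads bit i of an LSB-first packed bitmap.
bool BitIsSet(const std::vector<std::uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}
```

For a three-element input with all slots valid (bits 0b00000111) and the filter [true, false, true] (bits 0b00000101), the result has bits 0 and 2 set and bit 1 cleared, so the middle slot reads as null while the values buffer is shared as-is.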
According to the benchmark results, filtering with FilterOptions::KEEP_NULL is also faster in most cases. The exception is when the selection percentage is extremely small, where it's cheaper to copy over the selected values.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1294434810
:warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.
[GitHub] [arrow] js8544 commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
Posted by GitBox <gi...@apache.org>.
js8544 commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1296485996
The CI failures are unrelated.
[GitHub] [arrow] github-actions[bot] commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1294434779
https://issues.apache.org/jira/browse/ARROW-18185
[GitHub] [arrow] js8544 closed pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
Posted by GitBox <gi...@apache.org>.
js8544 closed pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
URL: https://github.com/apache/arrow/pull/14535
[GitHub] [arrow] js8544 commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
Posted by GitBox <gi...@apache.org>.
js8544 commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1294421635
Benchmark result on my machine: https://gist.github.com/js8544/7a1a1e798e41b42f51ccb4112bd2a2c2
Benchmarks whose names end in 2 use filtered_value_behavior = KEEP_NULL. This option is faster than the other two in most cases, except when the selection percentage is extremely small, where it's cheaper to copy over the selected values.