Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/28 03:46:23 UTC

[GitHub] [arrow] js8544 opened a new pull request, #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter

js8544 opened a new pull request, #14535:
URL: https://github.com/apache/arrow/pull/14535

   The current Filter implementation always drops the filtered-out values. In some use cases, the output array must have the same size as the input array, so I added a new option, FilterOptions::KEEP_NULL, under which the filtered-out values are kept as nulls.
   
   For example, with input [1, 2, 3] and filter [true, false, true], the current implementation outputs [1, 3], and with the new option it outputs [1, null, 3].
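
   The two behaviors in this example can be sketched with the standard library alone. This is only an illustration of the semantics, not the Arrow implementation: `FilterSketch` and the `FilteredValueBehavior` enum are hypothetical names, `std::optional` stands in for a nullable slot, and C++17 is assumed.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical names for illustration only; this is not the Arrow API.
enum class FilteredValueBehavior { DROP, KEEP_NULL };

// Returns the selected values. An empty optional models a null slot.
std::vector<std::optional<int>> FilterSketch(const std::vector<int>& values,
                                             const std::vector<bool>& mask,
                                             FilteredValueBehavior behavior) {
  std::vector<std::optional<int>> out;
  for (std::size_t i = 0; i < values.size() && i < mask.size(); ++i) {
    if (mask[i]) {
      // Selected: the value is emitted under both behaviors.
      out.push_back(values[i]);
    } else if (behavior == FilteredValueBehavior::KEEP_NULL) {
      // Not selected: keep the slot as a null so the length is preserved.
      out.push_back(std::nullopt);
    }
    // Not selected with DROP: emit nothing, so the output shrinks.
  }
  return out;
}
```

   With values [1, 2, 3] and mask [true, false, true], the DROP branch yields two elements and the KEEP_NULL branch yields three, with the middle one empty.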
   
   This option is simpler to implement, since we only need to construct a new validity bitmap and can reuse the input buffers and child arrays. The exception is dense union arrays, which don't have validity bitmaps.
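
   A rough sketch of the bitmap idea, under stated assumptions: the `Int32ArraySketch` struct, `GetBit`, and `FilterKeepNullSketch` are hypothetical stand-ins and not Arrow types, the filter is assumed to have no nulls of its own and to be packed the same way as the validity bitmap (one bit per element, least-significant bit first), and both bitmaps are assumed to cover the same number of bytes. The output shares the value buffer with the input and only gets a new validity bitmap.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical simplified array, not an Arrow type: a shared value buffer
// plus a validity bitmap where bit i (LSB-first) is 1 if element i is valid.
struct Int32ArraySketch {
  std::shared_ptr<const std::vector<std::int32_t>> values;
  std::vector<std::uint8_t> validity;
  std::size_t length;
};

// Reads bit i of an LSB-first packed bitmap.
inline bool GetBit(const std::vector<std::uint8_t>& bits, std::size_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Assumes filter_bits has at least as many bytes as input.validity and
// contains no nulls. No values are copied: only the bitmap is rebuilt.
Int32ArraySketch FilterKeepNullSketch(
    const Int32ArraySketch& input,
    const std::vector<std::uint8_t>& filter_bits) {
  Int32ArraySketch out{input.values,
                       std::vector<std::uint8_t>(input.validity.size(), 0),
                       input.length};
  for (std::size_t byte = 0; byte < out.validity.size(); ++byte) {
    // An element stays valid only if it was valid and the filter selects it.
    out.validity[byte] = input.validity[byte] & filter_bits[byte];
  }
  return out;
}
```

   Because the value buffer is shared rather than copied, the work is proportional to the bitmap size, which fits the observation that this path avoids moving the selected values.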
   
   According to the benchmark results, filtering with FilterOptions::KEEP_NULL is also faster in most cases. The exception is when the selection percentage is extremely small, where it is cheaper to copy over the selected values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1294434810

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.



[GitHub] [arrow] js8544 commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter

Posted by GitBox <gi...@apache.org>.
js8544 commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1296485996

   CI failures are unrelated.



[GitHub] [arrow] github-actions[bot] commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1294434779

   https://issues.apache.org/jira/browse/ARROW-18185



[GitHub] [arrow] js8544 closed pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter

Posted by GitBox <gi...@apache.org>.
js8544 closed pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter
URL: https://github.com/apache/arrow/pull/14535



[GitHub] [arrow] js8544 commented on pull request #14535: ARROW-18185: [C++][Compute] Support KEEP_NULL option for compute::Filter

Posted by GitBox <gi...@apache.org>.
js8544 commented on PR #14535:
URL: https://github.com/apache/arrow/pull/14535#issuecomment-1294421635

   Benchmark result on my machine: https://gist.github.com/js8544/7a1a1e798e41b42f51ccb4112bd2a2c2
   Benchmark names ending with 2 use filtered_value_behavior = KEEP_NULL. It is faster than the other two options in most cases, except when the selection percentage is extremely small, where it is cheaper to copy over the selected values.

