Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/06/14 12:11:24 UTC

[GitHub] [arrow-rs] tustvold commented on issue #3620: Evaluate Kernel under Selection / Short-Circuiting Filter Evaluation

tustvold commented on issue #3620:
URL: https://github.com/apache/arrow-rs/issues/3620#issuecomment-1591072494

   Thinking about this a bit more, the intention of a selection vector is to allow a kernel to skip an expensive computation, such as a string comparison or regex evaluation, when **the result is unimportant because we know it is going to be discarded**. For some kernels the cost of consulting the selection vector will outweigh any savings, especially for kernels like integer comparison where it interferes with vectorisation.
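
A minimal sketch of the idea, using plain slices rather than arrow-rs arrays so it stays self-contained. `expensive_predicate` and `eval_under_selection` are hypothetical names for illustration, not arrow-rs APIs, and the substring check merely stands in for something costly like a regex match.

```rust
/// Hypothetical stand-in for an expensive per-row computation
/// (e.g. a regex evaluation or a long string comparison).
fn expensive_predicate(value: &str) -> bool {
    value.contains("arrow")
}

/// Evaluates `expensive_predicate` only where `selection` is true.
/// Returns the per-row results plus the number of rows actually evaluated.
/// Unselected rows receive a placeholder `false` that callers must discard.
fn eval_under_selection(values: &[&str], selection: &[bool]) -> (Vec<bool>, usize) {
    assert_eq!(values.len(), selection.len());
    let mut evaluated = 0;
    let out: Vec<bool> = values
        .iter()
        .zip(selection)
        .map(|(v, &selected)| {
            if selected {
                evaluated += 1;
                expensive_predicate(v)
            } else {
                // Result is going to be discarded, so skip the work entirely.
                false
            }
        })
        .collect();
    (out, evaluated)
}
```

The per-row branch on `selected` is exactly the overhead that can outweigh the savings for cheap kernels such as integer comparison, where a branch-free loop over every row is likely to vectorise better.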
   
   Now the potentially interesting observation is that the exact same principle also holds for null masks: we shouldn't spend time performing expensive evaluation on null slots. I think we currently do in some cases, but this should be easy to fix.
   
   This then leads to the obvious question: if a false value in a selection vector indicates that the result doesn't matter, how would the semantics of an operation under a selection vector differ from the semantics of the same operation with the arrays first passed to [`nullif`](https://docs.rs/arrow-select/latest/arrow_select/nullif/fn.nullif.html) with the selection vector? As the result is irrelevant, why would its null-ness matter?
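
One way to see the proposed equivalence is the sketch below. It assumes arrays are modelled as slices of `Option<&str>`, and `mask_unselected`, `eval_skipping_nulls` and `expensive_predicate` are hypothetical helpers, not arrow-rs functions. Unselected slots are turned into nulls, and a null-aware kernel then skips them for free.

```rust
/// Hypothetical stand-in for an expensive per-row computation.
fn expensive_predicate(value: &str) -> bool {
    value.contains("arrow")
}

/// Models masking by a selection vector: slots where `selection` is false
/// become null. Note: as far as I recall, arrow's real `nullif` nulls slots
/// where its boolean argument is *true*, so a selection vector would need to
/// be negated before being passed to it.
fn mask_unselected<'a>(
    values: &[Option<&'a str>],
    selection: &[bool],
) -> Vec<Option<&'a str>> {
    assert_eq!(values.len(), selection.len());
    values
        .iter()
        .zip(selection)
        .map(|(v, &keep)| if keep { *v } else { None })
        .collect()
}

/// Null-aware kernel: the expensive predicate is never run on null slots.
fn eval_skipping_nulls(values: &[Option<&str>]) -> Vec<Option<bool>> {
    values.iter().map(|v| v.map(expensive_predicate)).collect()
}
```

For every selected row, `eval_skipping_nulls(&mask_unselected(values, selection))` yields the same value as `eval_skipping_nulls(values)`; the two differ only in unselected slots, whose results are discarded anyway.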
   
   
   
   

