Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:49:02 UTC

[jira] [Closed] (ARROW-11846) [Rust] Specify behavior of filter kernel on `null`

     [ https://issues.apache.org/jira/browse/ARROW-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lamb closed ARROW-11846.
-------------------------------
    Resolution: Invalid

> [Rust] Specify behavior of filter kernel on `null`
> --------------------------------------------------
>
>                 Key: ARROW-11846
>                 URL: https://issues.apache.org/jira/browse/ARROW-11846
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Ben Chambers
>            Priority: Minor
>
> Currently, the behavior of `filter` is undefined on null values.
> This causes problems when the `boolean` filter array contains `null` values. For instance, I wrote a `null_to_false` helper that has to manipulate the underlying buffers to combine the null bits with the values, turning each null into false. The C++ `filter` kernel allows specifying the behavior on nulls. What do you think about adding a method that takes an additional parameter to configure this behavior, and picking a "default" behavior for the existing implementation?
> {code:java}
> pub enum NullFilterBehavior {
>   // Include values where the filter was NULL.
>   EMIT,
>   // Exclude values where the filter was NULL.
>   SKIP,
>   // Ignore the null bits. Behavior is undefined.
>   UNDEFINED,
> }
> pub struct FilterConfig {
>   null_behavior: NullFilterBehavior
> }
> impl Default for FilterConfig {
>   fn default() -> Self {
>     Self {
>       null_behavior: NullFilterBehavior::UNDEFINED,
>     }
>   }
> }
> pub fn filter(array: &Array, filter: &BooleanArray) -> Result<ArrayRef> {
>   filter_config(array, filter, FilterConfig::default())
> }
> pub fn filter_config(array: &Array, filter: &BooleanArray, config: FilterConfig) -> Result<ArrayRef> {
>  ...
> }
> {code}
> It seems like implementing such a method could be done by allowing the BitChunksIterator to AND / OR each of the chunks before passing it to the BitSlices iterator.
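
The AND / OR idea in the description can be sketched without any Arrow types. The snippet below is only an illustration under stated assumptions: it models the filter's value bitmap and validity bitmap as plain `u64` chunks (bit set in validity means non-null, least-significant bit first), and the names `combine_chunk` and `filter_with_nulls` are hypothetical, not part of the arrow crate. The enum follows the proposal above but uses Rust-style variant casing and omits the undefined case.

```rust
/// How to treat slots where the filter itself is null.
/// Hypothetical enum modeled on the proposal in the issue.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum NullFilterBehavior {
    /// Include values where the filter was null.
    Emit,
    /// Exclude values where the filter was null.
    Skip,
}

/// Combine one 64-bit chunk of filter values with the matching validity chunk.
/// A set validity bit means the slot is non-null.
pub fn combine_chunk(values: u64, validity: u64, behavior: NullFilterBehavior) -> u64 {
    match behavior {
        // OR with the inverted validity: null slots are forced to 1.
        NullFilterBehavior::Emit => values | !validity,
        // AND with the validity: null slots are forced to 0.
        NullFilterBehavior::Skip => values & validity,
    }
}

/// Keep the items whose bit is set in the combined mask.
/// Assumes `values` and `validity` hold at least `items.len()` bits.
pub fn filter_with_nulls<T: Clone>(
    items: &[T],
    values: &[u64],
    validity: &[u64],
    behavior: NullFilterBehavior,
) -> Vec<T> {
    let mut out = Vec::new();
    for (i, item) in items.iter().enumerate() {
        let (chunk, bit) = (i / 64, i % 64);
        let mask = combine_chunk(values[chunk], validity[chunk], behavior);
        if (mask >> bit) & 1 == 1 {
            out.push(item.clone());
        }
    }
    out
}
```

With items `[10, 20, 30, 40]`, value bits `0b0101` and validity bits `0b0011` (slots 2 and 3 are null), `Skip` keeps only the first item, while `Emit` keeps the first item plus both null slots. A real kernel would combine whole chunks once and hand them to the slice iterator rather than recomputing the mask per element; the per-element loop here is only for readability.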


