Posted to issues@arrow.apache.org by "Ben Chambers (Jira)" <ji...@apache.org> on 2021/03/03 00:19:00 UTC

[jira] [Created] (ARROW-11846) Specify behavior of filter kernel on `null`

Ben Chambers created ARROW-11846:
------------------------------------

             Summary: Specify behavior of filter kernel on `null`
                 Key: ARROW-11846
                 URL: https://issues.apache.org/jira/browse/ARROW-11846
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
            Reporter: Ben Chambers


Currently, the behavior of the `filter` kernel is undefined when the filter `BooleanArray` contains null values.

This causes problems whenever the filter is a `boolean` array that contains `null` values. For instance, I wrote a `null_to_false` function that has to manipulate the underlying buffers directly, combining the null (validity) bits with the value bits so that null entries become false. The C++ `filter` kernel, by contrast, lets the caller specify the behavior on nulls (`FilterOptions::null_selection_behavior`, which either drops the row or emits a null). Thoughts on adding a method that takes an additional parameter to configure this behavior, and then picking a "default" behavior for the existing `filter` function?
{code:java}
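use arrow::array::{Array, ArrayRef, BooleanArray};
use arrow::error::Result;

/// How `filter` treats slots where the filter array itself is NULL.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NullFilterBehavior {
  // Include values where the filter was NULL.
  EMIT,
  // Exclude values where the filter was NULL.
  SKIP,
  // Ignore the null bits. Behavior is undefined.
  UNDEFINED,
}

#[derive(Clone, Copy, Debug)]
pub struct FilterConfig {
  pub null_behavior: NullFilterBehavior,
}

impl Default for FilterConfig {
  fn default() -> Self {
    Self {
      null_behavior: NullFilterBehavior::UNDEFINED,
    }
  }
}

/// Existing entry point: keeps today's behavior by delegating with the default config.
pub fn filter(array: &dyn Array, filter: &BooleanArray) -> Result<ArrayRef> {
  filter_config(array, filter, FilterConfig::default())
}

/// New entry point that takes an explicit null-handling configuration.
pub fn filter_config(array: &dyn Array, filter: &BooleanArray, config: FilterConfig) -> Result<ArrayRef> {
  // ... build the selection mask according to `config.null_behavior`, then copy the selected slots.
  todo!()
}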
{code}
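To make the intended semantics of EMIT and SKIP concrete, here is a minimal reference sketch over plain slices rather than Arrow arrays. The enum `NullMode` and the function `filter_values` are hypothetical names used only for illustration; a `None` entry stands in for a null slot in the filter `BooleanArray`, and UNDEFINED is left out because it has no fixed meaning.

```rust
/// Hypothetical stand-in for the two well-defined cases of `NullFilterBehavior`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum NullMode {
    /// Keep rows whose filter entry is null (mirrors EMIT).
    Emit,
    /// Drop rows whose filter entry is null (mirrors SKIP).
    Skip,
}

/// Hypothetical reference semantics on plain slices: `None` in `predicate`
/// plays the role of a null slot in the filter array.
pub fn filter_values<T: Clone>(values: &[T], predicate: &[Option<bool>], mode: NullMode) -> Vec<T> {
    assert_eq!(values.len(), predicate.len(), "filter must match array length");
    values
        .iter()
        .zip(predicate)
        .filter(|(_, p)| match p {
            // Non-null filter entry: keep the row only if it is true.
            Some(keep) => *keep,
            // Null filter entry: the configured behavior decides.
            None => mode == NullMode::Emit,
        })
        .map(|(v, _)| v.clone())
        .collect()
}
```

For example, filtering `[1, 2, 3, 4]` with `[Some(true), None, Some(false), Some(true)]` keeps `[1, 4]` under `Skip` and `[1, 2, 4]` under `Emit`.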
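It seems like such a method could be implemented by having the BitChunksIterator combine each chunk of the filter values with the corresponding chunk of the validity bitmap (AND with the validity for SKIP, OR with its complement for EMIT) before passing the result to the BitSlices iterator.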
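A minimal sketch of that per-chunk combination, assuming 64-bit chunks in Arrow's LSB-first bitmap order where a set validity bit means non-null. `effective_mask` and `selected_indices` are hypothetical helpers, not existing arrow-rs functions. Note that the EMIT case has to mask off padding bits past the array length, because taking the complement of the validity chunk sets them.

```rust
/// Hypothetical helper: combine one 64-bit chunk of filter values with the
/// matching chunk of the validity bitmap (bit set = non-null).
pub fn effective_mask(values: u64, validity: u64, emit_nulls: bool) -> u64 {
    if emit_nulls {
        // EMIT: a null slot counts as "keep", so OR in the complement of validity.
        values | !validity
    } else {
        // SKIP: a null slot counts as "drop", so AND with validity.
        values & validity
    }
}

/// Hypothetical helper: indices of selected rows in a chunk holding `len` rows (len <= 64).
pub fn selected_indices(mask: u64, len: u32) -> Vec<u32> {
    // Clear padding bits beyond `len`; they may be set by `!validity` in the EMIT case.
    let mut m = if len >= 64 { mask } else { mask & ((1u64 << len) - 1) };
    let mut out = Vec::new();
    while m != 0 {
        out.push(m.trailing_zeros()); // position of the lowest set bit
        m &= m - 1; // clear the lowest set bit
    }
    out
}
```

With filter values `0b1001` and validity `0b1101` (slot 1 null) over 4 rows, SKIP selects rows 0 and 3, while EMIT selects rows 0, 1 and 3.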



--
This message was sent by Atlassian Jira
(v8.3.4#803005)