Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/20 10:45:00 UTC

[jira] [Created] (ARROW-10663) [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions

Joris Van den Bossche created ARROW-10663:
---------------------------------------------

             Summary: [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
                 Key: ARROW-10663
                 URL: https://issues.apache.org/jira/browse/ARROW-10663
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche
             Fix For: 3.0.0


The C++ docs of {{SetLookupOptions}} have this explanation of the {{skip_nulls}} option:

{code}
  /// Whether nulls in `value_set` count for lookup.
  ///
  /// If true, any null in `value_set` is ignored and nulls in the input
  /// produce null (IndexIn) or false (IsIn) values in the output.
  /// If false, any null in `value_set` is successfully matched in
  /// the input.
  bool skip_nulls;
{code}

(from https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)

However, for {{IsIn}} this explanation doesn't seem to hold in practice:

{code}
In [16]: arr = pa.array([1, 2, None])

In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
Out[17]: 
<pyarrow.lib.BooleanArray object at 0x7fcf666f9408>
[
  true,
  false,
  true
]

In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
Out[18]: 
<pyarrow.lib.BooleanArray object at 0x7fcf666b13a8>
[
  true,
  false,
  true
]
{code}
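For comparison, here is a plain-Python sketch of what the doc comment seems to describe for {{IsIn}}. This is only my reading of the comment, not the kernel's actual code; the helper name {{is_in_documented}} is hypothetical, and it uses the C++ field name {{skip_nulls}} for the flag.

```python
def is_in_documented(values, value_set, skip_nulls):
    """Reference semantics as read from the SetLookupOptions doc comment.

    `None` stands in for an Arrow null. Hypothetical helper, not Arrow API.
    """
    non_null = {v for v in value_set if v is not None}
    # A null in value_set only counts for lookup when skip_nulls is False.
    null_matches = (not skip_nulls) and any(v is None for v in value_set)
    return [null_matches if v is None else (v in non_null) for v in values]


arr = [1, 2, None]
print(is_in_documented(arr, [1, None], skip_nulls=True))   # [True, False, False]
print(is_in_documented(arr, [1, None], skip_nulls=False))  # [True, False, True]
```

Under that reading, only the last element should differ between the two calls, whereas the kernel output above is identical in both cases.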

This documentation was added in https://github.com/apache/arrow/pull/7695 (ARROW-8989).

BTW, for "index_in", it works as documented:

{code}
In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
Out[19]: 
<pyarrow.lib.Int32Array object at 0x7fcf666f04c8>
[
  0,
  null,
  null
]

In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
Out[20]: 
<pyarrow.lib.Int32Array object at 0x7fcf666f0ee8>
[
  0,
  null,
  1
]
{code}
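The same documented behaviour for {{IndexIn}} can be sketched in plain Python. Again, this is an illustration of the doc comment under my reading of it, not the kernel implementation; {{index_in_documented}} is a hypothetical name.

```python
def index_in_documented(values, value_set, skip_nulls):
    """Reference semantics as read from the SetLookupOptions doc comment.

    `None` stands in for an Arrow null, both as an input value and as a
    "no match" result. Hypothetical helper, not Arrow API.
    """
    positions = {}
    for i, v in enumerate(value_set):
        if v is None and skip_nulls:
            continue  # nulls in value_set are ignored for lookup
        positions.setdefault(v, i)  # keep the first occurrence
    return [positions.get(v) for v in values]


arr = [1, 2, None]
print(index_in_documented(arr, [1, None], skip_nulls=True))   # [0, None, None]
print(index_in_documented(arr, [1, None], skip_nulls=False))  # [0, None, 1]
```

These results agree with the {{index_in}} output shown above.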


