Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/20 11:07:00 UTC

[jira] [Comment Edited] (ARROW-10663) [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions

    [ https://issues.apache.org/jira/browse/ARROW-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236064#comment-17236064 ] 

Joris Van den Bossche edited comment on ARROW-10663 at 11/20/20, 11:06 AM:
---------------------------------------------------------------------------

The original PR adding it seems to have documented a different behaviour at the time (https://github.com/apache/arrow/pull/4235/files#diff-fc156499f9e4a75e0ef2d7e83b390f68e833cfa46c164cd0b4542af10a0337e2R35-R36) ("If null occurs in left, if null count in right is not 0, it returns true, else returns null.").

-But what I don't understand is that this still seems to be tested that way for the {{IsIn}} function: https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc#L107-L111-

Correction: the test I linked to only covers nulls in the left array, and that is also how it works in Python (my example above has nulls in both the left and right arrays).
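
To make the sentence quoted from the original PR concrete, here is a minimal pure-Python sketch of the rule it states. This is only a model of that one sentence, not Arrow's implementation: the function name {{is_in_as_first_documented}} is hypothetical, Python {{None}} stands in for null, and plain membership for non-null values is an assumption, since the quoted sentence only covers nulls in the left array.

```python
def is_in_as_first_documented(values, value_set):
    """Model of the sentence quoted from the original PR (hypothetical helper).

    Null on the left: True if the right side contains any null, else null.
    Non-null on the left: plain membership (assumed, not stated in the quote).
    """
    set_has_null = any(v is None for v in value_set)
    non_null = {v for v in value_set if v is not None}
    out = []
    for v in values:
        if v is None:
            out.append(True if set_has_null else None)
        else:
            out.append(v in non_null)
    return out


left = [1, 2, None]
print(is_in_as_first_documented(left, [1, None]))  # nulls on both sides
print(is_in_as_first_documented(left, [1]))        # null only on the left
```

Under this reading, the two cases differ only in the last element: {{True}} when the right side has a null, null when it does not.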


was (Author: jorisvandenbossche):
The original PR adding it seems to have documented a different behaviour at the time (https://github.com/apache/arrow/pull/4235/files#diff-fc156499f9e4a75e0ef2d7e83b390f68e833cfa46c164cd0b4542af10a0337e2R35-R36) ("If null occurs in left, if null count in right is not 0, it returns true, else returns null."), so this might have been changed later.   

But what I don't understand is that this still seems to be tested that way for the {{IsIn}} function: https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc#L107-L111




> [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-10663
>                 URL: https://issues.apache.org/jira/browse/ARROW-10663
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The C++ docs of {{SetLookupOptions}} have this explanation of the {{skip_nulls}} option:
> {code}
>   /// Whether nulls in `value_set` count for lookup.
>   ///
>   /// If true, any null in `value_set` is ignored and nulls in the input
>   /// produce null (IndexIn) or false (IsIn) values in the output.
>   /// If false, any null in `value_set` is successfully matched in
>   /// the input.
>   bool skip_nulls;
> {code}
> (from https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)
> However, for {{IsIn}} this explanation doesn't seem to hold in practice:
> {code}
> In [16]: arr = pa.array([1, 2, None])
> In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[17]: 
> <pyarrow.lib.BooleanArray object at 0x7fcf666f9408>
> [
>   true,
>   false,
>   true
> ]
> In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[18]: 
> <pyarrow.lib.BooleanArray object at 0x7fcf666b13a8>
> [
>   true,
>   false,
>   true
> ]
> {code}
> This documentation was added in https://github.com/apache/arrow/pull/7695 (ARROW-8989).
> BTW, for "index_in", it works as documented:
> {code}
> In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[19]: 
> <pyarrow.lib.Int32Array object at 0x7fcf666f04c8>
> [
>   0,
>   null,
>   null
> ]
> In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[20]: 
> <pyarrow.lib.Int32Array object at 0x7fcf666f0ee8>
> [
>   0,
>   null,
>   1
> ]
> {code}
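
For comparison with the outputs quoted above, here is a small pure-Python sketch of what the {{skip_nulls}} docstring describes for both kernels. It is a reference model of the documentation text only, not Arrow's code: the helper names {{is_in_documented}} and {{index_in_documented}} are hypothetical, {{None}} stands in for null, and returning false for a null input when {{skip_nulls}} is false and {{value_set}} has no null is an assumption (the docstring does not cover that case).

```python
def is_in_documented(values, value_set, skip_nulls):
    """IsIn as the docstring reads: a null input is matched only when
    skip_nulls is False and value_set contains a null; otherwise False."""
    match_null = (not skip_nulls) and any(v is None for v in value_set)
    non_null = {v for v in value_set if v is not None}
    return [match_null if v is None else v in non_null for v in values]


def index_in_documented(values, value_set, skip_nulls):
    """IndexIn as the docstring reads: position of the first match in
    value_set, or null (None) when nothing matches."""
    positions = {}
    for i, v in enumerate(value_set):
        if v is None and skip_nulls:
            continue  # nulls in value_set are ignored
        positions.setdefault(v, i)  # keep the first occurrence
    return [positions.get(v) for v in values]


arr = [1, 2, None]
value_set = [1, None]
for skip in (True, False):
    print(skip, is_in_documented(arr, value_set, skip),
          index_in_documented(arr, value_set, skip))
```

With {{skip_nulls=True}} this model gives {{[True, False, False]}} for IsIn, which is where the output reported above ({{true, false, true}}) departs from the documentation. The IndexIn results of the model agree with the outputs reported above for both settings.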


