Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/01/11 16:40:00 UTC

[jira] [Assigned] (ARROW-10663) [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions

     [ https://issues.apache.org/jira/browse/ARROW-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou reassigned ARROW-10663:
--------------------------------------

    Assignee: Antoine Pitrou

> [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-10663
>                 URL: https://issues.apache.org/jira/browse/ARROW-10663
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Assignee: Antoine Pitrou
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The C++ docs of {{SetLookupOptions}} have this explanation of the {{skip_nulls}} option:
> {code}
>   /// Whether nulls in `value_set` count for lookup.
>   ///
>   /// If true, any null in `value_set` is ignored and nulls in the input
>   /// produce null (IndexIn) or false (IsIn) values in the output.
>   /// If false, any null in `value_set` is successfully matched in
>   /// the input.
>   bool skip_nulls;
> {code}
> (from https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)
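> However, {{IsIn}} does not follow this explanation in practice: with skip_nulls enabled, the null input still yields true instead of false, so the output is the same whether the option is set or not: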
> {code}
> In [16]: arr = pa.array([1, 2, None])
> In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[17]: 
> <pyarrow.lib.BooleanArray object at 0x7fcf666f9408>
> [
>   true,
>   false,
>   true
> ]
> In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[18]: 
> <pyarrow.lib.BooleanArray object at 0x7fcf666b13a8>
> [
>   true,
>   false,
>   true
> ]
> {code}
> This documentation was added in https://github.com/apache/arrow/pull/7695 (ARROW-8989).
>
> BTW, for {{index_in}} it works as documented:
> {code}
> In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[19]: 
> <pyarrow.lib.Int32Array object at 0x7fcf666f04c8>
> [
>   0,
>   null,
>   null
> ]
> In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[20]: 
> <pyarrow.lib.Int32Array object at 0x7fcf666f0ee8>
> [
>   0,
>   null,
>   1
> ]
> {code}
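The documented semantics can be checked mechanically against the console output in the issue. The following is a minimal plain-Python sketch, not Arrow code: is_in_documented and index_in_documented are hypothetical helpers that model the skip_nulls description from the quoted doc comment, with None standing in for an Arrow null, and compare that model with the outputs recorded above. Under this model only is_in with skip_nulls=True disagrees with what was observed.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical model of the skip_nulls semantics described in the
# SetLookupOptions doc comment. None stands in for an Arrow null.
# This is an illustrative sketch, not Arrow's C++ kernel code.


def is_in_documented(values: Sequence, value_set: Sequence,
                     skip_nulls: bool) -> List[bool]:
    if skip_nulls:
        # Nulls in value_set are ignored and null inputs produce false.
        lookup = {v for v in value_set if v is not None}
        return [v is not None and v in lookup for v in values]
    # Nulls in value_set take part in the lookup, so a null input matches
    # them (assumption: with no null in value_set, a null input gives false).
    lookup = set(value_set)
    return [v in lookup for v in values]


def index_in_documented(values: Sequence, value_set: Sequence,
                        skip_nulls: bool) -> List[Optional[int]]:
    # Record the first position of each value in value_set.
    positions: Dict[object, int] = {}
    for i, v in enumerate(value_set):
        if skip_nulls and v is None:
            continue  # nulls in value_set are ignored
        positions.setdefault(v, i)
    # Unmatched inputs (including skipped nulls) produce null.
    return [positions.get(v) for v in values]


# Inputs and observed outputs copied from the console sessions in the issue.
VALUES = [1, 2, None]
VALUE_SET = [1, None]
OBSERVED = {
    ("is_in", True): [True, False, True],
    ("is_in", False): [True, False, True],
    ("index_in", True): [0, None, None],
    ("index_in", False): [0, None, 1],
}
MODELS = {"is_in": is_in_documented, "index_in": index_in_documented}


def compare() -> Dict[tuple, bool]:
    """Map (kernel, skip_nulls) to whether the observed output is as documented."""
    return {
        (kernel, skip): MODELS[kernel](VALUES, VALUE_SET, skip) == observed
        for (kernel, skip), observed in OBSERVED.items()
    }


if __name__ == "__main__":
    for (kernel, skip), ok in compare().items():
        print(f"{kernel}(skip_nulls={skip}): {'as documented' if ok else 'MISMATCH'}")
```

Running it reports a mismatch only for is_in(skip_nulls=True): the model expects [true, false, false], while the recorded output is [true, false, true].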



--
This message was sent by Atlassian Jira
(v8.3.4#803005)