Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/11 19:18:00 UTC
[jira] [Updated] (ARROW-10663) [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
[ https://issues.apache.org/jira/browse/ARROW-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-10663:
-----------------------------------
Labels: pull-request-available (was: )
> [C++/Doc] The IsIn kernel ignores the skip_nulls option of SetLookupOptions
> ---------------------------------------------------------------------------
>
> Key: ARROW-10663
> URL: https://issues.apache.org/jira/browse/ARROW-10663
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Joris Van den Bossche
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The C++ docs of {{SetLookupOptions}} have this explanation of the {{skip_nulls}} option:
> {code}
> /// Whether nulls in `value_set` count for lookup.
> ///
> /// If true, any null in `value_set` is ignored and nulls in the input
> /// produce null (IndexIn) or false (IsIn) values in the output.
> /// If false, any null in `value_set` is successfully matched in
> /// the input.
> bool skip_nulls;
> {code}
> (from https://github.com/apache/arrow/blob/8b9f6b9d28b4524724e60fac589fb1a3552a32b4/cpp/src/arrow/compute/api_scalar.h#L78-L84)
> However, for {{IsIn}} this explanation doesn't seem to hold in practice:
> {code}
> In [16]: arr = pa.array([1, 2, None])
> In [17]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[17]:
> <pyarrow.lib.BooleanArray object at 0x7fcf666f9408>
> [
> true,
> false,
> true
> ]
> In [18]: pc.is_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[18]:
> <pyarrow.lib.BooleanArray object at 0x7fcf666b13a8>
> [
> true,
> false,
> true
> ]
> {code}
> This documentation was added in https://github.com/apache/arrow/pull/7695 (ARROW-8989).
> BTW, for "index_in", it works as documented:
> {code}
> In [19]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=True)
> Out[19]:
> <pyarrow.lib.Int32Array object at 0x7fcf666f04c8>
> [
> 0,
> null,
> null
> ]
> In [20]: pc.index_in(arr, value_set=pa.array([1, None]), skip_null=False)
> Out[20]:
> <pyarrow.lib.Int32Array object at 0x7fcf666f0ee8>
> [
> 0,
> null,
> 1
> ]
> {code}
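
For reference, the semantics described in the quoted doc comment can be modelled in plain Python. This is only a minimal sketch of the documented behaviour, not Arrow's implementation: the function names is_in_documented and index_in_documented are hypothetical, and None stands in for null. Under this reading, IsIn with skip_nulls=True would be expected to give false for the null input, unlike Out[17] in the report.

```python
from typing import Hashable, List, Optional, Sequence


def is_in_documented(values: Sequence[Optional[Hashable]],
                     value_set: Sequence[Optional[Hashable]],
                     skip_nulls: bool) -> List[bool]:
    """Hypothetical model of IsIn as the doc comment describes it."""
    set_has_null = any(v is None for v in value_set)
    non_null = {v for v in value_set if v is not None}
    out = []
    for v in values:
        if v is None:
            # A null input matches only if nulls in value_set count for lookup.
            out.append(set_has_null and not skip_nulls)
        else:
            out.append(v in non_null)
    return out


def index_in_documented(values: Sequence[Optional[Hashable]],
                        value_set: Sequence[Optional[Hashable]],
                        skip_nulls: bool) -> List[Optional[int]]:
    """Hypothetical model of IndexIn as the doc comment describes it."""
    index = {}
    for i, v in enumerate(value_set):
        if v is None and skip_nulls:
            continue  # nulls in value_set are ignored
        index.setdefault(v, i)  # keep the first position of each value
    # Values without a match (including ignored nulls) map to None (null).
    return [index.get(v) for v in values]


arr = [1, 2, None]
value_set = [1, None]
print(is_in_documented(arr, value_set, skip_nulls=True))
print(is_in_documented(arr, value_set, skip_nulls=False))
print(index_in_documented(arr, value_set, skip_nulls=True))
print(index_in_documented(arr, value_set, skip_nulls=False))
```

The index_in model agrees with Out[19] and Out[20] above; the is_in model differs from the reported output only in the skip_nulls=True case, which is the discrepancy this issue describes.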