You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 13:55:38 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9164: ARROW-10663: [C++] Fix is_in and index_in behaviour

jorisvandenbossche commented on a change in pull request #9164:
URL: https://github.com/apache/arrow/pull/9164#discussion_r555650652



##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -290,17 +290,19 @@ Result<Datum> KleeneAndNot(const Datum& left, const Datum& right,
 /// \brief IsIn returns true for each element of `values` that is contained in
 /// `value_set`
 ///
-/// If null occurs in left, if null count in right is not 0,
-/// it returns true, else returns null.
+/// Behaviour of nulls is governed by SetLookupOptions::skip_nulls.

Review comment:
       Also, could maybe add a similar sentence "Behaviour of nulls is governed by SetLookupOptions::skip_nulls" to the `is_in_doc` ?

##########
File path: cpp/src/arrow/compute/kernels/scalar_set_lookup.cc
##########
@@ -272,10 +278,9 @@ struct IsInVisitor {
 
   Status Visit(const DataType&) {
     const auto& state = checked_cast<const SetLookupState<NullType>&>(*ctx->state());
+    // XXX should skip_nulls be taken into account?

Review comment:
       It's a bit strange corner case, but for consistency probably yes? 
   
   Right now for null array the skip_nulls argument has no effect (using this PR):
   
   ```
   In [27]: pc.is_in(pa.array([None]), value_set=pa.array([None]), skip_null=False)
   Out[27]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d800880>
   [
     true
   ]
   
   In [28]: pc.is_in(pa.array([None]), value_set=pa.array([None]), skip_null=True)
   Out[28]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d800ca0>
   [
     true
   ]
   ```
   while if you add a non-null type to the array creation, but use the same values, you get a different result:
   
   ```
   In [30]: pc.is_in(pa.array([None], type="int64"), value_set=pa.array([None], type="int64"), skip_null=False)
   Out[30]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d82f880>
   [
     true
   ]
   
   In [31]: pc.is_in(pa.array([None], type="int64"), value_set=pa.array([None], type="int64"), skip_null=True)
   Out[31]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d7c2040>
   [
     false
   ]
   ```

##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -290,17 +290,19 @@ Result<Datum> KleeneAndNot(const Datum& left, const Datum& right,
 /// \brief IsIn returns true for each element of `values` that is contained in
 /// `value_set`
 ///
-/// If null occurs in left, if null count in right is not 0,
-/// it returns true, else returns null.
+/// Behaviour of nulls is governed by SetLookupOptions::skip_nulls.

Review comment:
       Could mention here that the default is `skip_nulls=False` (or in the docstring of SetLookupOptions)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org