Posted to issues@arrow.apache.org by "rohanjain101 (via GitHub)" <gi...@apache.org> on 2023/04/08 03:25:13 UTC

[GitHub] [arrow] rohanjain101 opened a new issue, #34982: Relax is_in type requirement

rohanjain101 opened a new issue, #34982:
URL: https://github.com/apache/arrow/issues/34982

   ### Describe the enhancement requested
   
   In the following example:
   
   ```
   >>> a = pa.array([1,2,3], type=pa.int8())
   >>> pa.compute.is_in(a, pa.array([255], type=pa.int64()))
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "C:\ps_0310_is\lib\site-packages\pyarrow\compute.py", line 256, in wrapper
       return func.call(args, options, memory_pool)
     File "pyarrow\_compute.pyx", line 355, in pyarrow._compute.Function.call
     File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow\error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Integer value 255 not in range: -128 to 127
   >>>
   ```
   
   It would be nice if this could return false instead of raising an error, given that it's impossible for 255 to be in an int8 array. This would also be more in line with `isin` in pandas:
   
   ```
   >>> a = pd.Series([1,2,3], dtype="int8")
   >>> a.isin([255])
   0    False
   1    False
   2    False
   dtype: bool
   >>>
   ```
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #34982: Relax pyarrow.compute.is_in type requirement

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34982:
URL: https://github.com/apache/arrow/issues/34982#issuecomment-1503779935

   It's a good idea, but there isn't a good spot for this kind of optimization today, so it'll be challenging to add. It might be possible to do this with implicit casting. Type coercion (implicit casting) happens today when expressions move from unbound (not tied to any input schema) to bound (tied to a specific input schema).
   
   This type coercion is enabled per function, so maybe SetLookupFunction could have a custom DispatchBest implementation that first attempts an exact dispatch and, if that fails, switches to a custom "always return false" function :shrug: 
   
   Either way, we don't have much precedent for this kind of thing, and I think one could argue that this should be solved higher up than Arrow compute, in some kind of expression rewrite pass (which doesn't exist today either).
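To make the "expression rewrite pass" idea concrete, here is a toy model in plain Python. The `Field`, `Literal`, and `IsIn` classes are hypothetical stand-ins, not Arrow's `Expression` API, and `ranges` is an assumed mapping from column name to the integer range of its type. Once the schema is known, the pass drops candidates that cannot fit and folds the predicate to a constant `False` when none remain. It assumes the value set contains no nulls. Under that assumption the fold is safe, because `is_in` yields false for null inputs when the value set has no null.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical toy node: a reference to a column by name."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical toy node: a constant value."""
    value: object


@dataclass(frozen=True)
class IsIn:
    """Hypothetical toy node: arg is_in value_set."""
    arg: Field
    value_set: tuple


def rewrite(expr, ranges):
    """Toy rewrite pass over a single node.

    `ranges` maps column name -> (min, max) of the column's integer type.
    Out-of-range candidates are dropped; if none remain, the predicate
    folds to Literal(False). Assumes the value set has no nulls.
    """
    if isinstance(expr, IsIn) and isinstance(expr.arg, Field):
        lo, hi = ranges[expr.arg.name]
        kept = tuple(v for v in expr.value_set if lo <= v <= hi)
        if not kept:
            return Literal(False)
        return IsIn(expr.arg, kept)
    return expr


int8_range = (-128, 127)
print(rewrite(IsIn(Field("a"), (255,)), {"a": int8_range}))
# Literal(value=False)
print(rewrite(IsIn(Field("a"), (2, 255)), {"a": int8_range}))
# IsIn(arg=Field(name='a'), value_set=(2,))
```

A real pass would recurse through nested expressions and handle nulls. This sketch only shows the single-node fold that the issue asks for.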
