Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/07/20 11:21:00 UTC

[jira] [Commented] (ARROW-17134) [C++(?)/Python] pyarrow.compute.replace_with_mask does not replace null when providing an array mask

    [ https://issues.apache.org/jira/browse/ARROW-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568983#comment-17568983 ] 

Joris Van den Bossche commented on ARROW-17134:
-----------------------------------------------

The replacement array isn't expected to have the same shape as the input/mask arrays (with values replaced position by position). Instead, it holds only the values that are actually placed in the new array, so len(replacements) == number of true values in the mask.
So given that your {{arr2}} starts with two nulls, those two nulls are the values that end up in the result.

Compared to numpy, this behaves like {{setitem}} ({{arr[mask] = replacements}}) and not like {{np.putmask}} (where values and replacements have the same shape).
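
A small numpy sketch of that difference. The data is made up for illustration, and {{-1}} stands in for the nulls because numpy integer arrays have no null value.

```python
import numpy as np

mask = np.array([False, False, False, True, True])

# Boolean-mask assignment: one replacement per True, consumed in order.
a = np.array([1, 0, 1, -1, -1])
a[mask] = [10, 20]
print(a.tolist())  # [1, 0, 1, 10, 20]

# np.putmask with same-shape replacements: position i takes replacements[i].
b = np.array([1, 0, 1, -1, -1])
np.putmask(b, mask, np.array([10, 20, 30, 40, 50]))
print(b.tolist())  # [1, 0, 1, 40, 50]
```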

We should maybe consider raising an error if the {{replacements}} are too long? 

If you want to pick between values and replacements at corresponding (same-location) positions, I think you can use {{pc.if_else(mask, replacements, values)}}. Using your example:

{code}
In [13]: pc.if_else([False, False, False, True, True], arr2, arr1)
Out[13]: 
<pyarrow.lib.Int64Array object at 0x7f52f4eecd60>
[
  1,
  0,
  1,
  0,
  1
]
{code}





> [C++(?)/Python] pyarrow.compute.replace_with_mask does not replace null when providing an array mask
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17134
>                 URL: https://issues.apache.org/jira/browse/ARROW-17134
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 8.0.0
>            Reporter: Matthew Roeschke
>            Priority: Major
>
>  
> {code:java}
> In [1]: import pyarrow as pa
> In [2]: arr1 = pa.array([1, 0, 1, None, None])
> In [3]: arr2 = pa.array([None, None, 1, 0, 1])
> In [4]: pa.compute.replace_with_mask(arr1, [False, False, False, True, True], arr2)
> Out[4]:
> <pyarrow.lib.Int64Array object at 0x118a3e320>
> [
>   1,
>   0,
>   1,
>   null, # I would expect 0
>   null  # I would expect 1
> ]
> In [5]: pa.__version__
> Out[5]: '8.0.0'{code}
>  
> I have noticed this behavior with the integer, floating, bool, and temporal types
>  


