Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/04/19 09:10:02 UTC

[jira] [Commented] (ARROW-12431) [Python] pa.array mask inverted when type is binary and value to be converted is numpy array

    [ https://issues.apache.org/jira/browse/ARROW-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324893#comment-17324893 ] 

Joris Van den Bossche commented on ARROW-12431:
-----------------------------------------------

[~nugend] thanks for the report!

This seems to happen specifically when the input array has one of numpy's binary/string dtypes, and not when it has object dtype:

{code}
In [27]: pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))
Out[27]: 
<pyarrow.lib.FixedSizeBinaryArray object at 0x7f6d65b32640>
[
  00
]

In [28]: pa.array(np.array([b'\x00'], dtype=object),type=pa.binary(1), mask = np.array([True]))
Out[28]: 
<pyarrow.lib.FixedSizeBinaryArray object at 0x7f6d65b32f40>
[
  null
]
{code}

(I assume the object dtype array takes a code path similar to the one used for list input.)
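
For anyone who wants to check their own installation, here is a minimal sketch of the comparison above. The helper names `input_path` and `null_count_with_mask` are made up for illustration and are not part of pyarrow, and `b"\x01"` is just an arbitrary one-byte value. The only pyarrow call used is `pa.array` with the same `type` and `mask` arguments as in the report.

```python
import numpy as np


def input_path(values: np.ndarray) -> str:
    """Label the numpy dtype family of the input (hypothetical helper)."""
    # dtype.kind is "S" for fixed-width bytes, "U" for fixed-width unicode,
    # and "O" for arrays of Python objects.
    return {"S": "bytes", "U": "unicode", "O": "object"}.get(values.dtype.kind, "other")


def null_count_with_mask(values: np.ndarray, mask: np.ndarray) -> int:
    """Convert with a mask and report how many entries came out null (hypothetical helper)."""
    import pyarrow as pa  # local import: only this function needs pyarrow

    # type=pa.binary(1) requests a 1-byte FixedSizeBinary, as in the report.
    return pa.array(values, type=pa.binary(1), mask=mask).null_count


bytes_input = np.array([b"\x01"])                 # numpy infers the bytes dtype "S1"
object_input = np.array([b"\x01"], dtype=object)  # same value, object dtype

print(input_path(bytes_input), input_path(object_input))
```

Since a True mask entry is meant to mark a null, `null_count_with_mask(object_input, np.array([True]))` should give 1. Going by the output above, the same call with `bytes_input` would be expected to give 0 on an affected version.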

> [Python] pa.array mask inverted when type is binary and value to be converted is numpy array
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12431
>                 URL: https://issues.apache.org/jira/browse/ARROW-12431
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Daniel Nugent
>            Priority: Major
>
> {code:python}
> Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)                                   
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np
> >>> import pyarrow as pa
> >>>
> >>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([False]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa080ca3640>
> [
>   null
> ]
> >>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa080ca3700>
> [
>   00
> ]
> >>> pa.array([b'\x00'],type=pa.binary(1), mask = np.array([False]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa083cc9520>
> [
>   00
> ]
> >>> pa.__version__
> '3.0.0'
> >>> np.__version__
> '1.20.1'
> {code}
> This happens with both FixedSizeBinary and variable-sized binary (I was working with FixedSizeBinary). It does not happen for integers (and presumably not for other types, though I didn't check exhaustively).


