Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/04/19 09:10:02 UTC
[jira] [Commented] (ARROW-12431) [Python] pa.array mask inverted when type is binary and value to be converted is numpy array
[ https://issues.apache.org/jira/browse/ARROW-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324893#comment-17324893 ]
Joris Van den Bossche commented on ARROW-12431:
-----------------------------------------------
[~nugend] thanks for the report!
It seems to happen specifically when the input array has numpy's fixed-width bytes/string dtype (e.g. "S1"), and not when it has object dtype:
{code}
In [27]: pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))
Out[27]:
<pyarrow.lib.FixedSizeBinaryArray object at 0x7f6d65b32640>
[
00
]
In [28]: pa.array(np.array([b'\x00'], dtype=object),type=pa.binary(1), mask = np.array([True]))
Out[28]:
<pyarrow.lib.FixedSizeBinaryArray object at 0x7f6d65b32f40>
[
null
]
{code}
(I assume the object-dtype array takes the same code path as the list input.)
> [Python] pa.array mask inverted when type is binary and value to be converted is numpy array
> --------------------------------------------------------------------------------------------
>
> Key: ARROW-12431
> URL: https://issues.apache.org/jira/browse/ARROW-12431
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Daniel Nugent
> Priority: Major
>
> {code:python}
> Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np
> >>> import pyarrow as pa
> >>>
> >>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([False]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa080ca3640>
> [
> null
> ]
> >>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa080ca3700>
> [
> 00
> ]
> >>> pa.array([b'\x00'],type=pa.binary(1), mask = np.array([False]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa083cc9520>
> [
> 00
> ]
> >>> pa.__version__
> '3.0.0'
> >>> np.__version__
> '1.20.1'
> {code}
> Happens with both FixedSizeBinary and variable-sized binary (I was working with FixedSizeBinary). Does not happen for integers (and presumably not for other types, though I didn't check exhaustively).