Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/07/18 08:20:00 UTC

[jira] [Commented] (ARROW-17096) [C++] Mode kernel incorrect for boolean inputs

    [ https://issues.apache.org/jira/browse/ARROW-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567877#comment-17567877 ] 

Joris Van den Bossche commented on ARROW-17096:
-----------------------------------------------

bq.  Fiddling with the buffer directly, it looks like pyarrow is treating the buffer as a bitmap (one bit per value), not one byte per value like the C++ compute kernel.

Isn't that then a bug in C++? I thought a BooleanArray is expected to use one bit per value. (I wanted to point to the format docs to prove this, but it is not stated very explicitly: it appears only in a side note in the first paragraph of https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout, and Schema.fbs doesn't mention it either.)
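
To make the one-bit-per-value expectation concrete, here is a minimal pure-Python sketch of a bit-packed boolean buffer, assuming least-significant-bit numbering within each byte. The helpers pack_bits and get_bit are hypothetical names made up for this illustration; they are not Arrow or pyarrow APIs, and this is not a claim about what the mode kernel actually does internally.

```python
def pack_bits(values):
    """Pack booleans into bytes, one bit per value, LSB first (hypothetical helper)."""
    buf = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            buf[i // 8] |= 1 << (i % 8)
    return bytes(buf)


def get_bit(buf, i):
    """Read the i-th value back from a bit-packed buffer (hypothetical helper)."""
    return bool((buf[i // 8] >> (i % 8)) & 1)


values = [True, False]
buf = pack_bits(values)

# Both values fit in a single byte when packed as bits.
as_bits = [get_bit(buf, i) for i in range(len(values))]

# A reader that assumed one byte per value would expect len(values) bytes,
# but the packed buffer is shorter than that.
bytes_needed_if_unpacked = len(values)

print(buf, as_bits, len(buf), bytes_needed_if_unpacked)
```

With pyarrow installed, the data buffer of a boolean array can be inspected in a similar spirit through arr.buffers()[1], which is presumably what the quoted comment refers to by fiddling with the buffer directly.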

> [C++] Mode kernel incorrect for boolean inputs
> ----------------------------------------------
>
>                 Key: ARROW-17096
>                 URL: https://issues.apache.org/jira/browse/ARROW-17096
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 8.0.0
>            Reporter: Matthew Roeschke
>            Assignee: Yibo Cai
>            Priority: Major
>
> {code:java}
> In [1]: import pyarrow.compute as pc
> In [2]: import pyarrow as pa
> In [3]: pa.__version__
> Out[3]: '8.0.0'
> In [4]: pc.mode(pa.array([True, True]))
> # Correct
> Out[4]:
> <pyarrow.lib.StructArray object at 0x1266d5c60>
> -- is_valid: all not null
> -- child 0 type: bool
>   [
>     true
>   ]
> -- child 1 type: int64
>   [
>     2
>   ]
> # Incorrect
> In [5]: pc.mode(pa.array([True, False]), 2)
> Out[5]:
> <pyarrow.lib.StructArray object at 0x1262110c0>
> -- is_valid: all not null
> -- child 0 type: bool
>   [
>     false, # should be true
>     false
>   ]
> -- child 1 type: int64
>   [
>     1,
>     1
>   ] {code}


