Posted to dev@parquet.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2016/12/20 17:50:58 UTC

[jira] [Commented] (PARQUET-812) [C++] Failure reading BYTE_ARRAY data from file in parquet-compatibility project

    [ https://issues.apache.org/jira/browse/PARQUET-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764799#comment-15764799 ] 

Uwe L. Korn commented on PARQUET-812:
-------------------------------------

The commit mentioned in the description did not introduce the problem. The underlying issue is that we have not yet implemented conversion of lists of uint8 to Python objects. Since NumPy has no variable-length byte array type, I suggest the conversion produce {{str}} on Python 2 and {{bytes}} on Python 3. The related issue is https://issues.apache.org/jira/browse/ARROW-374
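
A minimal sketch of what such a conversion might look like, assuming the column arrives as an Arrow-style list of uint8 (an offsets array plus a flat byte buffer). The function name byte_lists_to_objects and the offsets, data and valid parameters are hypothetical and only illustrate the idea; this is not the actual Arrow or parquet-cpp implementation.

```python
import numpy as np


def byte_lists_to_objects(offsets, data, valid=None):
    """Convert a list<uint8>-style column into a NumPy object array.

    offsets: n + 1 integers; value i spans data[offsets[i]:offsets[i + 1]]
    data:    flat uint8 buffer holding all values back to back
    valid:   optional per-value flags; False marks a null (hypothetical layout)
    """
    n = len(offsets) - 1
    out = np.empty(n, dtype=object)  # object dtype can hold values of any length
    # tobytes() yields `bytes` on Python 3 (and `str` on Python 2)
    raw = np.asarray(data, dtype=np.uint8).tobytes()
    for i in range(n):
        if valid is not None and not valid[i]:
            out[i] = None  # nulls become None
        else:
            out[i] = raw[offsets[i]:offsets[i + 1]]
    return out


# Two values and one null packed into a single buffer
offsets = [0, 7, 7, 13]
data = np.frombuffer(b"ALGERIABRAZIL", dtype=np.uint8)
result = byte_lists_to_objects(offsets, data, valid=[True, False, True])
```

An object array is used here because a fixed-width NumPy string dtype would pad or truncate values of differing lengths.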

> [C++] Failure reading BYTE_ARRAY data from file in parquet-compatibility project
> --------------------------------------------------------------------------------
>
>                 Key: PARQUET-812
>                 URL: https://issues.apache.org/jira/browse/PARQUET-812
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Priority: Critical
>         Attachments: nation.impala.parquet
>
>
> I see an odd exception from the Python side:
> {code}
> ArrowException: NotImplemented: list<: uint8>
> {code}
> The schema is:
> {code}
> $ debug/parquet-dump-schema ~/Downloads/nation.impala.parquet 
> required group schema {
>   optional int32 n_nationkey
>   optional byte_array n_name
>   optional int32 n_regionkey
>   optional byte_array n_comment
> }
> {code}
> This may have been introduced by https://github.com/apache/parquet-cpp/commit/8487142f6d5a60d12e3068ac226b2b5dfe178350


