Posted to jira@arrow.apache.org by "Matthias Rosenthaler (Jira)" <ji...@apache.org> on 2021/03/08 09:49:00 UTC

[jira] [Comment Edited] (ARROW-11629) [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools

    [ https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297223#comment-17297223 ] 

Matthias Rosenthaler edited comment on ARROW-11629 at 3/8/21, 9:48 AM:
-----------------------------------------------------------------------

[~emkornfield], No, because I think the problem is not caused by that library alone: I also have problems reading the data with Apache Drill.


was (Author: matthros):
[~emkornfield], yes, I already opened a bug ticket there. But the problem is not caused by that library alone, because I also have problems reading the data with Apache Drill.

> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png, output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns to float32, and export it to Parquet, the Parquet file gets corrupted. It is no longer readable by Apache Drill or Parquet.Net.
>  
> Update: Bug in "*Dictionary Encoding*" feature. If I switch it off for float32 columns, everything works as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)