Posted to jira@arrow.apache.org by "Matthias Rosenthaler (Jira)" <ji...@apache.org> on 2021/03/01 14:10:00 UTC

[jira] [Issue Comment Deleted] (ARROW-11629) [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools

     [ https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Rosenthaler updated ARROW-11629:
-----------------------------------------
    Comment: was deleted

(was: [~GPSnoopy], it seems to make no difference in file size. Maybe gzip compression is equally effective?)

> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png, output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, cast the float64 columns to float32, and export the table to Parquet, the resulting Parquet file is corrupted: Apache Drill and Parquet.Net can no longer read it.
>  
> Update: The bug is in the "*Dictionary Encoding*" feature. If I switch it off for the float32 columns, everything works as expected.
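
The workaround described in the update can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual script: the helper names `plan_columns` and `csv_to_parquet` are hypothetical, a flat (non-nested) schema is assumed, and it relies on `use_dictionary` in `pyarrow.parquet.write_table` accepting a list of column names for which dictionary encoding stays enabled.

```python
def plan_columns(fields):
    """Split (name, type_name) pairs into two lists of column names.

    type_name is the string form of a pyarrow type, where float64
    prints as "double" and float32 prints as "float".
    Returns (columns to downcast, columns that keep dictionary encoding).
    """
    downcast = [name for name, type_name in fields if type_name == "double"]
    keep_dictionary = [
        name for name, type_name in fields if type_name not in ("double", "float")
    ]
    return downcast, keep_dictionary


def csv_to_parquet(csv_path, parquet_path):
    # Imported lazily so plan_columns stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    table = pv.read_csv(csv_path)
    fields = [(f.name, str(f.type)) for f in table.schema]
    downcast, keep_dictionary = plan_columns(fields)

    # Cast float64 columns to float32 and leave all other columns unchanged.
    schema = pa.schema(
        [pa.field(f.name, pa.float32()) if f.name in downcast else f
         for f in table.schema]
    )
    table = table.cast(schema)

    # Dictionary encoding is enabled only for the listed columns, so the
    # float32 columns are written without it.
    pq.write_table(table, parquet_path, use_dictionary=keep_dictionary)
```

Calling `csv_to_parquet("output.csv", "output.parquet")` would apply this to files named like the attachments on the issue. Whether the resulting file is readable by Apache Drill or Parquet.Net still has to be checked with those tools.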


