Posted to jira@arrow.apache.org by "Matthias Rosenthaler (Jira)" <ji...@apache.org> on 2021/02/19 12:29:00 UTC

[jira] [Updated] (ARROW-11629) [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools

     [ https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Rosenthaler updated ARROW-11629:
-----------------------------------------
    Summary: [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools  (was: [C++] Writing float32 values makes parquet files not readable for some tools)

> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png, output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, cast the float64 columns to float32, and export the result to Parquet, the Parquet file gets corrupted: Apache Drill and Parquet.Net can no longer read it.
>  
> Update: The bug is in the "*Dictionary Encoding*" feature. If I switch it off for the float columns, everything works as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)