
Posted to issues@drill.apache.org by "Vova Vysotskyi (Jira)" <ji...@apache.org> on 2021/05/02 16:54:00 UTC

[jira] [Commented] (DRILL-7864) Parquet file could not be read correctly

    [ https://issues.apache.org/jira/browse/DRILL-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338074#comment-17338074 ] 

Vova Vysotskyi commented on DRILL-7864:
---------------------------------------

[~matthros], I tried querying the attached Parquet file on the current Drill master, and it returned the correct results, so it looks like this was already fixed (perhaps by a Parquet update). Could you please confirm that it works as expected?

> Parquet file could not be read correctly
> ----------------------------------------
>
>                 Key: DRILL-7864
>                 URL: https://issues.apache.org/jira/browse/DRILL-7864
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.18.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: drill_query.csv, output.parquet, parquet-dotnet.csv
>
>
> The following Parquet file, generated by ParquetSharp (which uses the underlying Apache Arrow C++ library), cannot be read correctly by Drill: the values of the columns are displaced. If I write the affected float32 columns "InjectionRate" and "I_injection_IA" as float64, everything is fine.
> Update: It seems that the bug is *caused by dictionary encoding*. If I turn this feature off, Drill is able to read the file. So please take a look at how Drill reads dictionary-encoded columns to solve the bug.
> I also created a ticket for the Arrow project, but they redirected me to the Drill project: https://issues.apache.org/jira/browse/ARROW-11629
>  
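
For readers unfamiliar with the term used in the description: dictionary encoding stores each distinct column value once and replaces the column itself with integer indices into that list. The Python sketch below is only a conceptual illustration under that definition. It is not Parquet's on-disk layout and not code from Drill, Arrow, or ParquetSharp; the function names and the sample values are hypothetical.

```python
from array import array


def dictionary_encode(values):
    """Return (dictionary, indices) such that dictionary[indices[i]] == values[i]."""
    dictionary = []   # distinct values, in first-seen order
    positions = {}    # value -> its index in `dictionary`
    indices = []      # one small integer per row
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column by looking up each index."""
    return [dictionary[i] for i in indices]


# Hypothetical sample data; array("f") rounds each value to a 32-bit float,
# mirroring the float32 column type mentioned in the report.
injection_rate = array("f", [0.5, 1.25, 0.5, 0.5, 1.25, 2.0]).tolist()

dictionary, indices = dictionary_encode(injection_rate)
decoded = dictionary_decode(dictionary, indices)
```

A reader that pairs the indices with the wrong dictionary entries would return values that are individually plausible but sit in the wrong rows, which would be consistent with the "displaced" symptom described above. Whether that is what actually happened in Drill 1.18.0 is not established in this thread. For comparison, in pyarrow, which wraps the same Arrow C++ library, the corresponding writer switch is the `use_dictionary` argument of `pyarrow.parquet.write_table`.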



--
This message was sent by Atlassian Jira
(v8.3.4#803005)