Posted to issues@impala.apache.org by "Gabor Kaszab (JIRA)" <ji...@apache.org> on 2017/09/07 12:22:02 UTC

[jira] [Closed] (IMPALA-4826) Impala should ignore the root schema's repetition in Parquet

     [ https://issues.apache.org/jira/browse/IMPALA-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab closed IMPALA-4826.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

https://gerrit.cloudera.org/#/c/7870/

> Impala should ignore the root schema's repetition in Parquet
> ------------------------------------------------------------
>
>                 Key: IMPALA-4826
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4826
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
>            Reporter: Tim Armstrong
>            Assignee: Gabor Kaszab
>            Priority: Minor
>              Labels: parquet, ramp-up
>             Fix For: Impala 2.11.0
>
>
> See PARQUET-843 (https://issues.apache.org/jira/browse/PARQUET-843), which includes an example file. parquet-cpp was generating files that set the root schema's repetition to REPEATED. This threw off Impala's schema resolution, so Impala could not read those files.
> The field description in parquet.thrift explicitly says that the root schema's repetition should be unset (https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L231), but other writers appear to put various values there.
> We should just ignore the repetition on the root schema, since it's meaningless.
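To illustrate why the root's repetition carries no meaning, here is a minimal Python sketch (not Impala's C++ code; the actual fix is the Gerrit change linked above). It assumes a simplified, hypothetical `SchemaElement` with only `name`, `repetition_type`, and `num_children`, standing in for the Thrift struct. It walks the flattened, depth-first schema list from the file footer and computes each leaf column's max definition and repetition levels, starting below the root so that whatever the writer put in the root's repetition field is never read.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Repetition(Enum):
    # Mirrors the FieldRepetitionType values in parquet.thrift.
    REQUIRED = 0
    OPTIONAL = 1
    REPEATED = 2


@dataclass
class SchemaElement:
    # Hypothetical, simplified stand-in for the Thrift SchemaElement struct.
    name: str
    repetition_type: Optional[Repetition] = None
    num_children: int = 0  # 0 means this element is a leaf column


def column_levels(schema: List[SchemaElement]) -> Dict[str, Tuple[int, int]]:
    """Return {dotted.path: (max_def_level, max_rep_level)} for every leaf.

    schema[0] is the root; its repetition_type is ignored entirely.
    """
    result: Dict[str, Tuple[int, int]] = {}
    pos = 1  # index of the next element to consume; skips the root

    def walk(path: List[str], def_level: int, rep_level: int) -> None:
        nonlocal pos
        elem = schema[pos]
        pos += 1
        # OPTIONAL and REPEATED fields each add a definition level;
        # only REPEATED fields add a repetition level. A missing value on a
        # non-root element is treated like REQUIRED here (an assumption).
        if elem.repetition_type in (Repetition.OPTIONAL, Repetition.REPEATED):
            def_level += 1
        if elem.repetition_type is Repetition.REPEATED:
            rep_level += 1
        full_path = path + [elem.name]
        if elem.num_children == 0:
            result[".".join(full_path)] = (def_level, rep_level)
        else:
            for _ in range(elem.num_children):
                walk(full_path, def_level, rep_level)

    # Start from the root's children with zero levels, regardless of the
    # root's own repetition_type.
    for _ in range(schema[0].num_children):
        walk([], 0, 0)
    return result


# A root marked REPEATED, as parquet-cpp wrote it before PARQUET-843.
schema = [
    SchemaElement("schema", Repetition.REPEATED, num_children=2),
    SchemaElement("id", Repetition.REQUIRED),
    SchemaElement("tags", Repetition.REPEATED),
]
print(column_levels(schema))  # {'id': (0, 0), 'tags': (1, 1)}
```

Because the walk never consults `schema[0].repetition_type`, a file with the root unset (as the spec requires) and a file with it set to REPEATED resolve to the same column levels.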



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)