Posted to issues@impala.apache.org by "Gabor Kaszab (JIRA)" <ji...@apache.org> on 2017/09/07 12:22:02 UTC
[jira] [Closed] (IMPALA-4826) Impala should ignore the root schema's repetition in Parquet
[ https://issues.apache.org/jira/browse/IMPALA-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Kaszab closed IMPALA-4826.
--------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.11.0
https://gerrit.cloudera.org/#/c/7870/
> Impala should ignore the root schema's repetition in Parquet
> ------------------------------------------------------------
>
> Key: IMPALA-4826
> URL: https://issues.apache.org/jira/browse/IMPALA-4826
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
> Reporter: Tim Armstrong
> Assignee: Gabor Kaszab
> Priority: Minor
> Labels: parquet, ramp-up
> Fix For: Impala 2.11.0
>
>
> See https://issues.apache.org/jira/browse/PARQUET-843. parquet-cpp was generating files that set the root schema's repetition to REPEATED, which broke Impala's schema resolution so that it could not read the file. PARQUET-843 includes an example file.
> The field description in parquet.thrift explicitly says that the root schema's repetition should be unset (https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L231), but other tools apparently write various values there.
> We should just ignore the repetition on the root schema, since it's meaningless.
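
The proposal above can be illustrated with a minimal sketch. This is not the code from the linked Gerrit change, and Impala's backend is C++. The names SchemaElement, SchemaNode and build_tree are hypothetical, chosen only for illustration. The sketch assumes the schema arrives as the depth-first flattened list that parquet.thrift describes, and that OPTIONAL and REPEATED fields raise the maximum definition level while REPEATED fields also raise the maximum repetition level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaElement:
    """Hypothetical stand-in for one entry of the flattened Parquet schema."""
    name: str
    repetition: Optional[str]  # "REQUIRED", "OPTIONAL", "REPEATED", or None (unset)
    num_children: int = 0


@dataclass
class SchemaNode:
    """Hypothetical resolved node carrying its maximum definition/repetition levels."""
    name: str
    max_def_level: int
    max_rep_level: int
    children: List["SchemaNode"] = field(default_factory=list)


def build_tree(elements: List[SchemaElement]) -> SchemaNode:
    """Rebuild the schema tree from a depth-first flattened element list.

    The first element is the root. Its repetition is ignored entirely,
    so a writer that sets it to REPEATED does not shift the levels of
    every column underneath it.
    """
    pos = 0

    def build(parent_def: int, parent_rep: int, is_root: bool) -> SchemaNode:
        nonlocal pos
        elem = elements[pos]
        pos += 1
        def_level, rep_level = parent_def, parent_rep
        if not is_root:  # the root's repetition is meaningless, so skip it
            if elem.repetition in ("OPTIONAL", "REPEATED"):
                def_level += 1
            if elem.repetition == "REPEATED":
                rep_level += 1
        node = SchemaNode(elem.name, def_level, rep_level)
        for _ in range(elem.num_children):
            node.children.append(build(def_level, rep_level, False))
        return node

    return build(0, 0, True)


# Two files with the same columns; only the root's repetition differs.
well_formed = [
    SchemaElement("schema", None, 2),
    SchemaElement("id", "REQUIRED"),
    SchemaElement("tags", "REPEATED"),
]
root_repeated = [
    SchemaElement("schema", "REPEATED", 2),
    SchemaElement("id", "REQUIRED"),
    SchemaElement("tags", "REPEATED"),
]

# Both resolve to the same tree because the root's repetition is skipped.
print(build_tree(well_formed) == build_tree(root_repeated))
```

Without the is_root check, the second file's root would contribute one definition level and one repetition level to every column beneath it, so the computed levels would no longer match what a reader derives from the well-formed file.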