Posted to dev@hive.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2014/11/19 02:03:51 UTC

[jira] [Updated] (HIVE-8909) Hive doesn't correctly read Parquet nested types

     [ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue updated HIVE-8909:
----------------------------
    Attachment: HIVE-8909-1.patch

This patch implements the rules from PARQUET-113, which required some restructuring of the existing converters. The included TestArrayCompatibility tests will run on trunk and can be used to verify that the current array representation has not been changed and to see the current behavior for Avro, Thrift, and repeated types without annotations.

This patch has the following behavior consequences:
1. Avro and Thrift data structures that could be read previously will now match the original Avro or Thrift type. This is the case when Avro stored, for example, an {{array<struct<f1: int>>}}. This structure matched Hive's 3-level representation of arrays, so it could be read, but the SerDe discarded the inner Avro record level and the type in Hive was {{array<int>}}.
2. Lists must be annotated with {{LIST}} and maps with {{MAP}}. The previous version already assumed this, and enforcing it is a safe change because all Parquet object models have correctly used these annotations.
3. Repeated groups with 3 or more fields and repeated primitive types are now supported.
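
For readers who don't have PARQUET-113 open, here is a minimal sketch of the list backward-compatibility check as I understand it from the Parquet format's logical types document. It is not the code in the patch. The {{ListRules}} and {{Field}} names are hypothetical stand-ins for the real parquet-mr schema types, and only the decision "is the repeated field the element itself, or a wrapper around the element?" is modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: a simplified model, not the parquet-mr or Hive API. */
final class ListRules {

    /** Hypothetical stand-in for a Parquet schema field. */
    static final class Field {
        final String name;
        final boolean isGroup;
        final List<Field> children; // empty for primitive fields

        private Field(String name, boolean isGroup, List<Field> children) {
            this.name = name;
            this.isGroup = isGroup;
            this.children = children;
        }

        static Field primitive(String name) {
            return new Field(name, false, Collections.<Field>emptyList());
        }

        static Field group(String name, Field... children) {
            return new Field(name, true, Arrays.asList(children));
        }
    }

    /**
     * Decides how to read the repeated field found inside a LIST-annotated group.
     *
     * @param repeated the repeated field nested directly in the LIST group
     * @param listName the name of the enclosing LIST-annotated group
     * @return true if the repeated field is the element type itself (legacy
     *         2-level layout); false if it is a wrapper whose single child is
     *         the element (standard 3-level layout)
     */
    static boolean repeatedFieldIsElement(Field repeated, String listName) {
        // Rule 1: a repeated primitive is the element.
        if (!repeated.isGroup) {
            return true;
        }
        // Rule 2: a repeated group with more than one field is the element.
        if (repeated.children.size() > 1) {
            return true;
        }
        // Rule 3: a single-field group named "array" (parquet-avro) or
        // "<listName>_tuple" (parquet-thrift) is the element.
        if (repeated.name.equals("array") || repeated.name.equals(listName + "_tuple")) {
            return true;
        }
        // Rule 4: otherwise the group is a wrapper and its only child is the element.
        return false;
    }
}
```

Under these assumptions, a repeated group named {{array}} with a single field {{f1}} is read as a list of records, while a repeated group named {{list}} wrapping a single {{element}} field is read as a list of that element.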

The Hive SerDe expects an extra {{ArrayWritable}} layer from the Parquet {{Converter}}. This expectation has been preserved: all list and map structures artificially include the extra layer so that the SerDe doesn't need to be changed. Removing the extra layer from the SerDe should be done as a follow-up issue.

> Hive doesn't correctly read Parquet nested types
> ------------------------------------------------
>
>                 Key: HIVE-8909
>                 URL: https://issues.apache.org/jira/browse/HIVE-8909
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>         Attachments: HIVE-8909-1.patch
>
>
> Parquet's Avro and Thrift object models don't produce the same Parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written, along with backward-compatibility rules for existing data written by parquet-avro and parquet-thrift, in PARQUET-113. We need to implement those rules in the Hive Converter classes.


