Posted to dev@hive.apache.org by "H. Vetinari (JIRA)" <ji...@apache.org> on 2019/07/17 10:28:00 UTC

[jira] [Created] (HIVE-22005) Handle complete parquet specification

H. Vetinari created HIVE-22005:
----------------------------------

             Summary: Handle complete parquet specification
                 Key: HIVE-22005
                 URL: https://issues.apache.org/jira/browse/HIVE-22005
             Project: Hive
          Issue Type: Improvement
    Affects Versions: All Versions
            Reporter: H. Vetinari


Hive cannot read Parquet files written by Spark (with default settings) after 1.4. Spark uses a different internal representation that nonetheless stays faithful to the Parquet specification (see SPARK-20297).

Hive should be able to read such data written by Spark, and ideally Parquet data from other writers (Arrow, etc.?) that follow the spec.

Quote from SPARK-20297:
> The standard doesn't say that smaller decimals *have* to be stored in int32/int64, it just is an option for subset of decimal types. int32 and int64 are valid representations for a subset of decimal types. fixed_len_byte_array and binary are a valid representation of any decimal type.
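
To make the quoted rule concrete, here is a minimal Python sketch (not Hive or Spark code) of what a spec-complete reader would need to accept. The helper names valid_physical_types and decode_decimal_bytes are hypothetical and used only for illustration. The precision limits (9 digits for int32, 18 for int64) and the big-endian two's complement layout of byte-array decimals are taken from the Parquet logical types specification as I understand it, so please check them against the spec before relying on them.

```python
from decimal import Decimal

# Maximum decimal precision each integer physical type can back
# (assumed from the Parquet logical types specification).
MAX_PRECISION = {"int32": 9, "int64": 18}


def valid_physical_types(precision):
    """Return the physical types that may back DECIMAL(precision, scale)."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    types = [name for name, limit in MAX_PRECISION.items() if precision <= limit]
    # Byte-array encodings are valid for any precision, including small ones;
    # this is the case the issue asks Hive to handle.
    return types + ["fixed_len_byte_array", "binary"]


def decode_decimal_bytes(raw, scale):
    """Decode a byte-array decimal: big-endian two's complement unscaled value."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)
```

Under these assumptions, a DECIMAL(5, 2) column may arrive as int32, int64, fixed_len_byte_array or binary, and a reader that accepts only one of these for small precisions rejects files that are still valid under the spec.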

Arguably, this is a subtask of HIVE-12398.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)