Posted to dev@parquet.apache.org by "Ying Xu (JIRA)" <ji...@apache.org> on 2019/06/12 06:55:00 UTC

[jira] [Created] (PARQUET-1595) Parquet proto writer de-nest Protobuf wrapper classes

Ying Xu created PARQUET-1595:
--------------------------------

             Summary: Parquet proto writer de-nest Protobuf wrapper classes
                 Key: PARQUET-1595
                 URL: https://issues.apache.org/jira/browse/PARQUET-1595
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
            Reporter: Ying Xu


The existing Parquet protobuf writer preserves the structure of any Protobuf Message object. This works well in most cases. However, when dealing with [Protobuf wrapper messages|https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto], users may prefer to write the de-nested value directly into the Parquet files, which makes the data easier to query from engines such as Hive or Presto.

Proposal: 
 * Implement a control flag, e.g., enableDenestingProtoWrappers, that determines whether Protobuf wrapper classes are de-nested.
 * When this flag is set to true, write each Protobuf wrapper class as a single primitive field whose type is derived from the wrapped *value* field.
 
||Protobuf Type||Parquet Type||
|BoolValue|boolean|
|BytesValue|binary|
|DoubleValue|double|
|FloatValue|float|
|Int32Value|int32 (32-bit, signed)|
|Int64Value|int64 (64-bit, signed)|
|StringValue|binary (string)|
|UInt32Value|int64 (32-bit, unsigned)|
|UInt64Value|int64 (64-bit, unsigned)|
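
To illustrate the effect of the proposed flag, here is a minimal, self-contained Java sketch. It is not the parquet-mr implementation and uses no Parquet or Protobuf APIs. The class name DenestSketch, the method schemaFor, the "optional" repetition, and the exact schema text (including the STRING annotation spelling) are all assumptions made for illustration. It covers only the boolean, binary, floating-point, string, and 64-bit signed rows of the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: renders the schema text a wrapper field might produce
// with and without de-nesting. Not part of parquet-mr.
class DenestSketch {
    // Subset of the proposed mapping: wrapper message -> Parquet type of its "value" field.
    static final Map<String, String> WRAPPER_TO_PARQUET = new LinkedHashMap<>();
    static {
        WRAPPER_TO_PARQUET.put("BoolValue", "boolean");
        WRAPPER_TO_PARQUET.put("BytesValue", "binary");
        WRAPPER_TO_PARQUET.put("DoubleValue", "double");
        WRAPPER_TO_PARQUET.put("FloatValue", "float");
        WRAPPER_TO_PARQUET.put("Int64Value", "int64");
        WRAPPER_TO_PARQUET.put("StringValue", "binary");
    }

    // Returns illustrative schema text for one field of the given wrapper type.
    static String schemaFor(String fieldName, String wrapperType,
                            boolean enableDenestingProtoWrappers) {
        String primitive = WRAPPER_TO_PARQUET.get(wrapperType);
        if (primitive == null) {
            throw new IllegalArgumentException("Not a covered wrapper type: " + wrapperType);
        }
        // Only StringValue carries a string annotation in this sketch.
        String annotation = wrapperType.equals("StringValue") ? " (STRING)" : "";
        if (enableDenestingProtoWrappers) {
            // De-nested: the wrapper collapses into a single primitive field.
            return "optional " + primitive + " " + fieldName + annotation + ";";
        }
        // Current behavior: the wrapper message is kept as a group with a "value" field.
        return "optional group " + fieldName + " { optional " + primitive
                + " value" + annotation + "; }";
    }

    public static void main(String[] args) {
        // prints: optional group name { optional binary value (STRING); }
        System.out.println(schemaFor("name", "StringValue", false));
        // prints: optional binary name (STRING);
        System.out.println(schemaFor("name", "StringValue", true));
    }
}
```

With the flag off, a query has to address the nested path (name.value); with the flag on, the column is simply name, which is the querying convenience the proposal is after.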

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)