
Posted to dev@parquet.apache.org by "Ashish Kumar Singh (JIRA)" <ji...@apache.org> on 2014/11/22 00:33:34 UTC

[jira] [Comment Edited] (PARQUET-47) SERDE backed schema for parquet storage in Hive

    [ https://issues.apache.org/jira/browse/PARQUET-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221542#comment-14221542 ] 

Ashish Kumar Singh edited comment on PARQUET-47 at 11/21/14 11:33 PM:
----------------------------------------------------------------------

I have started working on this. I could not assign this JIRA to myself. If someone could, that will be helpful.


was (Author: singhashish):
I have started working on this. I could assign this JIRA to myself. If someone could, that will be helpful.

> SERDE backed schema for parquet storage in Hive
> -----------------------------------------------
>
>                 Key: PARQUET-47
>                 URL: https://issues.apache.org/jira/browse/PARQUET-47
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Abhishek Agarwal
>
> As of now, for a Hive table stored as Parquet, the schema can only be specified in the Hive MetaStore. For our use case, we want the schema to be provided by the Thrift SerDe rather than the MetaStore. Using Thrift IDL as the schema provider allows us to maintain a consistent schema across execution engines other than Hive, such as Pig and native MapReduce.
> Additionally, for a large, sparse schema, it is much easier to build Thrift objects and use parquet-thrift/elephant-bird to convert them into columns/tuples than to construct the whole large tuple directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)