Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/09/11 02:34:44 UTC

[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630048#action_12630048 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

Here is the implementation of the new serde interface.

The main principles of the design are:
1. Efficiency: we allow lazy (on-demand) deserialization, so that a field is deserialized only when it is actually accessed. One example use case is a column-based storage format that stores different columns in different files, or column-based compression inside a sequence file, in which values of the same column from different rows are stored together and compressed.
2. Simplicity and Extensibility: we want developers to be able to write a new serde very easily.
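
To illustrate the first principle, here is a minimal sketch of lazy deserialization with the deserializer kept separate from the object inspector. It is not the code from the attached implementation: every name in it (Deserializer, ObjectInspector, LazyRow, DelimitedDeserializer, LazyRowInspector, decodedCount) is hypothetical, and it assumes a simple single-byte-delimited text row encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical interfaces for illustration only; not the ones in the patch.

/** Wraps raw bytes in a row object without decoding any field yet. */
interface Deserializer {
    Object deserialize(byte[] raw);
}

/** Reads individual fields out of a row object produced by a Deserializer. */
interface ObjectInspector {
    Object getField(Object row, int index);
}

/** Keeps the raw bytes and decodes a field only the first time it is requested. */
final class LazyRow {
    private final byte[] raw;
    private final byte separator;
    private int[] starts;      // field start offsets, filled in on first access
    private int[] ends;        // field end offsets (exclusive)
    private String[] cache;    // decoded fields, null until requested
    private int decodedCount;  // demonstration counter, not needed in real code

    LazyRow(byte[] raw, byte separator) {
        this.raw = raw;
        this.separator = separator;
    }

    /** Scans for separators once; no field is converted to a String here. */
    private void locateFields() {
        int count = 1;
        for (byte b : raw) {
            if (b == separator) count++;
        }
        starts = new int[count];
        ends = new int[count];
        cache = new String[count];
        int field = 0;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == separator) {
                ends[field] = i;
                field++;
                starts[field] = i + 1;
            }
        }
        ends[field] = raw.length;
    }

    String getField(int index) {
        if (starts == null) locateFields();
        if (cache[index] == null) {
            int length = ends[index] - starts[index];
            cache[index] = new String(raw, starts[index], length, StandardCharsets.UTF_8);
            decodedCount++;
        }
        return cache[index];
    }

    int decodedCount() {
        return decodedCount;
    }
}

/** A deserializer whose only work is to wrap the bytes. */
final class DelimitedDeserializer implements Deserializer {
    private final byte separator;

    DelimitedDeserializer(char separator) {
        this.separator = (byte) separator;
    }

    @Override
    public Object deserialize(byte[] raw) {
        return new LazyRow(raw, separator);
    }
}

/** The inspector is what triggers decoding, one field at a time. */
final class LazyRowInspector implements ObjectInspector {
    @Override
    public Object getField(Object row, int index) {
        return ((LazyRow) row).getField(index);
    }
}
```

In this sketch, calling deserialize on a three-field row decodes nothing; asking the inspector for field 1 decodes that one field and caches it, and the other fields stay as raw bytes until someone reads them. Keeping the inspector as a separate interface means the same calling code could work against a differently laid out row object, which is the kind of extensibility the second principle is after.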


> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.