Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/09/11 02:34:44 UTC
[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library
[ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630048#action_12630048 ]
Zheng Shao commented on HADOOP-4138:
------------------------------------
Here is the implementation of the new serde interface.
The main principles of the design are:
1. Efficiency: we allow lazy (on-demand) deserialization, so that fields are deserialized only when they are actually accessed. Example use cases are a column-based storage format that stores different columns in different files, and column-based compression inside a sequence file, in which values of the same column from different rows are stored together and compressed.
2. Simplicity and Extensibility: we want developers to be able to write a new serde very easily.
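To illustrate principle 1, here is a minimal sketch of lazy deserialization behind a separate inspector interface. It is not the attached implementation: the names FieldInspector, RowDeserializer, LazyRow and LazyTsvDeserializer are hypothetical, and the tab-separated row of longs is an assumed format chosen only to keep the example short.

```java
// Illustrative sketch only; every type name here is hypothetical and not part of the Hive serde API.
class LazySerDeSketch {

    /** Reads one field out of an opaque row object, so callers never depend on the row's layout. */
    interface FieldInspector {
        Object getField(Object row, int index);
    }

    /** Turns a raw record into an opaque row object and says how to inspect it. */
    interface RowDeserializer {
        Object deserialize(String rawLine);
        FieldInspector getInspector();
    }

    /** A row that keeps the raw text and converts a column to a Long only when it is first read. */
    static final class LazyRow {
        private final String[] rawFields;
        private final Long[] parsed;
        private int parseCount = 0; // number of columns converted so far, kept for the demo

        LazyRow(String rawLine) {
            // Splitting is done up front for brevity; only the text-to-Long conversion is deferred.
            this.rawFields = rawLine.split("\t", -1);
            this.parsed = new Long[rawFields.length];
        }

        Long get(int index) {
            if (parsed[index] == null) {
                parsed[index] = Long.valueOf(rawFields[index]);
                parseCount++;
            }
            return parsed[index];
        }

        int getParseCount() {
            return parseCount;
        }
    }

    /** Assumed format: one row per line, tab-separated long values. */
    static final class LazyTsvDeserializer implements RowDeserializer {
        private final FieldInspector inspector = (row, index) -> ((LazyRow) row).get(index);

        @Override
        public Object deserialize(String rawLine) {
            return new LazyRow(rawLine); // nothing is converted yet
        }

        @Override
        public FieldInspector getInspector() {
            return inspector;
        }
    }

    public static void main(String[] args) {
        RowDeserializer deserializer = new LazyTsvDeserializer();
        FieldInspector inspector = deserializer.getInspector();

        Object row = deserializer.deserialize("10\t20\t30");
        System.out.println(inspector.getField(row, 1));       // prints 20
        System.out.println(((LazyRow) row).getParseCount());  // prints 1: only the requested column was converted
    }
}
```

A query that touches one column out of many pays the conversion cost for that column only, and the caller works through the inspector without knowing how the row is stored.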
> [Hive] refactor the SerDe library
> ---------------------------------
>
> Key: HADOOP-4138
> URL: https://issues.apache.org/jira/browse/HADOOP-4138
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hive
> Reporter: Zheng Shao
> Assignee: Zheng Shao
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to refactor the library to:
> 1. Split the Serializer and Deserializer interfaces
> 2. Split the Serializer/Deserializer and ObjectInspector interfaces
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework