Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/01/24 09:03:51 UTC

[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Attachment: seqFileMetadata.patch


Attached is a patch for this issue.

SequenceFile has a new header field: a TreeMap<Text, Text> object wrapped in a class, Metadata, that implements the Writable interface. To accommodate this, the file format version number is bumped up to 6.
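
As a rough illustration of the idea (not the code in the attached patch), a sorted map of key/value pairs can be serialized with a write/readFields pair in the style of Writable. The sketch below is self-contained: the class name MetadataSketch, the use of String in place of Text, the helper methods, and the on-disk layout (entry count followed by the pairs) are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the patch's Metadata class. It uses String instead
// of Hadoop's Text and plain DataOutput/DataInput instead of Writable.
class MetadataSketch {
    private final TreeMap<String, String> entries = new TreeMap<>();

    void set(String name, String value) { entries.put(name, value); }
    String get(String name) { return entries.get(name); }
    int size() { return entries.size(); }

    // Assumed layout: the entry count, then each key and value in sorted key order.
    void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Replaces the current contents with the entries read from the stream.
    void readFields(DataInput in) throws IOException {
        entries.clear();
        int count = in.readInt();
        if (count < 0) {
            throw new IOException("Invalid metadata entry count: " + count);
        }
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            entries.put(key, value);
        }
    }

    // Convenience helper for the example: serialize to a byte array.
    static byte[] toBytes(MetadataSketch m) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            m.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience helper for the example: deserialize from a byte array.
    static MetadataSketch fromBytes(byte[] data) {
        MetadataSketch m = new MetadataSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            m.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return m;
    }

    public static void main(String[] args) {
        MetadataSketch m = new MetadataSketch();
        m.set("example.key", "example value"); // illustrative key name only
        MetadataSketch copy = fromBytes(toBytes(m));
        System.out.println(copy.get("example.key"));
    }
}
```

A sorted map keeps the serialized order of the pairs deterministic, which is one plausible reason for choosing a TreeMap here.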

The Reader class has a new member variable for the metadata, and a method is added that returns the metadata object. The new code can still read files written in older versions of the format.
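
One common way to keep old files readable is to gate the new header field on the version number. The sketch below shows that pattern on a deliberately simplified toy header (a single version byte followed, from version 6 on, by the metadata pairs). The class name VersionedHeaderSketch, its methods, and this layout are assumptions for illustration; the real SequenceFile header contains other fields, and the patch may handle versions differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy header reader; not the SequenceFile.Reader implementation.
class VersionedHeaderSketch {
    // From the message above: metadata is introduced with version 6.
    static final byte FIRST_VERSION_WITH_METADATA = 6;

    // Reads the toy header. Files older than version 6 carry no metadata block,
    // so the reader returns an empty map for them instead of failing.
    static TreeMap<String, String> readHeader(DataInput in) throws IOException {
        byte version = in.readByte();
        TreeMap<String, String> metadata = new TreeMap<>();
        if (version >= FIRST_VERSION_WITH_METADATA) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                metadata.put(key, value);
            }
        }
        return metadata;
    }

    // Builds a toy header for the given version; metadata is written only
    // when the version supports it, mimicking a file from an older writer.
    static byte[] buildHeader(byte version, Map<String, String> metadata) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(version);
            if (version >= FIRST_VERSION_WITH_METADATA) {
                out.writeInt(metadata.size());
                for (Map.Entry<String, String> e : metadata.entrySet()) {
                    out.writeUTF(e.getKey());
                    out.writeUTF(e.getValue());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience wrapper for the example: parse a header held in memory.
    static TreeMap<String, String> parse(byte[] header) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(header))) {
            return readHeader(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> metadata = new TreeMap<>();
        metadata.put("example.key", "example value"); // illustrative key name only
        System.out.println(parse(buildHeader((byte) 5, metadata)));
        System.out.println(parse(buildHeader((byte) 6, metadata)));
    }
}
```

Returning an empty map for old files, rather than null, lets callers treat every file uniformly.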

New constructors that take a metadata object as their last parameter are added to the various Writer classes. New static createWriter functions that take metadata as the last parameter are also introduced. These additions are all backward compatible. A new unit test is added to TestSequenceFile to cover writing and reading sequence files with metadata.
All unit tests pass.



> SequenceFile's header should allow to store metadata in the form of key/value pairs
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-732
>                 URL: https://issues.apache.org/jira/browse/HADOOP-732
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>
> The sequence file currently stores a fixed list of metadata attributes, such as key/value class names,
> compression method, etc.  To make sequence files more self-describing, the header should allow storing a list of key/value pairs.  One attribute of particular interest would indicate whether the key/value classes are actually Hadoop record classes
> and, if so, store the DDLs for the records. This way, we may create tools to extract the DDL from a sequence file and
> then generate the necessary classes. It also makes it possible to provide an interpretive version of Hadoop records.
> That way, even when neither Hadoop nor the application has the necessary classes,
> a sequence file of Hadoop records can be read and deserialized "interpretively".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.