Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2006/11/17 01:20:37 UTC

[jira] Created: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

SequenceFile's header should allow to store metadata in the form of key/value pairs 
------------------------------------------------------------------------------------

                 Key: HADOOP-732
                 URL: http://issues.apache.org/jira/browse/HADOOP-732
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi



The sequence file currently stores a fixed list of metadata attributes, such as the key/value class names, 
the compression method, etc.  To make sequence files more self-describing, the header should allow storing a list of key/value pairs.  One particular attribute of interest would indicate whether the key/value classes are actually Hadoop record classes 
and, if so, store the DDLs for the records. This way, we may create tools to extract the DDL from a sequence file and 
then generate the necessary classes. It also makes it possible to provide an interpretive version of Hadoop records: 
even in situations where Hadoop or the application does not have the necessary classes, 
a sequence file of Hadoop records can be read and deserialized "interpretively".



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Attachment: seqFileMetadata.patch


Attached is a patch for this issue.

SequenceFile's header now carries a TreeMap<Text, Text> object wrapped in a class, Metadata, that implements the Writable interface. To accommodate this, the version number is bumped up to 6. 

The Reader class has a new member variable for the metadata, and a new method returns the metadata object. The new code can still read files written in older versions of the format.

New constructors are added to the various Writer classes; each takes a metadata object as its last parameter. New static createWriter functions with metadata as the last 
parameter are also introduced. They are all backward compatible. A new unit test is added to TestSequenceFile for writing and reading sequence files with metadata.
All unit tests passed.
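
As a rough illustration of the design described above (this is not the code in seqFileMetadata.patch), the sketch below wraps a TreeMap in a small class with write/readFields methods and reads the metadata back only when the file version is 6 or later. It uses plain String and java.io in place of Hadoop's Text and Writable so that it stands alone. The class name MetadataSketch, the helpers readHeaderMetadata and roundTrip, and the byte layout (an int count followed by UTF-encoded key/value pairs) are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the Metadata class; not the patch's code. */
class MetadataSketch {
    // First format version that carries metadata, per the comment above.
    static final int VERSION_WITH_METADATA = 6;

    // Sorted map, so entries are always written in key order.
    private final TreeMap<String, String> entries = new TreeMap<>();

    void set(String name, String value) { entries.put(name, value); }
    String get(String name) { return entries.get(name); }
    int size() { return entries.size(); }

    /** Assumed layout: entry count, then each key and value as UTF strings. */
    void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Replaces the current entries with those read from the stream. */
    void readFields(DataInput in) throws IOException {
        entries.clear();
        int count = in.readInt();
        if (count < 0) {
            throw new IOException("invalid metadata size: " + count);
        }
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            entries.put(key, value);
        }
    }

    /** Older files have no metadata block, so the reader returns an empty map. */
    static MetadataSketch readHeaderMetadata(int fileVersion, DataInput in) throws IOException {
        MetadataSketch meta = new MetadataSketch();
        if (fileVersion >= VERSION_WITH_METADATA) {
            meta.readFields(in);
        }
        return meta;
    }

    /** Writes the metadata as a file of the given version would, then reads it back. */
    static MetadataSketch roundTrip(MetadataSketch original, int fileVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (fileVersion >= VERSION_WITH_METADATA) {
                original.write(out);
            }
            out.flush();
            DataInput in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return readHeaderMetadata(fileVersion, in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, a round trip at version 6 returns the same entries, while a simulated version 5 file, which has no metadata block, yields an empty metadata object instead of failing.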



> SequenceFile's header should allow to store metadata in the form of key/value pairs
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-732
>                 URL: https://issues.apache.org/jira/browse/HADOOP-732
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Runping Qi
>         Assigned To: Runping Qi


[jira] Commented: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466945 ] 

Hadoop QA commented on HADOOP-732:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12349495/seqFileMetadata.patch applied and successfully tested against trunk revision r499156.



[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Resolved: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved HADOOP-732.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.11.0

I just committed this.  Thanks, Runping!



[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Attachment: seqFileMetadata.patch

Reattached the patch.



[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Attachment:     (was: seqFileMetadata.patch)



[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Attachment: seqFileMetadata.patch.2


Removed a few createWriter methods.
seqFileMetadata.patch.2 contains the new patch.




[jira] Assigned: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi reassigned HADOOP-732:
---------------------------------

    Assignee: Runping Qi  (was: Owen O'Malley)



[jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-732:
--------------------------------

    Status: Open  (was: Patch Available)

This looks good.  My only concern is that it adds yet more SequenceFile.createWriter() signatures.  Until HADOOP-938 is resolved, I'd prefer this only add a single new createWriter() signature, one that includes all possible options, including this new option.
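
One way to read this suggestion, sketched with hypothetical stand-in types rather than the real SequenceFile API: a short overload stays as it is and delegates to a single method that takes every option, including the metadata. The names WriterFactorySketch, WriterConfig and Compression, and the particular set of options shown, are assumptions made for the example.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration; none of these types are Hadoop classes. */
class WriterFactorySketch {
    enum Compression { NONE, RECORD, BLOCK }

    /** Immutable record of the options a writer was created with. */
    static final class WriterConfig {
        final String path;
        final String keyClass;
        final String valueClass;
        final Compression compression;
        final SortedMap<String, String> metadata;

        WriterConfig(String path, String keyClass, String valueClass,
                     Compression compression, SortedMap<String, String> metadata) {
            this.path = path;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
            this.compression = compression;
            // Defensive copy so later changes by the caller do not leak in.
            this.metadata = Collections.unmodifiableSortedMap(new TreeMap<>(metadata));
        }
    }

    /** Short overload: fills in defaults and delegates to the full signature. */
    static WriterConfig createWriter(String path, String keyClass, String valueClass) {
        return createWriter(path, keyClass, valueClass,
                Compression.NONE, new TreeMap<String, String>());
    }

    /** The single signature that carries every option, including metadata. */
    static WriterConfig createWriter(String path, String keyClass, String valueClass,
                                     Compression compression,
                                     SortedMap<String, String> metadata) {
        return new WriterConfig(path, keyClass, valueClass, compression, metadata);
    }
}
```

Keeping one full signature means a later option only changes that method and its defaults, instead of multiplying overloads.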



[jira] Commented: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-732?page=comments#action_12451794 ] 
            
Owen O'Malley commented on HADOOP-732:
--------------------------------------

Would the meta-data key/value pairs be required to be strings? That would simplify everything. It isn't clear what the right API for this would be.


        

[jira] Commented: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-732?page=comments#action_12451802 ] 
            
Runping Qi commented on HADOOP-732:
-----------------------------------

Strings should be fine.

