Posted to common-dev@hadoop.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/28 23:57:37 UTC

[jira] Created: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Utilities for reading SequenceFile and MapFile
----------------------------------------------

         Key: HADOOP-175
         URL: http://issues.apache.org/jira/browse/HADOOP-175
     Project: Hadoop
        Type: Improvement

  Components: io  
    Reporter: Andrzej Bialecki 
    Priority: Minor


Most data in Hadoop is stored in SequenceFile-s and MapFile-s. Sometimes there is a need to examine such files, but no specialized utilities exist to read them.

These two classes provide functionality to examine individual records in such files, and also to dump the content of such files as plain-text output.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377026 ] 

Andrzej Bialecki  commented on HADOOP-175:
------------------------------------------

Ok, I'll rework the patch along these lines.


[jira] Updated: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-175?page=all ]

Andrzej Bialecki  updated HADOOP-175:
-------------------------------------

    Attachment: patch.txt


[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377022 ] 

Doug Cutting commented on HADOOP-175:
-------------------------------------

+1

Instead of putting these in util, why not have them as the main() for MapFile and SequenceFile?

Also, it might be good to integrate these into bin/hadoop.


[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377125 ] 

Doug Cutting commented on HADOOP-175:
-------------------------------------

> I could add similar main() methods to SequenceFile.Reader and MapFile.Reader  ...

Yes, please!  Also, the OutputFormat versions could call routines from these, no?  They would mostly be directory iterators, right, but could use the primitive MapFile & SequenceFile to dump records.


[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377089 ] 

Andrzej Bialecki  commented on HADOOP-175:
------------------------------------------

Actually, I may have misled you with a poor choice of class names ... come to think of it, the subject is not too clear either. These two utilities started as readers of SequenceFile and MapFile, but now they read the output of SequenceFileOutputFormat and MapFileOutputFormat ... so if anything I think their place is in the main methods of these classes.

I could add similar main() methods to SequenceFile.Reader and MapFile.Reader while I'm here ...


[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile and MapFile

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377027 ] 

Doug Cutting commented on HADOOP-175:
-------------------------------------

Also, we could add a generic dumper that sniffs the magic number of a file and dumps it accordingly.  If it's a file that begins with {'S', 'E', 'Q', 3}, then it's a sequence file; if it's a directory with sequence files named "index" and "data", then it's a map file; if none of the first 100 bytes are less than 32, then it's text; etc.
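
The sniffing rules described here could be sketched roughly as follows. This is only an illustration in plain Java against the local filesystem, not code from the attached patch: the class name FileSniffer, the Kind enum, and the helper methods are all hypothetical, and a real version would presumably go through Hadoop's own filesystem and reader classes. It only classifies the input; the dumping step is left out. Taken literally, the "less than 32" rule also rejects text containing tabs or newlines, so a practical version might want to allow those bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: classifies a path by sniffing its contents.
class FileSniffer {
    enum Kind { SEQUENCE_FILE, MAP_FILE, TEXT, UNKNOWN }

    // Magic prefix as given in the comment: 'S', 'E', 'Q', then the byte 3.
    private static final byte[] SEQ_MAGIC = { 'S', 'E', 'Q', 3 };

    /** Reads up to n bytes from the start of a file (fewer if the file is shorter). */
    static byte[] head(Path file, int n) throws IOException {
        byte[] buf = new byte[n];
        int len = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (len < n && (r = in.read(buf, len, n - len)) != -1) {
                len += r;
            }
        }
        return Arrays.copyOf(buf, len);
    }

    /** True if the path is a regular file starting with the sequence file magic. */
    static boolean isSequenceFile(Path p) throws IOException {
        return Files.isRegularFile(p)
                && Arrays.equals(head(p, SEQ_MAGIC.length), SEQ_MAGIC);
    }

    static Kind sniff(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            // A map file is a directory holding sequence files named "index" and "data".
            boolean map = isSequenceFile(p.resolve("index"))
                    && isSequenceFile(p.resolve("data"));
            return map ? Kind.MAP_FILE : Kind.UNKNOWN;
        }
        if (!Files.isRegularFile(p)) {
            return Kind.UNKNOWN;
        }
        if (isSequenceFile(p)) {
            return Kind.SEQUENCE_FILE;
        }
        // Text heuristic: none of the first 100 bytes is below 32.
        // Bytes are masked to unsigned so values >= 128 are not seen as negative.
        byte[] first = head(p, 100);
        if (first.length == 0) {
            return Kind.UNKNOWN;  // assumption: an empty file is not classified
        }
        for (byte b : first) {
            if ((b & 0xff) < 32) {
                return Kind.UNKNOWN;
            }
        }
        return Kind.TEXT;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FileSniffer <path>");
            return;
        }
        System.out.println(sniff(Paths.get(args[0])));
    }
}
```

A dumper built on this would switch on the returned Kind and hand the path to the matching reader.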
