Posted to common-dev@hadoop.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/04/28 23:57:37 UTC
[jira] Created: (HADOOP-175) Utilities for reading SequenceFile and
MapFile
Utilities for reading SequenceFile and MapFile
----------------------------------------------
Key: HADOOP-175
URL: http://issues.apache.org/jira/browse/HADOOP-175
Project: Hadoop
Type: Improvement
Components: io
Reporter: Andrzej Bialecki
Priority: Minor
Most data in Hadoop is stored in SequenceFile-s and MapFile-s. Sometimes there is a need to examine such files, but no specialized utilities exist to read them.
These two classes provide functionality to examine individual records in such files, and also to dump the content of such files to plain text output.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile
and MapFile
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377026 ]
Andrzej Bialecki commented on HADOOP-175:
------------------------------------------
Ok, I'll rework the patch along these lines.
[jira] Updated: (HADOOP-175) Utilities for reading SequenceFile and
MapFile
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-175?page=all ]
Andrzej Bialecki updated HADOOP-175:
-------------------------------------
Attachment: patch.txt
[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile
and MapFile
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377022 ]
Doug Cutting commented on HADOOP-175:
-------------------------------------
+1
Instead of putting these in util, why not have them as the main() for MapFile and SequenceFile?
Also, it might be good to integrate these into bin/hadoop.
[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile
and MapFile
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377125 ]
Doug Cutting commented on HADOOP-175:
-------------------------------------
> I could add similar main() methods to SequenceFile.Reader and MapFile.Reader ...
Yes, please! Also, the OutputFormat versions could call routines from these, no? They would mostly be directory iterators, right, but could use the primitive MapFile & SequenceFile routines to dump records.
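The layering suggested here can be sketched roughly as follows. This is only an illustration in plain Java, not code from the patch: OutputDirDumper, FileDumper and dumpAll are hypothetical names, and the FileDumper interface stands in for whatever per-file dump routine SequenceFile or MapFile would end up exposing.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not Hadoop code: a directory-level dumper built on a per-file primitive. */
final class OutputDirDumper {

    /** Stand-in (assumption) for the per-file dump routine of SequenceFile or MapFile. */
    interface FileDumper {
        void dump(Path file, PrintStream out) throws IOException;
    }

    /**
     * Visits every entry of an output directory in name order and delegates
     * the actual record dumping to the per-file primitive.
     * Returns the number of entries visited.
     */
    static int dumpAll(Path outputDir, FileDumper primitive, PrintStream out) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            for (Path entry : entries) {
                parts.add(entry);
            }
        }
        Collections.sort(parts); // directory listing order is unspecified, so sort by name
        for (Path part : parts) {
            out.println("# " + part.getFileName()); // header line so each part is identifiable
            primitive.dump(part, out);
        }
        return parts.size();
    }
}
```

With this split, the directory iterator knows nothing about record formats; swapping the FileDumper is enough to handle a different kind of part file.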
[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile
and MapFile
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377089 ]
Andrzej Bialecki commented on HADOOP-175:
------------------------------------------
Actually, I may have misled you with a poor choice of class names ... come to think of it, the subject is not too clear either. These two utilities started as readers of SequenceFile and MapFile, but now they read the output of SequenceFileOutputFormat and MapFileOutputFormat ... so if anything, I think their place is in the main methods of these classes.
I could add similar main() methods to SequenceFile.Reader and MapFile.Reader while I'm here ...
[jira] Commented: (HADOOP-175) Utilities for reading SequenceFile
and MapFile
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-175?page=comments#action_12377027 ]
Doug Cutting commented on HADOOP-175:
-------------------------------------
Also, we could add a generic dumper that sniffs the magic number of a file and dumps it accordingly. If it's a file that begins with {'S', 'E', 'Q', 3} then it's a sequence file; if it's a directory with sequence files named "index" and "data", then it's a map file; if none of the first 100 bytes are less than 32, then it's text, etc.
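A minimal sketch of such a sniffer, in plain Java against the local file system rather than the Hadoop FileSystem API. FileSniffer, Kind and sniff are hypothetical names, not part of Hadoop. One deliberate deviation, which is an assumption of the sketch: tab, line feed and carriage return are tolerated in the text check, since a strict "no byte below 32" rule would reject any multi-line text file. The fourth magic byte is the version quoted above; files written with another format version would carry a different value.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper, not part of Hadoop: guesses a file's format from its first bytes. */
final class FileSniffer {
    enum Kind { SEQUENCE_FILE, MAP_FILE, TEXT, UNKNOWN }

    // Magic prefix as quoted in the comment above: 'S', 'E', 'Q', then version byte 3.
    private static final byte[] SEQ_MAGIC = { 'S', 'E', 'Q', 3 };
    private static final int SNIFF_LIMIT = 100;

    static Kind sniff(Path path) throws IOException {
        if (Files.isDirectory(path)) {
            // A map file is a directory holding sequence files named "index" and "data".
            boolean isMap = isSequenceFile(path.resolve("index"))
                    && isSequenceFile(path.resolve("data"));
            return isMap ? Kind.MAP_FILE : Kind.UNKNOWN;
        }
        byte[] head = readHead(path, SNIFF_LIMIT);
        if (startsWith(head, SEQ_MAGIC)) return Kind.SEQUENCE_FILE;
        if (head.length > 0 && looksLikeText(head)) return Kind.TEXT;
        return Kind.UNKNOWN;
    }

    private static boolean isSequenceFile(Path file) throws IOException {
        return Files.isRegularFile(file)
                && startsWith(readHead(file, SEQ_MAGIC.length), SEQ_MAGIC);
    }

    /** Reads at most {@code limit} bytes from the start of the file. */
    private static byte[] readHead(Path file, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < limit) {
                int r = in.read(buf, n, limit - n);
                if (r < 0) break; // end of file reached before the limit
                n += r;
            }
        }
        return Arrays.copyOf(buf, n);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Text heuristic: no control bytes below 32, except tab, LF and CR (assumption). */
    private static boolean looksLikeText(byte[] head) {
        for (byte b : head) {
            int u = b & 0xff; // Java bytes are signed; compare the unsigned value
            boolean whitespace = u == '\t' || u == '\n' || u == '\r';
            if (u < 32 && !whitespace) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + sniff(Paths.get(arg)));
        }
    }
}
```

A real dumper would then dispatch on the returned Kind to the matching reader; that part is left out here because it depends on the reader classes under discussion.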