Posted to common-dev@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2007/10/27 11:02:50 UTC

[jira] Created: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Add "-text" command to FsShell to decode SequenceFile to stdout
---------------------------------------------------------------

                 Key: HADOOP-2113
                 URL: https://issues.apache.org/jira/browse/HADOOP-2113
             Project: Hadoop
          Issue Type: Improvement
          Components: fs
            Reporter: Chris Douglas
            Assignee: Chris Douglas
            Priority: Minor
             Fix For: 0.16.0
         Attachments: 2113-0.patch

FsShell should provide a command to examine SequenceFiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542849 ] 

dhruba borthakur commented on HADOOP-2113:
------------------------------------------

+1. Code looks good.



[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2113:
----------------------------------

    Attachment: 2113-1.patch

Added a test case



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542906 ] 

Hadoop QA commented on HADOOP-2113:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369578/2113-1.patch
against trunk revision r595406.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1103/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1103/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1103/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1103/console

This message is automatically generated.



[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2113:
----------------------------------

    Attachment: 2113-0.patch

Doing what SequenceFileAsTextRecordReader does- i.e. calling {{toString()}} on keys/values from a SequenceFile to output it as text- seems reasonable as a first pass.
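
As a rough illustration of the approach described above (not the contents of 2113-0.patch), the sketch below renders each key/value pair by calling toString() on both sides and joining them with a tab. It uses plain Java collections in place of a real SequenceFile reader; the class name ToStringDump, the method render, and the tab/newline layout are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ToStringDump {
    // Render each pair as "<key>\t<value>\n" using only the string form
    // of the key and value, regardless of their concrete types.
    static String render(Map<?, ?> records) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<?, ?> e : records.entrySet()) {
            out.append(String.valueOf(e.getKey()))
               .append('\t')
               .append(String.valueOf(e.getValue()))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in records; a real tool would read them from a SequenceFile.
        Map<Integer, Object> records = new LinkedHashMap<>();
        records.put(1, "alpha");
        records.put(2, 3.5);
        System.out.print(render(records));
    }
}
```

Because only toString() is used, the output is only as readable as the key and value classes make it.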



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540642 ] 

Hadoop QA commented on HADOOP-2113:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368535/2113-0.patch
against trunk revision r592551.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1072/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1072/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1072/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1072/console

This message is automatically generated.



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544775 ] 

Hudson commented on HADOOP-2113:
--------------------------------

Integrated in Hadoop-Nightly #311 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/311/])



[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2113:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Chris!



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538331 ] 

Milind Bhandarkar commented on HADOOP-2113:
-------------------------------------------

I find Enis's suggestion very valuable. Having a separate command to operate on SequenceFiles (or rather, on any InputFormat) would be great. These commands would allow users to specify the InputFormat. They can make only three assumptions: that the "datasets" belonging to a directory all have the same format, are partitioned, and are locally sorted within a partition (in short, produced by a reduce phase). Dumping them as text is a starting point, and then more commands can be added. Comments?



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538838 ] 

Enis Soztutar commented on HADOOP-2113:
---------------------------------------

bq. I think I've explained this command poorly. It attempts to render whatever exists at a given path as human-readable text.
Hmm, I guess I've just assumed that the file would be a SequenceFile, in which case the patch dumps all the contents of the file. What I propose is more general for sequence files, but it does not cover other file types.

bq. aren't each of those commands easily implemented in map/reduce?
Yes, they can be easily implemented as MR jobs, or local jobs, but the framework should include such jobs.

Now that I understand the original intention, I am OK with the current patch. I suggest we finalize this patch and continue with HADOOP-175 for SF handling. We can later change the SF dumping code (TextRecordInputStream) in HADOOP-175.



[jira] Issue Comment Edited: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538266 ] 

chris.douglas edited comment on HADOOP-2113 at 10/27/07 3:52 PM:
-----------------------------------------------------------------

(core tests failed HADOOP-2112; I assume the contrib tests are unrelated)

Each of those seem like valuable operations, but piping the output through one's favorite text-processing utility seems very usable. Unless the keys contain tabs, I would expect 1-4 in your list to be pretty straightforward. I agree that the framework could be far more efficient for most operations (particularly for sorted data, which is almost certainly the most common case), and it could also help express "for keys matching this regexp in their string representation, emit them as their native type" (which this cannot), but isn't mapred the correct tool for that job, anyway? The intent was merely to provide an aid to people hoping to check the first few records, or some subset of values, from a given SequenceFile; it aspires to sanity checks, not processing.
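
To make the caveat about tabs concrete, here is a small sketch of what a downstream text tool has to do with the dumped output. It assumes one record per line with the key and value separated by a single tab, which the thread implies but does not spell out. The class TabSplit and its split method are hypothetical names used only for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

final class TabSplit {
    // Split one dumped line at the first tab into (key, value).
    // A line without any tab is treated as a key with an empty value.
    static Map.Entry<String, String> split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        // Unambiguous: the key contains no tab.
        System.out.println(split("user42\tclicked"));
        // Ambiguous: if the original key was "a\tb", the split cannot recover it.
        System.out.println(split("a\tb\tvalue"));
    }
}
```

Splitting at the first tab is the usual convention for such output, and it is exactly why a key that itself contains a tab gets cut short.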

I could see extending stat to support more info, re: (5), though. By "a more general set of tools", what did you have in mind?

[edit - unintended text effects ]

      was (Author: chris.douglas):
    (core tests failed HADOOP-2112; I assume the contrib tests are unrelated)

Each of those seem like valuable operations, but piping the output of "-text" through one's favorite text-processing utility seems very usable. Unless the keys contain tabs, I would expect 1-4 in your list to be pretty straightforward. I agree that the framework could be far more efficient for most operations- particularly for sorted data, which is almost certainly the most common case- and it could also help express "for keys matching this regexp in their string representation, emit them as their native type" (which this cannot), but isn't mapred the correct tool for that job, anyway? The intent was merely to provide an aid to people hoping to check the first few/some subset of values from a given SequenceFile; it aspires to sanity checks, not processing.

I could see extending -stat to support more info, re: (5), though. By "a more general set of tools", what did you have in mind?
  


[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538334 ] 

Andrzej Bialecki  commented on HADOOP-2113:
-------------------------------------------

Please take a look at HADOOP-175 and see if that patch could be useful here.



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538199 ] 

Enis Soztutar commented on HADOOP-2113:
---------------------------------------

I would rather think of implementing a more general set of tools to operate on sequence files. Because sequence files are at the heart of mapred operations, human-interpretable operations on them should be supported by the framework. The set of operations that I can think of includes:

# find value given key 
# find values with keys matching given regex (dump to text file)
# dump sequence file to text file (using keyClass.toString() and valueClass.toString())
# find pairs in the given key1-key2 range. 
# dump metadata and statistics of the sf, such as number of records, key range, etc.
# ... suggestions ?
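
The operations in the list above can be sketched against an in-memory sorted map standing in for a sorted sequence file. This is only an illustration of the semantics, not a proposal for how the tools would read files; the class SfOpsSketch and its method names are hypothetical, and string keys are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class SfOpsSketch {
    // Operation 1: find the value for a given key (null if absent).
    static String findValue(NavigableMap<String, String> data, String key) {
        return data.get(key);
    }

    // Operation 2: keys whose string form fully matches the regex.
    static List<String> matchKeys(NavigableMap<String, String> data, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String key : data.keySet()) {
            if (p.matcher(key).matches()) {
                hits.add(key);
            }
        }
        return hits;
    }

    // Operation 4: pairs with key1 <= key <= key2 (requires key1 <= key2).
    static NavigableMap<String, String> range(
            NavigableMap<String, String> data, String key1, String key2) {
        return data.subMap(key1, true, key2, true);
    }

    // Operation 5: a few simple statistics (assumes non-empty data).
    static String stats(NavigableMap<String, String> data) {
        return "records=" + data.size()
                + " firstKey=" + data.firstKey()
                + " lastKey=" + data.lastKey();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> data = new TreeMap<>();
        data.put("apple", "1");
        data.put("banana", "2");
        data.put("cherry", "3");
        System.out.println(findValue(data, "cherry"));
        System.out.println(matchKeys(data, ".*an.*"));
        System.out.println(range(data, "apple", "banana"));
        System.out.println(stats(data));
    }
}
```

The range and statistics operations rely on the keys being sorted, which a real file may not guarantee.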





[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2113:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2113:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538570 ] 

Andrzej Bialecki  commented on HADOOP-2113:
-------------------------------------------

Some additional functionality was requested for HADOOP-175, and so far it didn't materialize ... ;)

UTF8 keys in these utilities are used only when the user wants to retrieve specific records by key (and indeed, we can change this to Text); otherwise the tools use whatever classes are declared for keys/values, so from this point of view they don't depend on UTF8.

Regarding mapred: I use these utilities often, specifically for casual checking of existing data files, and they come especially handy in cases when only DFS is working but mapred might not be available, or when the overhead of starting a mapred job is too high (e.g. dumping the first record of a big SequenceFile).



[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2113:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2113:
----------------------------------

    Status: Patch Available  (was: Open)

Failed HADOOP-2112; trying Hudson again



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538212 ] 

Hadoop QA commented on HADOOP-2113:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368535/2113-0.patch
against trunk revision r588778.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests -1.  The patch failed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1015/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1015/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1015/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1015/console

This message is automatically generated.



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538557 ] 

Chris Douglas commented on HADOOP-2113:
---------------------------------------

I think I've explained this command poorly. It attempts to render whatever exists at a given path as human-readable text. Right now, it includes SequenceFile and gzip formats; it's not trying to stuff a framework for computation on SequenceFiles into FsShell. I agree that such a toolchain should be independent, but this aspires to something else.
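
One plausible way to decide how to render "whatever exists at a given path" is to look at the first few bytes of the file. The sketch below is an assumption about the general technique, not a description of the patch: it recognizes the gzip magic number (0x1f 0x8b) and the "SEQ" prefix that begins a SequenceFile header, and treats everything else as unknown. The class MagicSniffer and the enum Format are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class MagicSniffer {
    enum Format { GZIP, SEQUENCE_FILE, UNKNOWN }

    // Classify a file by its leading bytes.
    static Format sniff(byte[] head) {
        // gzip streams start with the two bytes 0x1f 0x8b.
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Format.GZIP;
        }
        // SequenceFile headers start with the ASCII bytes 'S', 'E', 'Q'.
        if (head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return Format.SEQUENCE_FILE;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {0x1f, (byte) 0x8b, 8}));
        System.out.println(sniff("SEQ".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(sniff("plain text".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

A file classified as unknown would presumably be passed through unchanged.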

While we're on the subject, though, I'm not sure I fully understand the motivation for this command-line tool. Aren't each of those commands easily implemented in map/reduce? As I see it, there are two ways to generalize the operations Enis suggests, since all of WritableComparable is fair game. Either a) everything is first converted to a string, or b) the framework can understand that a user-specified InputFormat creating a RecordReader creating a key type comparable to IntWritable should select a comparator for its keys such that the user-supplied "70" is greater than "9" (unless the user actually intends a lexicographic ordering). Not to reveal my opinion. ;)
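
The "70" versus "9" point comes down to which comparator is applied to the user-supplied keys. A minimal sketch in plain Java, with no Hadoop types involved (CompareSketch and its methods are hypothetical names for the example):

```java
final class CompareSketch {
    // Option a): compare the keys as strings (lexicographic order).
    static int asStrings(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Option b): compare the keys as the integers they represent.
    static int asInts(String a, String b) {
        return Integer.signum(Integer.compare(Integer.parseInt(a), Integer.parseInt(b)));
    }

    public static void main(String[] args) {
        // As strings, "70" sorts before "9" because '7' < '9'.
        System.out.println(asStrings("70", "9")); // prints -1
        // As integers, 70 sorts after 9.
        System.out.println(asInts("70", "9"));    // prints 1
    }
}
```

Choosing between the two requires knowing the key type, which is the type-resolution work the comment argues belongs in mapred.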

In the latter case, code like this belongs in mapred, since merely working out the types is going to be either a hack or a significant effort. In the former case, for more than a single SequenceFile, such code still seems to belong in mapred; that said, piping the output of "text"- as implemented- through a general text-processing utility is a reasonable hack for some purposes. For my purposes, I only needed to check the first few records for some of the output, and this suffices. I don't know why a comparable utility like HADOOP-175 never got committed (it would be a good base, though 1) it relies on UTF8 keys which are currently deprecated and 2) it solves some problems outside the limited domain of this issue), but that no similar utility has been written for the last year makes me wary of over-complicating this. It's for human-readability, not processing.



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538316 ] 

Enis Soztutar commented on HADOOP-2113:
---------------------------------------

bq. By "a more general set of tools", what did you have in mind?
I am thinking of introducing a new command rather than using FsShell, such as
{noformat}
bin/hadoop sf <command> <command_args>
{noformat}

and the set of commands would be: findkey, matchkey, dump, stats, etc.

For some jobs, such as finding a record given a key/value, we may check whether the sf is a map file; for other commands, like matchkey, we may run a distributed grep.

bq. Each of those seem like valuable operations, but piping the output through one's favorite text-processing utility seems very usable.
Yes, indeed the outputs of some of the commands should be dumped to stdout. We can add a filename argument and use stdout if "-" is given.
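
As a rough sketch of the "-" convention, something like the following could pick the output destination. The class OutputTargetSketch and its methods are hypothetical names for illustration; it writes to the local filesystem with java.io rather than through the Hadoop FileSystem API, which a real command would presumably use.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of Hadoop or of any proposed patch.
final class OutputTargetSketch {
    // "-" is the conventional placeholder for standard output.
    static boolean isStdout(String target) {
        return "-".equals(target);
    }

    // Returns System.out for "-", otherwise a stream writing to the named local file.
    static PrintStream open(String target) {
        if (isStdout(target)) {
            return System.out;
        }
        try {
            return new PrintStream(new FileOutputStream(target), true, "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "-";
        PrintStream out = open(target);
        out.println("key\tvalue");
        // Flush always, but close only streams we opened ourselves.
        out.flush();
        if (!isStdout(target)) {
            out.close();
        }
    }
}
```

The one detail worth keeping from this sketch is that System.out must not be closed by the command, while a file stream must be.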






[jira] Updated: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2113:
-------------------------------------

    Status: Open  (was: Patch Available)

It would be really good if we can have a test case.



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538266 ] 

Chris Douglas commented on HADOOP-2113:
---------------------------------------

(core tests failed HADOOP-2112; I assume the contrib tests are unrelated)

Each of those seems like a valuable operation, but piping the output of "-text" through one's favorite text-processing utility seems very usable. Unless the keys contain tabs, I would expect 1-4 in your list to be pretty straightforward. I agree that the framework could be far more efficient for most operations (particularly for sorted data, which is almost certainly the most common case), and it could also help express "for keys matching this regexp in their string representation, emit them as their native type" (which this cannot), but isn't mapred the correct tool for that job, anyway? The intent was merely to provide an aid to people hoping to check the first few, or some subset of, values from a given SequenceFile; it aspires to sanity checks, not processing.
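
For illustration, here is a minimal sketch of the kind of downstream filter I mean, written in Java only to keep one language in this thread. It assumes each line of output is a key and a value separated by a tab, which is an assumption about the output format, and the class TextOutputFilterSketch and its matchKey method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; assumes "key<TAB>value" lines as input.
final class TextOutputFilterSketch {
    // Returns the lines whose key (the text before the first tab) fully matches keyRegex.
    static List<String> matchKey(List<String> lines, String keyRegex) {
        Pattern pattern = Pattern.compile(keyRegex);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            // A line without a tab is treated as a key with no value.
            String key = tab < 0 ? line : line.substring(0, tab);
            if (pattern.matcher(key).matches()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apple\t1", "banana\t2", "app\tle\t3");
        // The third line has a tab inside its key, so its key is read as "app".
        System.out.println(matchKey(lines, "apple").size());
        System.out.println(matchKey(lines, "app.*").size());
    }
}
```

The third sample line shows the caveat about tabs: a key containing a tab is split at the wrong place, so the filter sees only its first fragment.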

I could see extending -stat to support more info, re: (5), though. By "a more general set of tools", what did you have in mind?



[jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538548 ] 

Enis Soztutar commented on HADOOP-2113:
---------------------------------------

Thanks, HADOOP-175 could be a good starting point. I wonder why it did not make it into trunk.
