Posted to dev@mahout.apache.org by "tom pierce (Created) (JIRA)" <ji...@apache.org> on 2012/01/15 23:53:40 UTC

[jira] [Created] (MAHOUT-947) Improvements to seqdumper

Improvements to seqdumper
-------------------------

                 Key: MAHOUT-947
                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
             Project: Mahout
          Issue Type: Improvement
            Reporter: tom pierce
            Priority: Minor


I've put together a few handy additions to seqdumper:

* Ability to dump all sequence files in a directory.
* A quiet flag to attenuate the non-data output.
* A flag to toggle name-only printing for NamedVector values.
* An option to print only the N highest-valued elements of WeightedVector values.

Seems like others will probably find some of these to be helpful.
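The "only the N highest-valued elements" option can be implemented with a bounded min-heap. The sketch below is hypothetical. It works on a plain double[] rather than Mahout's Vector/WeightedVector types, and it makes no claim about how the attached patch does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: operates on a plain double[], not a Mahout vector type.
final class TopElements {

  /** Returns the indices of the n largest values, largest first. */
  static List<Integer> topIndices(double[] values, int n) {
    List<Integer> result = new ArrayList<>();
    if (n <= 0) {
      return result;
    }
    // Min-heap on value: the root is the smallest of the current top n,
    // so a new element only has to beat the root to get in.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(Comparator.comparingDouble(i -> values[i]));
    for (int i = 0; i < values.length; i++) {
      if (heap.size() < n) {
        heap.add(i);
      } else if (values[i] > values[heap.peek()]) {
        heap.poll();
        heap.add(i);
      }
    }
    // Draining the heap yields ascending order; reverse it for largest first.
    while (!heap.isEmpty()) {
      result.add(heap.poll());
    }
    Collections.reverse(result);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(topIndices(new double[] {0.5, 3.0, -1.0, 2.0, 7.5}, 2));
  }
}
```

Running main prints [4, 1]. The heap never holds more than n indices, so the cost is O(m log n) for a vector with m elements instead of sorting the whole vector.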

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205392#comment-13205392 ] 

Grant Ingersoll commented on MAHOUT-947:
----------------------------------------

Hmm, should be a getOptions in there, but maybe my patch is messed up:
{code}
  /**
   * Options can occur multiple times, so return the list.
   * @param optionName The unadorned (no "--" prefixing it) option name
   * @return The values, else null.  If the option is present, but has no values, then the result will be an empty list (Collections.emptyList())
   */
  public List<String> getOptions(String optionName) {
    return argMap.get(keyFor(optionName));
  }
{code}

Or do you mean we should have just one or the other, but not both?  That could work.  I did both because one usually knows whether one wants a single arg or multiple values, so getOption() is really just a convenience method.

Perhaps parsedArgs is still useful if one wants to iterate over the arguments?  There is also at least one place where we use them to pass through to other jobs that aren't necessarily AbstractJobs (DistributedConjugateGradientSolver).
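To illustrate the design under discussion, here is a minimal sketch in which every option is stored as a list of values, so that getOption() is only a convenience over getOptions(). This is not Mahout's AbstractJob: the class name, the parsing loop, and storing keys with their "--" prefix are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout's AbstractJob: every option maps to a list.
final class MultiValuedArgs {
  private final Map<String, List<String>> argMap = new HashMap<>();

  // Assumed key convention: options are stored with their "--" prefix.
  static String keyFor(String optionName) {
    return "--" + optionName;
  }

  MultiValuedArgs(String... args) {
    String currentKey = null;
    for (String arg : args) {
      if (arg.startsWith("--")) {
        currentKey = arg;
        // A repeated option reuses its list; a bare flag keeps an empty one.
        argMap.computeIfAbsent(currentKey, k -> new ArrayList<>());
      } else if (currentKey != null) {
        argMap.get(currentKey).add(arg);
      }
      // Values appearing before the first option are ignored in this sketch.
    }
  }

  /** All values; an empty list if present without values, null if absent. */
  List<String> getOptions(String optionName) {
    return argMap.get(keyFor(optionName));
  }

  /** Convenience for single-valued options: the first value, or null. */
  String getOption(String optionName) {
    List<String> values = getOptions(optionName);
    return (values == null || values.isEmpty()) ? null : values.get(0);
  }
}
```

With new MultiValuedArgs("--input", "a", "--input", "b", "--quiet"), getOptions("input") returns [a, b], getOption("input") returns "a", getOptions("quiet") returns an empty list, and getOptions("output") returns null, which matches the Javadoc contract quoted above.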

                
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-947-2.patch, MAHOUT-947.patch, MAHOUT-947.patch, MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.


[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "tom pierce (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tom pierce updated MAHOUT-947:
------------------------------

    Attachment: MAHOUT-947-2.patch

Adjusted to put the vector options in VectorDumper.  Also adds the ability to dump vectors from AbstractClusters and WeightedPropertyVectorWritables.
                

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "tom pierce (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tom pierce updated MAHOUT-947:
------------------------------

    Attachment: MAHOUT-947.patch
    

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206332#comment-13206332 ] 

Lance Norskog commented on MAHOUT-947:
--------------------------------------

There is a SequenceFile utility code pattern that handles both files and directories and supplies an iterator, so there's no need for separate args.
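A sketch of that pattern on the local filesystem: one input argument that may name either a single file or a directory, with the caller simply iterating over the result. Mahout's own utilities operate on Hadoop's FileSystem; this version uses java.nio instead, and skipping names that start with "_" or "." (such as _SUCCESS markers) is an assumed convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Local-filesystem sketch; Mahout's real utilities work against Hadoop's FileSystem.
final class InputFiles {

  /** A plain file resolves to itself; a directory resolves to its visible regular files. */
  static List<Path> resolve(Path input) throws IOException {
    List<Path> files = new ArrayList<>();
    if (!Files.isDirectory(input)) {
      files.add(input);
      return files;
    }
    try (Stream<Path> children = Files.list(input)) {
      children
          .filter(Files::isRegularFile)
          .filter(p -> !isHidden(p.getFileName().toString()))
          .sorted()
          .forEach(files::add);
    }
    return files;
  }

  // Assumed convention: names starting with "_" or "." (e.g. _SUCCESS) are skipped.
  private static boolean isHidden(String name) {
    return name.startsWith("_") || name.startsWith(".");
  }
}
```

A dumper built on this needs only one --input option: for (Path p : InputFiles.resolve(input)) { ... } covers both the single-file and the directory case.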
                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187291#comment-13187291 ] 

Lance Norskog commented on MAHOUT-947:
--------------------------------------

mahout/src/conf/driver.classes.props lists all of the dumper classes.

Everything needs the "quiet" option :) In fact, could it just change the log level from INFO to WARN? 
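A rough sketch of what a quiet flag gates, using a hypothetical dumper whose header and count lines are invented for illustration. Because this non-data output is written with println rather than through a logger, raising the log level alone would not suppress it. Note also that SLF4J itself has no API for changing levels; that has to go through the bound logging backend.

```java
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical dumper; the header/footer text is invented for illustration.
final class QuietDumper {

  static void dump(String source, Map<String, String> records, boolean quiet, PrintStream out) {
    if (!quiet) {
      out.println("Input: " + source); // non-data output
    }
    for (Map.Entry<String, String> e : records.entrySet()) {
      out.println(e.getKey() + "\t" + e.getValue()); // data, always printed
    }
    if (!quiet) {
      out.println("Count: " + records.size()); // non-data output
    }
  }

  public static void main(String[] args) {
    Map<String, String> records = new LinkedHashMap<>();
    records.put("doc1", "{0:1.0,3:2.5}");
    records.put("doc2", "{1:0.5}");
    dump("part-00000", records, true, System.out);
  }
}
```

With quiet set, the output is exactly one line per record, which makes it easy to pipe into other tools.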
                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205018#comment-13205018 ] 

Grant Ingersoll commented on MAHOUT-947:
----------------------------------------

I have a patch for this that cleans it up and switches it to the standard command line processing.  Will post soon.

                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205262#comment-13205262 ] 

Sean Owen commented on MAHOUT-947:
----------------------------------

My only issue with this is that this has brought in a second way to access args -- getOption(). It seems like it should be one way or the other. The 'parsedArgs' thing is no longer useful then and could be removed. You could have a getOptions() method to return a multiple-valued flag.
                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "tom pierce (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186935#comment-13186935 ] 

tom pierce commented on MAHOUT-947:
-----------------------------------

Oh nice, I hadn't seen VectorDumper before.  Looks like both could use the directory and quiet options, and the other things should move over to VectorDumper.  Will adjust the patch.
                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205017#comment-13205017 ] 

Jake Mannix commented on MAHOUT-947:
------------------------------------

So, one comment: instead of --seqDirectory vs. --seqFile, why not just support glob paths?

vectordumper -s "/path/to/my/dir/part-*"

?

Do we need separate flags for if it's a directory vs a file?  Makes the usage messier, IMO.
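For local files, Java's standard library can expand such a pattern directly; on HDFS the equivalent would be Hadoop's FileSystem.globStatus(). The sketch below is illustrative only and assumes the directory and the file-name glob are passed separately.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local-filesystem sketch of glob expansion; on HDFS, FileSystem.globStatus() plays this role.
final class GlobInput {

  /** Returns the direct children of dir whose file names match fileGlob, sorted. */
  static List<Path> expand(Path dir, String fileGlob) throws IOException {
    List<Path> matches = new ArrayList<>();
    // newDirectoryStream(dir, glob) matches the glob against each entry's file name.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, fileGlob)) {
      for (Path entry : entries) {
        matches.add(entry);
      }
    }
    Collections.sort(matches);
    return matches;
  }
}
```

For the example above, expand(Paths.get("/path/to/my/dir"), "part-*") would return the part files and leave out markers such as _SUCCESS, with no separate directory flag needed.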
                

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-947:
-----------------------------------

    Attachment: MAHOUT-947.patch

This patch is quite a bit bigger than Tom's because, as I was converting it to use AbstractJob, I realized that AbstractJob only supported single-value arguments.  This patch fixes that and also standardizes a whole slew of files that use AbstractJob.
                

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-947:
-----------------------------------

    Comment: was deleted

(was: Good point, Lance.  Updating and committing.)
    

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-947:
-----------------------------------

    Fix Version/s: 0.7
    

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "tom pierce (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187861#comment-13187861 ] 

tom pierce commented on MAHOUT-947:
-----------------------------------

Hah - I agree on everything needing a quiet option (or better, quiet by default and a verbose flag!).

There was at least one println/write that I didn't want to see in one of these (and the bin/mahout wrapper adds some output too).  Turning up the log level would be good, though I think I remember this being tricky to do programmatically with slf4j (though maybe I'm confusing slf4j with another Java logger).

Any objection to VectorDumper having the ability to dump vectors from clusters?  It looks like ClusterDumper likes to read things into core, which can sometimes be troublesome.

                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205397#comment-13205397 ] 

Grant Ingersoll commented on MAHOUT-947:
----------------------------------------

bq. I wasn't suggesting supporting multiple args, just quoted globs - since HDFS FileSystem supports strings with glob patterns in them...

Makes sense.  I can update.  In any case, I changed Tom's patch to use our standard --input flag for both and then just check to see whether it is a directory or not.  We could just as well check to see if it is a glob.
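A minimal sketch of that dispatch, assuming that any of the glob metacharacters *, ?, [ or { marks the input as a pattern; otherwise the filesystem is asked whether the path is a directory. The enum and method names are hypothetical, and the check runs against the local filesystem rather than HDFS.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical dispatch for a single --input value; checks the local filesystem only.
final class InputKind {

  enum Kind { GLOB, DIRECTORY, FILE }

  static Kind classify(String input) {
    // Any of the metacharacters * ? [ { marks the input as a glob pattern.
    if (input.matches(".*[*?\\[{].*")) {
      return Kind.GLOB;
    }
    return Files.isDirectory(Paths.get(input)) ? Kind.DIRECTORY : Kind.FILE;
  }
}
```

This sketch does not handle escaped metacharacters, and a path that does not exist is classified as FILE, so the error surfaces later when the reader tries to open it.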
                

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-947:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205003#comment-13205003 ] 

Lance Norskog commented on MAHOUT-947:
--------------------------------------

I won't be able to try it. The patch looks clean.
                

[jira] [Assigned] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-947:
--------------------------------------

    Assignee: Grant Ingersoll
    

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "tom pierce (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tom pierce updated MAHOUT-947:
------------------------------

    Attachment: MAHOUT-947.patch

Dropped the cluster dumping addition to VectorDumper.
                

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206855#comment-13206855 ] 

Grant Ingersoll commented on MAHOUT-947:
----------------------------------------

Good point, Lance.  Updating and committing.
                
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-947-2.patch, MAHOUT-947.patch, MAHOUT-947.patch, MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.


        

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204152#comment-13204152 ] 

Grant Ingersoll commented on MAHOUT-947:
----------------------------------------

I'm close to committing.



                
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-947-2.patch, MAHOUT-947.patch, MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.


        

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

Posted by "tom pierce (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tom pierce updated MAHOUT-947:
------------------------------

    Status: Patch Available  (was: Open)
    
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Priority: Minor
>         Attachments: MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.
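The last item in the list, printing only the N highest-valued elements, is a bounded top-N selection. Below is a minimal sketch, assuming the vector's values are available as a plain double[]. Mahout's own Vector and WeightedVector types are deliberately not used, and the class and method names (TopNElements, topN) are hypothetical, not taken from the attached patches:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNElements {

  /** One vector element: its index and its value (hypothetical helper type). */
  public static final class Element {
    public final int index;
    public final double value;

    Element(int index, double value) {
      this.index = index;
      this.value = value;
    }
  }

  private static final Comparator<Element> BY_VALUE =
      Comparator.comparingDouble(e -> e.value);

  /**
   * Returns the n largest elements of values, highest first.
   * Keeps a min-heap of at most n entries, so the cost is O(len * log n).
   */
  public static List<Element> topN(double[] values, int n) {
    List<Element> result = new ArrayList<>();
    if (n <= 0) {
      return result;
    }
    // Min-heap: the smallest of the current top-n candidates sits at the head.
    PriorityQueue<Element> heap = new PriorityQueue<>(BY_VALUE);
    for (int i = 0; i < values.length; i++) {
      if (heap.size() < n) {
        heap.add(new Element(i, values[i]));
      } else if (values[i] > heap.peek().value) {
        heap.poll(); // evict the current smallest candidate
        heap.add(new Element(i, values[i]));
      }
    }
    result.addAll(heap);
    // Heap iteration order is arbitrary, so sort descending for printing.
    result.sort(BY_VALUE.reversed());
    return result;
  }

  public static void main(String[] args) {
    double[] v = {0.1, 0.9, 0.3, 0.7, 0.5};
    for (Element e : topN(v, 3)) {
      System.out.println(e.index + ":" + e.value);
    }
  }
}
```

Running main prints 1:0.9, 3:0.7 and 4:0.5, one per line. A bounded heap avoids sorting the whole vector, which matters when N is small relative to the vector's cardinality.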


        

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205104#comment-13205104 ] 

Jake Mannix commented on MAHOUT-947:
------------------------------------

I wasn't suggesting supporting multiple args, just quoted globs, since the HDFS FileSystem supports strings with glob patterns in them...
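The point about quoted globs is that the pattern reaches the tool as a single argument and the tool expands it, rather than the shell expanding it into many arguments. In Hadoop this expansion is what FileSystem.globStatus(Path) provides. The sketch below swaps in the JDK's local-filesystem equivalent, Files.newDirectoryStream(Path, String), purely to illustrate the idea without a Hadoop dependency; the class name GlobInputs is hypothetical, and Hadoop's glob syntax is similar to, but not guaranteed identical to, java.nio's:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobInputs {

  /**
   * Lists the regular files in dir whose names match a glob such as "part-*".
   * The pattern arrives as one (quoted) argument, so the tool expands it.
   */
  public static List<Path> expand(Path dir, String glob) throws IOException {
    List<Path> matches = new ArrayList<>();
    // newDirectoryStream matches the glob against each entry's file name.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          matches.add(p);
        }
      }
    }
    Collections.sort(matches); // deterministic dump order
    return matches;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java GlobInputs <dir> '<glob>'  (quote the glob so the shell leaves it alone)
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    String glob = args.length > 1 ? args[1] : "part-*";
    for (Path p : expand(dir, glob)) {
      System.out.println(p);
    }
  }
}
```

Sorting the matches keeps the dump order stable across runs, since directory listings are not guaranteed to come back in any particular order.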
                
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-947-2.patch, MAHOUT-947.patch, MAHOUT-947.patch, MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.


        

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Jeff Hammerbacher (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204119#comment-13204119 ] 

Jeff Hammerbacher commented on MAHOUT-947:
------------------------------------------

Hey Lance,

Any more changes required for this patch?

Thanks,
Jeff
                
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-947-2.patch, MAHOUT-947.patch, MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.


        

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186630#comment-13186630 ] 

Lance Norskog commented on MAHOUT-947:
--------------------------------------

VectorDumper is a custom class just for vectors; should most of these be there?
                
> Improvements to seqdumper
> -------------------------
>
>                 Key: MAHOUT-947
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-947
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: tom pierce
>            Priority: Minor
>         Attachments: MAHOUT-947.patch
>
>
> I've put together a few handy additions to seqdumper:
> * Ability to dump all sequence files in a directory.
> * A quiet flag to attenuate the non-data output.
> * A flag to toggle name-only printing for NamedVector values.
> * An option to only print the N highest-valued elements in WeightedVector values
> Seems like others will probably find some of these to be helpful.
