Posted to dev@mahout.apache.org by "Drew Farris (JIRA)" <ji...@apache.org> on 2010/10/02 03:02:32 UTC

[jira] Updated: (MAHOUT-373) VectorDumper/VectorHelper doesn't dump values when dictionary is present

     [ https://issues.apache.org/jira/browse/MAHOUT-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Farris updated MAHOUT-373:
-------------------------------

    Attachment: MAHOUT-373.patch


{{./bin/mahout vectordump -s tf-vectors/part-r-00000}}

produces:

{noformat}
elts: {12222:1.0, 1582:2.0, 24439:1.0, 26591:1.0, 10772:1.0, 20835:1.0, 7167:3.0[...]
{noformat}

{{./bin/mahout vectordump -s tf-vectors/part-r-00000 -dt sequencefile -d dictionary.file-0}}

produces:

{noformat}
elts: {doubt:1.0, 15:2.0, superior:1.0, which:1.0, continued:1.0, prices:1.0, against:3.0[...]
{noformat}
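With a dictionary, the output above replaces each index with its term and keeps the value. The sketch below shows that formatting logic under stated assumptions. It is not the code in MAHOUT-373.patch. A sparse vector is modeled as a plain `Map<Integer, Double>` rather than Mahout's `Vector`, and the dictionary is a `String[]` indexed by term id, mirroring the `String[]` parameter of `VectorHelper.vectorToString(Vector, String[])`. The class name `VectorFormatSketch`, the method `format`, and the toy dictionary are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not Mahout's VectorHelper: a sparse vector is modeled as an
// insertion-ordered Map from index to value, and the dictionary as a String[]
// indexed by term id.
public class VectorFormatSketch {

  /** Formats entries as {key:value, ...}, using the dictionary term as key when one exists. */
  static String format(Map<Integer, Double> elements, String[] dictionary) {
    StringJoiner joiner = new StringJoiner(", ", "{", "}");
    for (Map.Entry<Integer, Double> e : elements.entrySet()) {
      int index = e.getKey();
      boolean hasTerm = dictionary != null
          && index >= 0 && index < dictionary.length
          && dictionary[index] != null;
      // Assumption of this sketch: fall back to the numeric index if no term is known.
      String key = hasTerm ? dictionary[index] : Integer.toString(index);
      // Emit the value in both cases, which is what MAHOUT-373 asks for.
      joiner.add(key + ":" + e.getValue());
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    Map<Integer, Double> v = new LinkedHashMap<>();
    v.put(2, 1.0);
    v.put(0, 3.0);
    String[] dictionary = {"against", "prices", "doubt"}; // toy dictionary, ids are made up
    System.out.println("elts: " + format(v, null));       // elts: {2:1.0, 0:3.0}
    System.out.println("elts: " + format(v, dictionary)); // elts: {doubt:1.0, against:3.0}
  }
}
```

Falling back to the numeric index for ids that are missing from the dictionary is a choice made for this sketch. The actual patch may handle that case differently.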


> VectorDumper/VectorHelper doesn't dump values when dictionary is present
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-373
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-373
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Drew Farris
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-373.patch
>
>
> Dumping term vectors with a dictionary using:
> {{mahout vectordump -s vector-output/chunk-0 -dt sequencefile -d dictionary-output}}
> gives me output like the following with no values, just the indexes expanded into their dictionary entries:
> {code}
> Name: 0001-11055 elts: {513:discard, 7199:empty,...
> {code}
> While dumping the same vector without a dictionary using
> {{mahout vectordump -s vector-output/chunk-0}}
> gives me output that includes indexes and values:
> {code}
> Name: 0001-11055 elts: {513:1.0, 7199:1.0...
> {code}
> Would it make sense for the dictionary based output to include values as well? Anyone opposed to modifying VectorHelper.vectorToString(Vector, String[]) to do so?

-- 
This message is automatically generated by JIRA.