Posted to dev@mahout.apache.org by "Jeff Eastman (JIRA)" <ji...@apache.org> on 2010/05/22 03:14:15 UTC

[jira] Updated: (MAHOUT-398) Seq2sparse outputs final vectors to different directories depending upon the TF/TFIDF weight switch. This is confusing to users.

     [ https://issues.apache.org/jira/browse/MAHOUT-398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman updated MAHOUT-398:
--------------------------------

    Description: 
In TF mode, seq2sparse puts the output vectors into <output>/vectors. In TFIDF mode, however, it puts the output vectors into <output>/tfidf/vectors. This happens because the IDF calculation, if selected, runs after the TF calculation and uses the TF vectors as its input.

It seems both modes ought to output to a consistent directory structure, so that changing the switch does not change the final output location. This is perhaps as simple as changing TF to output to <output>/tf/vectors, so that the contents of both directories, when present, are more obvious from their names.

  was:In TF mode, seq2sparse puts the output vectors into <output>vectors. In TFIDF mode; however, it puts the output vectors into <output>/tfidf/vectors. Even worse, in TFIDF mode the TFIDF converter reuses the <output>/vector/ directory for its intermediate calculations. Seems like both modes ought to output to the same directory so changing the switch does not cause downstream user changes that are error-prone and confusing.


> Seq2sparse outputs final vectors to different directories depending upon the TF/TFIDF weight switch. This is confusing to users.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-398
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-398
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Jeff Eastman
>             Fix For: 0.4
>
>
> In TF mode, seq2sparse puts the output vectors into <output>/vectors. In TFIDF mode, however, it puts the output vectors into <output>/tfidf/vectors. This happens because the IDF calculation, if selected, runs after the TF calculation and uses the TF vectors as its input.
> It seems both modes ought to output to a consistent directory structure, so that changing the switch does not change the final output location. This is perhaps as simple as changing TF to output to <output>/tf/vectors, so that the contents of both directories, when present, are more obvious from their names.
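
The two layouts discussed in the description can be compared with a small sketch. This is only an illustration of the directory naming, not Mahout code: the class Seq2SparseLayout, the Weight enum, and both method names are hypothetical and are not part of seq2sparse or any Mahout API. Only the directory names (vectors, tf, tfidf) come from the issue text.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper contrasting the layout reported in the issue with the proposed one. */
public class Seq2SparseLayout {

    /** Hypothetical stand-in for the TF/TFIDF weight switch. */
    public enum Weight { TF, TFIDF }

    /** Layout as reported: TF writes to <output>/vectors, TFIDF to <output>/tfidf/vectors. */
    public static Path reportedFinalVectors(Path output, Weight weight) {
        return weight == Weight.TF
                ? output.resolve("vectors")
                : output.resolve("tfidf").resolve("vectors");
    }

    /** Layout as proposed: TF writes to <output>/tf/vectors, TFIDF stays at <output>/tfidf/vectors. */
    public static Path proposedFinalVectors(Path output, Weight weight) {
        String weightDir = (weight == Weight.TF) ? "tf" : "tfidf";
        return output.resolve(weightDir).resolve("vectors");
    }

    public static void main(String[] args) {
        Path output = Paths.get("output"); // assumed example output directory
        for (Weight weight : Weight.values()) {
            System.out.println(weight
                    + " reported: " + reportedFinalVectors(output, weight)
                    + " proposed: " + proposedFinalVectors(output, weight));
        }
    }
}
```

Under the proposed layout, the path of the final vectors always has the same shape, <output>/<weight>/vectors, so a downstream job only needs to know which weight was selected.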
