Posted to dev@mahout.apache.org by "Robin Anil (JIRA)" <ji...@apache.org> on 2010/09/25 23:44:33 UTC

[jira] Updated: (MAHOUT-287) Bayes Classifier should use Vector as input

     [ https://issues.apache.org/jira/browse/MAHOUT-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-287:
------------------------------

    Attachment: MAHOUT-logNormalize.patch

Patch implementing logNormalization in Vector. It includes tests and adds logNormalization to DictionaryVectorizer.
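
For readers without the attachment at hand, here is a minimal sketch of what log normalization of a vector could look like. The patch itself is not reproduced in this message, so the formula is an assumption: each entry x becomes log(1 + x) / (norm_p(v) * log(p)). The class and method names are hypothetical, and a plain double[] stands in for Mahout's Vector.

```java
// Sketch only: not the code from MAHOUT-logNormalize.patch.
// Assumes non-negative entries (term frequencies) and a finite power > 1.
final class LogNormalizeSketch {

    /** p-norm of v: (sum of |x|^p) raised to 1/p. */
    static double norm(double[] v, double power) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), power);
        }
        return Math.pow(sum, 1.0 / power);
    }

    /** Returns a new array with each entry x mapped to log(1 + x) / (norm_p(v) * log(p)). */
    static double[] logNormalize(double[] v, double power) {
        if (Double.isNaN(power) || Double.isInfinite(power) || power <= 1.0) {
            throw new IllegalArgumentException("power must be > 1 and finite");
        }
        double denominator = norm(v, power) * Math.log(power);
        double[] result = new double[v.length];
        if (denominator == 0.0) {
            return result; // all-zero input: leave the result as zeros
        }
        for (int i = 0; i < v.length; i++) {
            result[i] = Math.log1p(v[i]) / denominator;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] tf = {0.0, 1.0, 3.0};
        System.out.println(java.util.Arrays.toString(logNormalize(tf, 2.0)));
    }
}
```

The logarithm damps large term counts before the division by the norm, so a term that occurs many times does not dominate the vector as strongly as it would under plain p-norm normalization.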

The patch also changes the behaviour of DictionaryVectorizer:


If tf vectors are being generated, the final merge stage normalizes them.
If tfidf vectors are being generated, no normalization is done in the tf stage; normalization is done in the merge stage after idf is calculated.
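
The ordering described above can be sketched as follows. This is an illustration under assumptions, not the DictionaryVectorizer code: the class and method names are hypothetical, the idf weights are passed in as a ready-made array, and plain p-norm normalization is used for brevity.

```java
// Sketch only: shows why normalization is deferred until after idf weighting.
final class NormalizationOrderSketch {

    /** Divides v by its p-norm; an all-zero vector is returned as a copy. */
    static double[] normalize(double[] v, double power) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), power);
        }
        double norm = Math.pow(sum, 1.0 / power);
        double[] result = v.clone();
        if (norm == 0.0) {
            return result;
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= norm;
        }
        return result;
    }

    /** Multiplies each raw term frequency by its (hypothetical, precomputed) idf weight. */
    static double[] weightByIdf(double[] rawTf, double[] idf) {
        if (rawTf.length != idf.length) {
            throw new IllegalArgumentException("tf and idf must have the same length");
        }
        double[] result = new double[rawTf.length];
        for (int i = 0; i < rawTf.length; i++) {
            result[i] = rawTf[i] * idf[i];
        }
        return result;
    }

    /** tf output: the raw counts are normalized in the final (merge) step. */
    static double[] tfVector(double[] rawTf, double power) {
        return normalize(rawTf, power);
    }

    /** tfidf output: the tf stage stays unnormalized; normalize only after idf is applied. */
    static double[] tfidfVector(double[] rawTf, double[] idf, double power) {
        return normalize(weightByIdf(rawTf, idf), power);
    }

    public static void main(String[] args) {
        double[] rawTf = {3.0, 4.0};
        double[] idf = {1.0, 0.0};
        System.out.println(java.util.Arrays.toString(tfVector(rawTf, 2.0)));
        System.out.println(java.util.Arrays.toString(tfidfVector(rawTf, idf, 2.0)));
    }
}
```

If the tf vector were normalized before idf weighting, the weighted result would in general no longer have unit norm, which is one reason to keep the tf stage unnormalized when tfidf output is requested.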

Also fixed a bug that caused normalization to be skipped when the chunk size was one (we were renaming the partial directory as a speed hack).

 

> Bayes Classifier should use Vector as input
> -------------------------------------------
>
>                 Key: MAHOUT-287
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-287
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.4
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>         Attachments: MAHOUT-logNormalize.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.