Posted to dev@mahout.apache.org by "Drew Farris (JIRA)" <ji...@apache.org> on 2010/08/24 14:58:17 UTC

[jira] Updated: (MAHOUT-487) Issues with memory use and inconsistent or state-influenced results when using CBayesAlgorithm

     [ https://issues.apache.org/jira/browse/MAHOUT-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Farris updated MAHOUT-487:
-------------------------------

    Description: 
Came across this digging through the mailing list archives for something else, probably worth tracking as an issue.

{quote}
During classification, every word still unknown is added to 
featureDictionary. This leads to the excessive growth if lots of texts 
with unknown words are to be classified. The inconsistency is caused by 
using a "vocabCount" that is not reset after each classification. 
Indeed, featureDictionary.size() is used for "vocabCount", which 
increases every time new unknown words are discovered.
{quote}

See: http://www.lucidimagination.com/search/document/7dabe3efec8d136d/issues_with_memory_use_and_inconsistent_or_state_influenced_results_when_using_cbayesalgorit#8853165db260bf75

Alternately per Robin:

{quote}
We can remove the addition of features to the
dictionary altogether. Will yield better performance, and lock down the
model. Will require a bit more modification.
{quote}
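
To make the two quotes above concrete, here is a minimal sketch of the behaviour they describe. It is not Mahout's actual code: the class VocabGrowthSketch, the Model type, the methods idMutating and idReadOnly, and the toy smoothedWeight formula are all hypothetical names chosen for illustration. Only featureDictionary and vocabCount come from the report. The sketch assumes the weight is smoothed with vocabCount in the denominator, which is enough to show why a dictionary that grows during classification makes results depend on what was classified earlier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Mahout classes.
class VocabGrowthSketch {

    static final class Model {
        private final Map<String, Integer> featureDictionary = new HashMap<>();

        Model(List<String> trainedVocabulary) {
            for (String word : trainedVocabulary) {
                featureDictionary.put(word, featureDictionary.size());
            }
        }

        // Behaviour described in the report: an unknown word is added
        // to the dictionary as a side effect of looking it up.
        int idMutating(String word) {
            Integer id = featureDictionary.get(word);
            if (id == null) {
                id = featureDictionary.size();
                featureDictionary.put(word, id);
            }
            return id;
        }

        // Behaviour suggested by Robin: lookups never modify the model.
        // Unknown words map to -1 and the caller decides how to treat them.
        int idReadOnly(String word) {
            Integer id = featureDictionary.get(word);
            return id == null ? -1 : id;
        }

        // vocabCount is derived from the dictionary size, as in the report.
        int vocabCount() {
            return featureDictionary.size();
        }
    }

    // Toy smoothed log weight (an assumption, not the CBayes formula):
    // any formula that uses vocabCount shows the same state dependence.
    static double smoothedWeight(double termCount, double labelTotal, int vocabCount) {
        return Math.log((termCount + 1.0) / (labelTotal + vocabCount));
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("spam", "ham", "eggs");

        Model mutating = new Model(vocabulary);
        double before = smoothedWeight(2.0, 10.0, mutating.vocabCount());
        mutating.idMutating("unseen");   // classifying a text with an unknown word
        double after = smoothedWeight(2.0, 10.0, mutating.vocabCount());
        System.out.println("mutating lookup changed the weight: " + (before != after));

        Model locked = new Model(vocabulary);
        double lockedBefore = smoothedWeight(2.0, 10.0, locked.vocabCount());
        locked.idReadOnly("unseen");     // same text, model left untouched
        double lockedAfter = smoothedWeight(2.0, 10.0, locked.vocabCount());
        System.out.println("read-only lookup changed the weight: " + (lockedBefore != lockedAfter));
    }
}
```

With the mutating lookup, both memory use and the weight for the same input drift as more unknown words are seen. With the read-only lookup the model stays fixed after training, which is the "lock down the model" option.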


  was:
Came across this digging through the mailing list archives for something else, probably worth tracking as an issue.

{quote}
During classification, every word still unknown is added to 
featureDictionary. This leads to the excessive growth if lots of texts 
with unknown words are to be classified. The inconsistency is caused by 
using a "vocabCount" that is not reset after each classification. 
Indeed, featureDictionary.size() is used for "vocabCount", which 
increases every time new unknown words are discovered.
{quote}

See: http://www.lucidimagination.com/search/document/7dabe3efec8d136d/issues_with_memory_use_and_inconsistent_or_state_influenced_results_when_using_cbayesalgorit#8853165db260bf75

Alternately per Robin:

{quote}
We can remove the addition of features to the
dictionary altogether. Will yield better performance, and lock down the
model. Will require a bit more modification.
{quote]



> Issues with memory use and inconsistent or state-influenced results when using CBayesAlgorithm
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-487
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-487
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Priority: Minor
>
> Came across this digging through the mailing list archives for something else, probably worth tracking as an issue.
> {quote}
> During classification, every word still unknown is added to 
> featureDictionary. This leads to the excessive growth if lots of texts 
> with unknown words are to be classified. The inconsistency is caused by 
> using a "vocabCount" that is not reset after each classification. 
> Indeed, featureDictionary.size() is used for "vocabCount", which 
> increases every time new unknown words are discovered.
> {quote}
> See: http://www.lucidimagination.com/search/document/7dabe3efec8d136d/issues_with_memory_use_and_inconsistent_or_state_influenced_results_when_using_cbayesalgorit#8853165db260bf75
> Alternately per Robin:
> {quote}
> We can remove the addition of features to the
> dictionary altogether. Will yield better performance, and lock down the
> model. Will require a bit more modification.
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.