
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/01/03 20:11:54 UTC

[jira] Updated: (MAHOUT-216) Improve the results of MAHOUT-145 by uniformly distributing the classes in the partitioned data

     [ https://issues.apache.org/jira/browse/MAHOUT-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-216:
-----------------------------

        Fix Version/s: 0.3
    Affects Version/s: 0.2

> Improve the results of MAHOUT-145 by uniformly distributing the classes in the partitioned data
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-216
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-216
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 0.3
>
>
> The poor results of the partial decision forest implementation may be explained by how the classes are distributed across the data partitions. For example, if a partition contains no instances of a given class, the decision trees built from that partition can never predict that class.
> According to [CHAN, 95]:
> {quote}
> Random Selection of the partitioned data sets with a uniform distribution of classes is perhaps the most sensible solution. Here we may attempt to maintain the same frequency distribution over the "class attribute" so that each partition represents a good but a smaller model of the entire training set
> {quote}
> [CHAN, 95]: Philip K. Chan, "On the Accuracy of Meta-learning for Scalable Data Mining" 
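
The idea quoted above, keeping each partition's class frequencies close to those of the full training set, can be sketched as a simple stratified split. This is only an in-memory illustration, not Mahout's implementation and not the eventual MAHOUT-216 patch: the class name StratifiedPartitioner, the method partition, and the representation of class labels as an int array are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper, not part of Mahout: splits instances so classes are spread evenly. */
public class StratifiedPartitioner {

    /**
     * Returns numPartitions lists of instance indices. For every class, the
     * number of its instances in any two partitions differs by at most one.
     *
     * @param labels        labels[i] is the class of instance i (assumed encoding)
     * @param numPartitions number of partitions to build
     * @param seed          seed for the per-class shuffle
     */
    public static List<List<Integer>> partition(int[] labels, int numPartitions, long seed) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }

        // Group instance indices by class label.
        Map<Integer, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }

        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }

        // Shuffle each class, then deal its instances round-robin. The cursor
        // carries over between classes so partition sizes also stay balanced.
        Random rng = new Random(seed);
        int next = 0;
        for (List<Integer> indices : byClass.values()) {
            Collections.shuffle(indices, rng);
            for (int idx : indices) {
                partitions.get(next).add(idx);
                next = (next + 1) % numPartitions;
            }
        }
        return partitions;
    }
}
```

Calling StratifiedPartitioner.partition(labels, 3, 42L) on a data set with six instances of one class and three each of two others yields three partitions of four instances, each holding two, one, and one instances of the respective classes. A class with fewer instances than there are partitions will still be missing from some partitions, so this kind of split reduces the problem described in the issue but cannot remove it for very rare classes.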

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.