Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/05/26 09:39:47 UTC
[jira] [Commented] (MAHOUT-713) Random Forest Prototypes
[ https://issues.apache.org/jira/browse/MAHOUT-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039562#comment-13039562 ]
Sean Owen commented on MAHOUT-713:
----------------------------------
(Is this an issue report?)
> Random Forest Prototypes
> ------------------------
>
> Key: MAHOUT-713
> URL: https://issues.apache.org/jira/browse/MAHOUT-713
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Oleg Levchenko
> Priority: Minor
>
> Below is an explanation by Breiman (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype):
> Prototypes are a way of getting a picture of how the variables relate to the classification.
> For the jth class, we find the case that has the largest number of class j cases among its k nearest neighbors, determined using the proximities. Among these k cases we find the median, 25th percentile, and 75th percentile for each variable.
> The medians are the prototype for class j and the quartiles give an estimate of its stability.
> For the second prototype, we repeat the procedure but only consider cases that are not among the original k, and so on.
> Prototypes for continuous variables are standardized by subtracting the 5th percentile and dividing by the difference between the 95th and 5th percentiles.
> For categorical variables, the prototype is the most frequent value.
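
For readers who want to see the quoted procedure concretely, here is a minimal sketch in plain Python. It is not Mahout code and not Breiman's implementation. The function names (class_prototypes, standardize, categorical_prototype) and the precomputed proximity matrix `prox` are hypothetical. It also assumes that a case counts among its own nearest neighbors, that cases already used by an earlier prototype are excluded from later neighbor searches, and that the search stops once no remaining neighborhood contains a case of the target class. None of these points is settled by the description above.

```python
from collections import Counter
from statistics import median, quantiles


def k_nearest(prox_row, candidates, k):
    """Return the k candidate indices with the highest proximity."""
    return sorted(candidates, key=lambda i: -prox_row[i])[:k]


def class_prototypes(X, y, prox, target, k, n_prototypes=1):
    """Prototypes for class `target` from continuous variables.

    X    : list of rows (lists of numbers), one row per case
    y    : list of class labels, one per case
    prox : square proximity matrix, prox[i][j] = proximity of cases i and j
    k    : neighborhood size (must be at least 2 for the quartiles)
    """
    available = set(range(len(X)))
    prototypes = []
    for _ in range(n_prototypes):
        if len(available) < k:
            break
        candidates = sorted(available)
        best_neighbors, best_count = None, -1
        for i in candidates:
            neighbors = k_nearest(prox[i], candidates, k)
            count = sum(1 for n in neighbors if y[n] == target)
            if count > best_count:
                best_neighbors, best_count = neighbors, count
        if best_count == 0:
            # Assumption: no class members left, so stop early.
            break
        # One tuple of values per variable, taken over the k chosen cases.
        columns = list(zip(*(X[n] for n in best_neighbors)))
        quartiles = [quantiles(c, n=4, method="inclusive") for c in columns]
        prototypes.append({
            "median": [median(c) for c in columns],
            "q25": [q[0] for q in quartiles],
            "q75": [q[2] for q in quartiles],
        })
        # Later prototypes only consider cases not among these k.
        available -= set(best_neighbors)
    return prototypes


def standardize(value, column):
    """Subtract the 5th percentile, divide by the 5th-to-95th range."""
    cuts = quantiles(column, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    return (value - p5) / (p95 - p5)  # assumes p95 != p5


def categorical_prototype(values):
    """Most frequent value of a categorical variable."""
    return Counter(values).most_common(1)[0][0]
```

The percentile definition (the "inclusive" method of statistics.quantiles) is a choice made for this sketch. The quoted text does not say which definition to use.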