Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/05/26 09:39:47 UTC

[jira] [Commented] (MAHOUT-713) Random Forest Prototypes

    [ https://issues.apache.org/jira/browse/MAHOUT-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039562#comment-13039562 ] 

Sean Owen commented on MAHOUT-713:
----------------------------------

(Is this an issue report?)

> Random Forest Prototypes
> ------------------------
>
>                 Key: MAHOUT-713
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-713
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Oleg Levchenko
>            Priority: Minor
>
> Below is an explanation by Breiman (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype):
> Prototypes are a way of getting a picture of how the variables relate to the classification. 
> For the jth class, we find the case that has the largest number of class j cases among its k nearest neighbors, determined using the proximities. Among these k cases we find the median, 25th percentile, and 75th percentile for each variable. 
> The medians are the prototype for class j and the quartiles give an estimate of its stability. 
> For the second prototype, we repeat the procedure but only consider cases that are not among the original k, and so on. 
> Prototypes for continuous variables are standardized by subtracting the 5th percentile and dividing by the difference between the 95th and 5th percentiles. 
> For categorical variables, the prototype is the most frequent value.
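
The quoted procedure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Mahout code and not Breiman's reference implementation. The names class_prototypes, standardize, percentile and categorical_prototype are hypothetical. The sketch assumes that prox is a precomputed n-by-n proximity matrix (larger means closer), that a case counts among its own k nearest neighbors, that percentiles use linear interpolation, that the 5th and 95th percentiles are taken over all cases, and that the quartiles are taken over all k neighbors, as the text literally reads.

```python
from collections import Counter


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def standardize(rows):
    """Rescale each continuous column to (x - p5) / (p95 - p5)."""
    cols = list(zip(*rows))
    p5 = [percentile(c, 5) for c in cols]
    # A zero range is replaced by 1.0 to avoid division by zero (assumption).
    span = [(percentile(c, 95) - lo) or 1.0 for c, lo in zip(cols, p5)]
    return [[(x - lo) / sp for x, lo, sp in zip(r, p5, span)] for r in rows]


def class_prototypes(rows, labels, prox, target, k, n_prototypes=1):
    """Return up to n_prototypes prototypes for class `target` (continuous variables)."""
    z = standardize(rows)
    available = set(range(len(rows)))
    result = []
    for _ in range(n_prototypes):
        if len(available) < k:
            break
        best_count, best_nbrs = -1, None
        for i in sorted(available):
            # k nearest remaining cases = k largest proximities to case i;
            # ties are broken by case index so the result is deterministic.
            nbrs = sorted(available, key=lambda j: (-prox[i][j], j))[:k]
            count = sum(1 for j in nbrs if labels[j] == target)
            if count > best_count:
                best_count, best_nbrs = count, nbrs
        cols = list(zip(*(z[j] for j in best_nbrs)))
        result.append({
            "members": sorted(best_nbrs),
            "prototype": [percentile(c, 50) for c in cols],  # medians
            "q25": [percentile(c, 25) for c in cols],
            "q75": [percentile(c, 75) for c in cols],
        })
        # Later prototypes only consider cases not used so far.
        available -= set(best_nbrs)
    return result


def categorical_prototype(values):
    """Most frequent value of a categorical variable among the chosen cases."""
    return Counter(values).most_common(1)[0][0]
```

A caller would pass the training rows, their class labels and the forest's proximity matrix, then read the "prototype", "q25" and "q75" entries for each returned prototype. For a categorical variable, categorical_prototype would be applied to that variable's values over the "members" indices.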

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira