Posted to dev@mahout.apache.org by IKumasa Mukai <ik...@gmail.com> on 2012/02/07 12:22:44 UTC
Re: About "complementary" using mahout to build the random forest
Hi wang-san.
# I detached this topic and modified the subject.
> Do you know the meaning of option "complementary" using mahout to build the random forest?
Sorry, I do not quite follow your question. Are you asking about
pruning and grafting?
Regards,
2012/2/7 Wang Yue (Commented) (JIRA) <ji...@apache.org>:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201399#comment-13201399 ]
>
> Wang Yue commented on MAHOUT-945:
> ---------------------------------
>
> Hi, Mukai
> Thanks for your efforts. I think your modification is OK. I have a
> question about the decision tree building.
> Do you know the meaning of option "complementary" using mahout to build
> the random forest?
>
> On Mon, Jan 23, 2012 at 6:55 AM, Ikumasa Mukai (Updated) (JIRA) <
>
>
>
> --
> Regards, Wang Yue
> PhD Starts From 08 Fall
> NUS Graduate School for Integrative Sciences and Engineering, NUS
> Email: wangyue@nus.edu.sg
> Homepage: https://sites.google.com/site/fayue1015/
> HP: +65 81022515
>
>
>> The variance calculation of Random forest regression tree
>> ---------------------------------------------------------
>>
>> Key: MAHOUT-945
>> URL: https://issues.apache.org/jira/browse/MAHOUT-945
>> Project: Mahout
>> Issue Type: Improvement
>> Components: Classification
>> Affects Versions: 0.6
>> Reporter: Wang Yue
>> Labels: Regressionsplit.java
>> Attachments: MAHOUT-945.patch, MAHOUT-945.patch
>>
>> Original Estimate: 48h
>> Remaining Estimate: 48h
>>
>> Hi, Mukai
>> Thanks for your efforts to extend RF to regression. However, I have a doubt about your implementation in Regressionsplit.java. The variance method is
>> "
>> private static double variance(double[] s, double[] ss, double[] dataSize) {
>>   double var = 0;
>>   for (int i = 0; i < s.length; i++) {
>>     if (dataSize[i] > 0) {
>>       var += ss[i] - ((s[i] * s[i]) / dataSize[i]);
>>     }
>>   }
>>   return var;
>> }
>> "
>> whereas I think the variance should be something like
>> var += ss[i]/dataSize[i] - ((s[i] * s[i]) / (dataSize[i]*dataSize[i]));
>> Please correct me if I am wrong. Thanks.
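
For reference, the two expressions quoted above differ only by a factor of dataSize[i]: with s the sum and ss the sum of squares of a partition, ss - s*s/n is the sum of squared deviations from the mean, and ss/n - (s*s)/(n*n) is the population variance, so the first equals n times the second. The sketch below only illustrates that relationship. The class name, method names and sample values are hypothetical and are not taken from the Mahout source; it does not say which form the split criterion should use.

```java
public class VarianceSketch {

  // Form quoted from the patch: per-partition sum of squared deviations,
  // i.e. ss - s^2 / n, summed over partitions.
  static double sumOfSquaredDeviations(double[] s, double[] ss, double[] dataSize) {
    double total = 0;
    for (int i = 0; i < s.length; i++) {
      if (dataSize[i] > 0) {
        total += ss[i] - ((s[i] * s[i]) / dataSize[i]);
      }
    }
    return total;
  }

  // Form proposed in the issue: per-partition population variance,
  // i.e. ss / n - s^2 / n^2, summed over partitions.
  static double sumOfVariances(double[] s, double[] ss, double[] dataSize) {
    double total = 0;
    for (int i = 0; i < s.length; i++) {
      if (dataSize[i] > 0) {
        total += ss[i] / dataSize[i]
            - ((s[i] * s[i]) / (dataSize[i] * dataSize[i]));
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // One partition holding the made-up values 1, 2, 3, 4:
    // sum = 10, sum of squares = 30, size = 4.
    double[] s = {10.0};
    double[] ss = {30.0};
    double[] dataSize = {4.0};

    System.out.println(sumOfSquaredDeviations(s, ss, dataSize)); // 5.0
    System.out.println(sumOfVariances(s, ss, dataSize));         // 1.25
  }
}
```

With a single partition the first result is simply the second multiplied by the partition size (5.0 = 4 * 1.25). With several partitions the two sums weight the partitions differently: the first weights each partition's variance by its size, the second weights all partitions equally, so they can rank candidate splits differently.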
>
--
- - - - - - -
IKumasa Mukai at Recruit Co.,Ltd.