Posted to user@mahout.apache.org by deneche abdelhakim <ad...@gmail.com> on 2011/10/09 06:49:08 UTC

RandomForest out of bag error

While reviewing the Decision Forest code, I noticed that computing the "out of
bag" (OOB) error of the forest during training made the implementation
really messy. I made a lot of assumptions about the way Hadoop works
internally (especially the way it splits the data). This has proven buggy
many times, because with each new version of Hadoop I had to "tweak" the
code to make it run.

So I am asking users and developers alike: is computing the OOB error really
necessary? If yes, I will spend the time to figure out a better way to
compute it. If not, I will just get rid of it for now and open a JIRA
issue about bringing it back if someone actually needs it.
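
For readers unfamiliar with the term, here is a minimal single-machine sketch of how an OOB error estimate can be computed. It is not Mahout's implementation. The function name oob_error and its three arguments are hypothetical, chosen only for illustration. It assumes you already know, for every tree, which sample indices were in its bootstrap sample. That bookkeeping is the part that depends on how the data is split across workers.

```python
from collections import Counter


def oob_error(labels, in_bag, tree_predictions):
    """Estimate the out-of-bag error of a forest (illustrative sketch).

    labels[i]              -- true label of sample i
    in_bag[t]              -- set of sample indices tree t was trained on
    tree_predictions[t][i] -- label tree t predicts for sample i
    """
    errors = 0
    counted = 0
    for i, truth in enumerate(labels):
        # Only trees that never saw sample i during training may vote on it.
        votes = Counter(
            preds[i]
            for bag, preds in zip(in_bag, tree_predictions)
            if i not in bag
        )
        if not votes:
            continue  # sample i was in every bag, so it has no OOB estimate
        counted += 1
        # Majority vote; ties go to the label that was seen first.
        predicted = votes.most_common(1)[0][0]
        if predicted != truth:
            errors += 1
    if counted == 0:
        raise ValueError("no sample is out of bag for any tree")
    return errors / counted
```

Each sample is scored only by the trees that did not train on it, so the estimate needs no separate held-out set, but it does need the exact in-bag membership of every tree.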

Re: RandomForest out of bag error

Posted by Ted Dunning <te...@gmail.com>.
No. I think cross validation is the better way to evaluate classifiers.
The diagnostics of random forests are interesting, but not critical.

On Sat, Oct 8, 2011 at 9:49 PM, deneche abdelhakim <ad...@gmail.com> wrote:

> While reviewing the Decision Forest code, I noticed that computing the "out of
> bag" (OOB) error of the forest during training made the implementation
> really messy. I made a lot of assumptions about the way Hadoop works
> internally (especially the way it splits the data). This has proven buggy
> many times, because with each new version of Hadoop I had to "tweak" the
> code to make it run.
>
> So I am asking users and developers alike: is computing the OOB error really
> necessary? If yes, I will spend the time to figure out a better way to
> compute it. If not, I will just get rid of it for now and open a JIRA
> issue about bringing it back if someone actually needs it.
>
