Posted to user@mahout.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/07/06 12:11:03 UTC

What's the difference between classic decision tree and Mahout Decision forest algorithm?

Hi,

I know the classic decision tree algorithm from traditional tools
such as SPSS, but I am not so familiar with the Decision Forest in
Mahout. Can we treat them as the same?

Regards,

Xiaobo Gu

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by deneche abdelhakim <ad...@gmail.com>.
Well, I was trying to implement the RainForest algorithm, based on the
following paper:

"RainForest - A Framework for Fast Decision Tree Construction of Large
Datasets"
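
For readers who do not know the paper, here is a minimal sketch of the kind
of bookkeeping it is built around, as far as I understand it: per-attribute
counts of (attribute value, class label) pairs, which the paper calls AVC-sets.
The counts are collected in one streaming pass, so the records themselves never
have to sit in memory. This is not Mahout code and not the work in progress
mentioned above; the function names, the toy data, the Gini criterion and the
"attribute equals value" test are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def build_avc_sets(rows, n_attrs):
    """One streaming pass: count (attribute value, class label) pairs.

    `rows` is any iterator of records whose last field is the class label,
    so the data can stay on disk; only the counts are kept in memory.
    """
    avc = [defaultdict(Counter) for _ in range(n_attrs)]
    class_totals = Counter()
    for row in rows:
        *values, label = row
        class_totals[label] += 1
        for i, value in enumerate(values):
            avc[i][value][label] += 1
    return avc, class_totals


def gini(counts):
    """Gini impurity of a Counter of class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_categorical_split(avc, class_totals):
    """Return (score, attribute, value) for the 'attribute == value' test
    with the lowest weighted Gini impurity, using only the counts."""
    n = sum(class_totals.values())
    best = None
    for attr, table in enumerate(avc):
        for value, left in sorted(table.items()):
            right = class_totals - left      # records that fail the test
            n_left = sum(left.values())
            score = (n_left * gini(left) + (n - n_left) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


# Hypothetical toy data; io.StringIO stands in for a file opened from disk.
DATA = """sunny,high,no
sunny,normal,yes
rain,high,no
rain,normal,yes
overcast,high,yes
overcast,normal,yes
"""

avc, totals = build_avc_sets(csv.reader(io.StringIO(DATA)), n_attrs=2)
score, attr, value = best_categorical_split(avc, totals)
print(attr, value, round(score, 4))
```

Growing a whole tree this way needs the same pass repeated for the records
that reach each new node, which is the "multiple passes" problem discussed
further down the thread.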

On Sun, Aug 14, 2011 at 11:28 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Can you share the idea, I'll try to understand, and would like to help
> writing some code.
>
> Regards,
>
> On Sun, Aug 14, 2011 at 6:23 PM, deneche abdelhakim <ad...@gmail.com> wrote:
> > Ted gave a very good summary of the situation. I do have plans to get rid of
> > the memory limitation and already started working on a solution, but
> > unfortunately I am lacking the necessary time and motivation to get it done
> > :(
> >
> > On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> >
> >> Do you have any plan to get rid of the memory limitation in Random Forest?
> >>
> >> Regards,
> >>
> >> Xiaobo Gu
> >>
> >> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
> >> > The summary of the reason is that this was a summer project and
> >> > parallelizing the random forest algorithm at all was a big enough project.
> >> >
> >> > Writing a single pass on-line algorithm was considered a bit much for the
> >> > project size.  Figuring out how to make multiple passes through an input
> >> > split was similarly out of scope.
> >> >
> >> > If you have a good alternative, this would be of substantial interest
> >> > because it could improve the currently limited scalability of the decision
> >> > forest code.
> >> >
> >> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> >> >
> >> >> Why can't a tree be built against a dataset resides on the disk as
> >> >> long as we can read it ?
> >> >>
> >> >
> >>
> >
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Xiaobo Gu <gu...@gmail.com>.
Can you share the idea? I'll try to understand it, and I would like to
help write some code.

Regards,

On Sun, Aug 14, 2011 at 6:23 PM, deneche abdelhakim <ad...@gmail.com> wrote:
> Ted gave a very good summary of the situation. I do have plans to get rid of
> the memory limitation and already started working on a solution, but
> unfortunately I am lacking the necessary time and motivation to get it done
> :(
>
> On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> Do you have any plan to get rid of the memory limitation in Random Forest?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
>> > The summary of the reason is that this was a summer project and
>> > parallelizing the random forest algorithm at all was a big enough project.
>> >
>> > Writing a single pass on-line algorithm was considered a bit much for the
>> > project size.  Figuring out how to make multiple passes through an input
>> > split was similarly out of scope.
>> >
>> > If you have a good alternative, this would be of substantial interest
>> > because it could improve the currently limited scalability of the decision
>> > forest code.
>> >
>> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>> >
>> >> Why can't a tree be built against a dataset resides on the disk as
>> >> long as we can read it ?
>> >>
>> >
>>
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by deneche abdelhakim <ad...@gmail.com>.
Ted gave a very good summary of the situation. I do have plans to get rid of
the memory limitation and already started working on a solution, but
unfortunately I am lacking the necessary time and motivation to get it done
:(

On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Do you have any plan to get rid of the memory limitation in Random Forest?
>
> Regards,
>
> Xiaobo Gu
>
> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
> > The summary of the reason is that this was a summer project and
> > parallelizing the random forest algorithm at all was a big enough project.
> >
> > Writing a single pass on-line algorithm was considered a bit much for the
> > project size.  Figuring out how to make multiple passes through an input
> > split was similarly out of scope.
> >
> > If you have a good alternative, this would be of substantial interest
> > because it could improve the currently limited scalability of the decision
> > forest code.
> >
> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> >
> >> Why can't a tree be built against a dataset resides on the disk as
> >> long as we can read it ?
> >>
> >
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Xiaobo Gu <gu...@gmail.com>.
Do you have any plan to get rid of the memory limitation in Random Forest?

Regards,

Xiaobo Gu

On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
> The summary of the reason is that this was a summer project and
> parallelizing the random forest algorithm at all was a big enough project.
>
> Writing a single pass on-line algorithm was considered a bit much for the
> project size.  Figuring out how to make multiple passes through an input
> split was similarly out of scope.
>
> If you have a good alternative, this would be of substantial interest
> because it could improve the currently limited scalability of the decision
> forest code.
>
> On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> Why can't a tree be built against a dataset resides on the disk as
>> long as we can read it ?
>>
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Ted Dunning <te...@gmail.com>.
The summary of the reason is that this was a summer project and
parallelizing the random forest algorithm at all was a big enough project.

Writing a single pass on-line algorithm was considered a bit much for the
project size.  Figuring out how to make multiple passes through an input
split was similarly out of scope.

If you have a good alternative, this would be of substantial interest
because it could improve the currently limited scalability of the decision
forest code.

On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Why can't a tree be built against a dataset resides on the disk as
> long as we can read it ?
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Xiaobo Gu <gu...@gmail.com>.
I have just had a glance at the source for the decision forest. Must a
decision tree be built against a dataset that is loaded entirely in
memory?

Why can't a tree be built against a dataset that resides on disk, as
long as we can read it?

Regards,

Xiaobo Gu


On Wed, Jul 6, 2011 at 11:26 PM, Ted Dunning <te...@gmail.com> wrote:
> We really only have random forests.  Tree methods are somewhat difficult to
> parallelize and with large, sparse data, their advantages are not as
> pronounced as with small data sets.
>
> On Wed, Jul 6, 2011 at 3:28 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> There is also a "Random Forests ", got more confused, can someone
>> explain them to me please.
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> On Wed, Jul 6, 2011 at 6:21 PM, Xiaobo Gu <gu...@gmail.com> wrote:
>> > And what's the progress of "Partial Implementation" of Decision forest
>> > now, is it still in progress?
>> >
>> >
>> > On Wed, Jul 6, 2011 at 6:11 PM, Xiaobo Gu <gu...@gmail.com> wrote:
>> >> Hi,
>> >>
>> >> I have known the classic decision tree algorithm in traditional tools
>> >> such as SPSS, but not so familiar with Decision forest in Mahout, can
>> >> we treat them the same?
>> >>
>> >> Regards,
>> >>
>> >> Xiaobo Gu
>> >>
>> >
>>
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Ted Dunning <te...@gmail.com>.
We really only have random forests.  Tree methods are somewhat difficult to
parallelize and with large, sparse data, their advantages are not as
pronounced as with small data sets.
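
To make the contrast with a single classic decision tree concrete, here is a
small sketch of what a random forest adds on top of one: each tree is grown on
a bootstrap sample of the data, only a random subset of features is examined at
each node, and the forest predicts by majority vote over its trees. This is an
illustration only, not Mahout's implementation; the function names, the toy
data and the parameters (n_trees, n_try, max_depth) are assumptions. Note that
every tree here works on an in-memory list of rows.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; each node examines only `n_try` randomly chosen features.

    With n_try == number of features this is an ordinary greedy decision tree.
    Leaves are class labels (which must not be tuples); inner nodes are
    (feature, threshold, left_subtree, right_subtree) tuples.
    """
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]        # leaf: majority class
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                   # no usable split found
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_try, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_try, rng, depth + 1, max_depth))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def build_forest(X, y, n_trees, n_try, seed=0):
    """Grow `n_trees` trees, each on its own bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features, two classes.
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [7.0, 20.0], [8.0, 22.0], [9.0, 21.0]]
y = ["a", "a", "a", "b", "b", "b"]

single_tree = build_tree(X, y, n_try=2, rng=random.Random(0))  # classic tree
forest = build_forest(X, y, n_trees=25, n_try=1, seed=0)       # random forest
print(predict_tree(single_tree, [0.0, 5.0]), predict_forest(forest, [10.0, 30.0]))
```

Because the trees are grown independently of each other, they can be built in
parallel, one worker per tree or per group of trees.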

On Wed, Jul 6, 2011 at 3:28 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> There is also a "Random Forests ", got more confused, can someone
> explain them to me please.
>
> Regards,
>
> Xiaobo Gu
>
> On Wed, Jul 6, 2011 at 6:21 PM, Xiaobo Gu <gu...@gmail.com> wrote:
> > And what's the progress of "Partial Implementation" of Decision forest
> > now, is it still in progress?
> >
> >
> > On Wed, Jul 6, 2011 at 6:11 PM, Xiaobo Gu <gu...@gmail.com> wrote:
> >> Hi,
> >>
> >> I have known the classic decision tree algorithm in traditional tools
> >> such as SPSS, but not so familiar with Decision forest in Mahout, can
> >> we treat them the same?
> >>
> >> Regards,
> >>
> >> Xiaobo Gu
> >>
> >
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Xiaobo Gu <gu...@gmail.com>.
There is also "Random Forests", and now I am more confused. Can someone
please explain them to me?

Regards,

Xiaobo Gu

On Wed, Jul 6, 2011 at 6:21 PM, Xiaobo Gu <gu...@gmail.com> wrote:
> And what's the progress of "Partial Implementation" of Decision forest
> now, is it still in progress?
>
>
> On Wed, Jul 6, 2011 at 6:11 PM, Xiaobo Gu <gu...@gmail.com> wrote:
>> Hi,
>>
>> I have known the classic decision tree algorithm in traditional tools
>> such as SPSS, but not so familiar with Decision forest in Mahout, can
>> we treat them the same?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Xiaobo Gu <gu...@gmail.com>.
And what is the status of the "Partial Implementation" of Decision Forest
now? Is it still in progress?


On Wed, Jul 6, 2011 at 6:11 PM, Xiaobo Gu <gu...@gmail.com> wrote:
> Hi,
>
> I have known the classic decision tree algorithm in traditional tools
> such as SPSS, but not so familiar with Decision forest in Mahout, can
> we treat them the same?
>
> Regards,
>
> Xiaobo Gu
>
