Posted to dev@mahout.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/08/14 12:12:58 UTC

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Do you have any plan to get rid of the memory limitation in Random Forest?

Regards,

Xiaobo Gu

On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
> The summary of the reason is that this was a summer project and
> parallelizing the random forest algorithm at all was a big enough project.
>
> Writing a single pass on-line algorithm was considered a bit much for the
> project size.  Figuring out how to make multiple passes through an input
> split was similarly out of scope.
>
> If you have a good alternative, this would be of substantial interest
> because it could improve the currently limited scalability of the decision
> forest code.
>
> On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> Why can't a tree be built against a dataset that resides on disk as
>> long as we can read it?
>>
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by deneche abdelhakim <ad...@gmail.com>.
Well, I was trying to implement the RainForest algorithm, based on the
following paper:

"RainForest - A Framework for Fast Decision Tree Construction of Large
Datasets"
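
For readers unfamiliar with the paper: its central idea is that split selection
at a tree node only needs per-attribute counts of (attribute value, class label)
pairs, which the paper calls AVC-sets, so the rows themselves can stay on disk
and be streamed. The block below is a minimal sketch of that counting step for
categorical attributes only. It is not Mahout code and not the implementation
mentioned in this thread; the class name AvcSketch, the comma-separated input
layout (class label in the last column), and the use of a multiway Gini split
are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of AVC-set counting; names and input format are assumptions. */
final class AvcSketch {

    /**
     * Streams comma-separated rows (categorical attributes first, class label last)
     * and returns, for each attribute, a map: value -> (class label -> count).
     * Only the counts are held in memory, never the rows themselves.
     */
    static List<Map<String, Map<String, Integer>>> buildAvcSets(Reader source, int numAttributes)
            throws IOException {
        List<Map<String, Map<String, Integer>>> avc = new ArrayList<>();
        for (int a = 0; a < numAttributes; a++) {
            avc.add(new HashMap<>());
        }
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                String label = fields[numAttributes].trim();
                for (int a = 0; a < numAttributes; a++) {
                    avc.get(a)
                       .computeIfAbsent(fields[a].trim(), v -> new HashMap<>())
                       .merge(label, 1, Integer::sum);
                }
            }
        }
        return avc;
    }

    /** Weighted Gini impurity of a multiway split on one attribute, computed from its AVC-set. */
    static double weightedGini(Map<String, Map<String, Integer>> avcSet) {
        long total = 0;
        for (Map<String, Integer> counts : avcSet.values()) {
            for (int c : counts.values()) {
                total += c;
            }
        }
        if (total == 0) {
            return 0.0;
        }
        double result = 0.0;
        for (Map<String, Integer> counts : avcSet.values()) {
            long n = 0;
            for (int c : counts.values()) {
                n += c;
            }
            double sumSquares = 0.0;
            for (int c : counts.values()) {
                double p = (double) c / n;
                sumSquares += p * p;
            }
            result += ((double) n / total) * (1.0 - sumSquares);
        }
        return result;
    }

    /** Index of the attribute whose multiway split has the lowest weighted Gini impurity. */
    static int bestAttribute(List<Map<String, Map<String, Integer>>> avc) {
        int best = -1;
        double bestGini = Double.POSITIVE_INFINITY;
        for (int a = 0; a < avc.size(); a++) {
            double g = weightedGini(avc.get(a));
            if (g < bestGini) {
                bestGini = g;
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file on disk; a FileReader would work the same way.
        String data = "sunny,hot,no\nsunny,mild,no\nrainy,hot,yes\nrainy,mild,yes\n";
        List<Map<String, Map<String, Integer>>> avc = buildAvcSets(new StringReader(data), 2);
        System.out.println("best attribute index: " + bestAttribute(avc));
    }
}
```

Memory use in this sketch grows with the number of distinct attribute values and
class labels rather than with the number of rows, which is why this style of
counting is relevant to the memory limitation discussed in this thread. A full
tree builder would repeat the pass for each node (or for a batch of nodes), and
numeric attributes would need additional handling that is not shown here.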

On Sun, Aug 14, 2011 at 11:28 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Can you share the idea? I'll try to understand it, and I would like to help
> write some code.
>
> Regards,
>
> On Sun, Aug 14, 2011 at 6:23 PM, deneche abdelhakim <ad...@gmail.com> wrote:
> > Ted gave a very good summary of the situation. I do have plans to get rid of
> > the memory limitation and already started working on a solution, but
> > unfortunately I am lacking the necessary time and motivation to get it done
> > :(
> >
> > On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> >
> >> Do you have any plan to get rid of the memory limitation in Random Forest?
> >>
> >> Regards,
> >>
> >> Xiaobo Gu
> >>
> >> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
> >> > The summary of the reason is that this was a summer project and
> >> > parallelizing the random forest algorithm at all was a big enough project.
> >> >
> >> > Writing a single pass on-line algorithm was considered a bit much for the
> >> > project size.  Figuring out how to make multiple passes through an input
> >> > split was similarly out of scope.
> >> >
> >> > If you have a good alternative, this would be of substantial interest
> >> > because it could improve the currently limited scalability of the decision
> >> > forest code.
> >> >
> >> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> >> >
> >> >> Why can't a tree be built against a dataset that resides on disk as
> >> >> long as we can read it?
> >> >>
> >> >
> >>
> >
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by Xiaobo Gu <gu...@gmail.com>.
Can you share the idea? I'll try to understand it, and I would like to help
write some code.

Regards,

On Sun, Aug 14, 2011 at 6:23 PM, deneche abdelhakim <ad...@gmail.com> wrote:
> Ted gave a very good summary of the situation. I do have plans to get rid of
> the memory limitation and already started working on a solution, but
> unfortunately I am lacking the necessary time and motivation to get it done
> :(
>
> On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> Do you have any plan to get rid of the memory limitation in Random Forest?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
>> > The summary of the reason is that this was a summer project and
>> > parallelizing the random forest algorithm at all was a big enough project.
>> >
>> > Writing a single pass on-line algorithm was considered a bit much for the
>> > project size.  Figuring out how to make multiple passes through an input
>> > split was similarly out of scope.
>> >
>> > If you have a good alternative, this would be of substantial interest
>> > because it could improve the currently limited scalability of the decision
>> > forest code.
>> >
>> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>> >
>> >> Why can't a tree be built against a dataset that resides on disk as
>> >> long as we can read it?
>> >>
>> >
>>
>

Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?

Posted by deneche abdelhakim <ad...@gmail.com>.
Ted gave a very good summary of the situation. I do have plans to get rid of
the memory limitation and already started working on a solution, but
unfortunately I am lacking the necessary time and motivation to get it done
:(

On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Do you have any plan to get rid of the memory limitation in Random Forest?
>
> Regards,
>
> Xiaobo Gu
>
> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <te...@gmail.com> wrote:
> > The summary of the reason is that this was a summer project and
> > parallelizing the random forest algorithm at all was a big enough project.
> >
> > Writing a single pass on-line algorithm was considered a bit much for the
> > project size.  Figuring out how to make multiple passes through an input
> > split was similarly out of scope.
> >
> > If you have a good alternative, this would be of substantial interest
> > because it could improve the currently limited scalability of the decision
> > forest code.
> >
> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> >
> >> Why can't a tree be built against a dataset that resides on disk as
> >> long as we can read it?
> >>
> >
>