Posted to dev@mahout.apache.org by deneche abdelhakim <ad...@gmail.com> on 2010/10/14 17:34:09 UTC

Re: about Random Forests

Hey Andrey,

Thank you very much for the dataset; it will really help me fix
this bug. I'm downloading it right now and will see if I can
reproduce the problem.

As you said, the depth limit is a nice feature to have. I will create
two JIRA issues: one for the depth limit (should not be too difficult
to implement), the other for the infinite recursion bug.
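
For illustration, here is a minimal sketch of how a depth limit and a
guard against empty splits might fit together. It is not Mahout's actual
code: the class DepthLimitSketch, the method build, the NO_LIMIT constant
and the mean-based threshold are all hypothetical, and the cause of the
degenerate split (identical values on the chosen attribute) is only an
assumption based on Andrey's description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Mahout's tree builder. Rows are features followed by the label. */
final class DepthLimitSketch {
    /** Default: effectively no depth limit. */
    static final int NO_LIMIT = Integer.MAX_VALUE;

    /** Recursively "builds" a subtree on one attribute and returns its height (0 for a leaf). */
    static int build(List<double[]> rows, int attr, int depth, int maxDepth) {
        // Stop when the depth limit is reached or all labels agree.
        if (depth >= maxDepth || isPure(rows)) {
            return 0;
        }
        double threshold = mean(rows, attr); // assumed split rule, for illustration only
        List<double[]> lo = new ArrayList<>();
        List<double[]> hi = new ArrayList<>();
        for (double[] r : rows) {
            (r[attr] < threshold ? lo : hi).add(r);
        }
        // Degenerate split: one side got every instance, so recursing would never terminate.
        if (lo.isEmpty() || hi.isEmpty()) {
            return 0;
        }
        return 1 + Math.max(build(lo, attr, depth + 1, maxDepth),
                            build(hi, attr, depth + 1, maxDepth));
    }

    /** True when every row carries the same label (last column). */
    static boolean isPure(List<double[]> rows) {
        for (double[] r : rows) {
            if (r[r.length - 1] != rows.get(0)[rows.get(0).length - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Mean of one attribute over the rows. */
    static double mean(List<double[]> rows, int attr) {
        double sum = 0.0;
        for (double[] r : rows) {
            sum += r[attr];
        }
        return sum / rows.size();
    }
}
```

In this sketch the empty-subset check alone is enough to stop the
recursion, while maxDepth stays an independent knob that defaults to
NO_LIMIT.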

Thanks again, Andrey, for your feedback and ideas.

Deneche

On Wed, Oct 13, 2010 at 11:53 PM, Andrey Gusev <an...@gmail.com> wrote:
> Hey Deneche,
> Thanks for looking into the problem. I am glad I can help you guys fine-tune
> the implementation. Attached is one random-bag of data (sampled with
> replacement so there are duplicates there). Using m=2, this will cause the
> problem I described. It is csv, with the last value, an integer,
> representing the label. All features are numeric. When the
> infinite recursion happens we have 4 instances (2 copies of two unique
> instances) and loSubset is always empty while the hiSubset has all 4
> elements.
> Let me know if you can reproduce it. Another thought: having a limit might
> be a nice feature without trying to solve the infinite recursion problem (I
> agree that it's a bit of a hack if it is used to solve the recursion problem).
> For example, by setting a relatively low limit, some form of over-fitting can
> be prevented. By default the builder may have no limit. What do you think?
> Andrey
>
> On Wed, Oct 13, 2010 at 7:21 AM, deneche abdelhakim <ad...@gmail.com>
> wrote:
>>
>> Hi Andrey,
>>
>> Ted relayed your problem with infinite recursion in Random Forests.
>> I'm the guy working on Random Forests and I wanted to thank you for
>> identifying this bug and proposing a solution.
>> I never encountered this bug before. From your description, this
>> problem seems to be related to numerical attributes. If you could send
>> me some dataset that makes the bug appear, I could make sure the fix I
>> add really solves the problem. And I have an idea that could probably
>> solve this problem without setting a limit on the depth of the trees,
>> but again I need some dataset to work with.
>>
>> Thanks again for using Mahout Airlines =P
>
>