Posted to dev@mahout.apache.org by deneche abdelhakim <ad...@gmail.com> on 2010/11/01 18:10:00 UTC

Re: confidence value for bagged decision trees

Hey Andrey,

I committed my changes to trunk; this should, hopefully, fix your
infinite recursion problem. Please let me know whether it works for you.

On Thu, Oct 28, 2010 at 5:38 AM, Andrey Gusev <an...@gmail.com> wrote:
> Thanks Deneche!
>
> On Tue, Oct 26, 2010 at 9:12 PM, deneche abdelhakim <ad...@gmail.com>
> wrote:
>>
>> Hi Andrey,
>> Yes, this would be great. Actually, it has been on my to-do list for some
>> time. Now that you have requested it, it should become top priority for
>> me. I'm just waiting for the release of Mahout-0.4 and the end of the
>> code freeze; then I will add "maxDepth" to DefaultTreeBuilder and start
>> working on this one.
>> I also found what was causing the infinite recursion on your dataset;
>> a patch is available here:
>> https://issues.apache.org/jira/browse/MAHOUT-526
>> I should commit it as soon as the code freeze ends.
>>
>> Thanks for your feedback,
>> Deneche
>>
>> On Tue, Oct 26, 2010 at 10:00 PM, Andrey Gusev <an...@gmail.com>
>> wrote:
>> > Hey Deneche,
>> > I wanted to also let you know about another feature that may be useful
>> > for bagged decision trees. It would be nice to have an option of getting
>> > a confidence value (probability) along with the prediction. This could
>> > help in cases where precision needs to be increased, possibly at the
>> > cost of lower recall.
>> > For example, I modified the code to compute confidence as the ratio of
>> > trees that predicted a particular label, i.e. get counts for each label
>> > from all the trees and return as the confidence the number of
>> > predictions for the most-predicted label divided by the total number of
>> > bagged trees. What do you think?
>> > Andrey
>
>
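The vote-ratio confidence Andrey describes can be sketched roughly as below. This is only an illustration under stated assumptions: the class VoteConfidence, its nested Result type and the vote method are hypothetical names, not Mahout API, and each tree's prediction is assumed to be available as an int label.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Mahout: majority vote plus a confidence value. */
final class VoteConfidence {

  /** Winning label together with the fraction of trees that voted for it. */
  static final class Result {
    final int label;
    final double confidence;

    Result(int label, double confidence) {
      this.label = label;
      this.confidence = confidence;
    }
  }

  /**
   * @param treePredictions one predicted label per bagged tree (must be non-empty)
   * @return the most frequent label and its share of the votes
   */
  static Result vote(int[] treePredictions) {
    if (treePredictions.length == 0) {
      throw new IllegalArgumentException("need at least one tree prediction");
    }
    Map<Integer, Integer> counts = new HashMap<>();
    int bestLabel = treePredictions[0];
    int bestCount = 0;
    for (int label : treePredictions) {
      // merge returns the updated count for this label
      int count = counts.merge(label, 1, Integer::sum);
      if (count > bestCount) { // ties keep the label that reached the count first
        bestCount = count;
        bestLabel = label;
      }
    }
    return new Result(bestLabel, (double) bestCount / treePredictions.length);
  }
}
```

A caller could then accept a prediction only when the confidence exceeds some threshold, which is one way to trade recall for precision as described in the message.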

Re: confidence value for bagged decision trees

Posted by andreyg <an...@gmail.com>.
Hey Deneche - that works on my dataset. Thanks!

I am also still using the limited depth - this also helps limit the size of
the model, but overall this fix does address the infinite recursion that I
was observing.

Andrey
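
The effect of a depth limit on recursion and model size can be illustrated with a toy builder. This is a sketch under assumptions, not Mahout's DefaultTreeBuilder: the class DepthLimitedTree, the single numeric feature, the mean-based split and the row layout {x, y} are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Mahout's DefaultTreeBuilder): tree growth capped by maxDepth. */
final class DepthLimitedTree {
  final int label;              // majority label of the rows that reached this node
  final double threshold;       // split point; NaN for a leaf
  final DepthLimitedTree left;  // rows with x < threshold, or null for a leaf
  final DepthLimitedTree right; // rows with x >= threshold, or null for a leaf

  private DepthLimitedTree(int label, double threshold,
                           DepthLimitedTree left, DepthLimitedTree right) {
    this.label = label;
    this.threshold = threshold;
    this.left = left;
    this.right = right;
  }

  /** Each row is {x, y} with y equal to 0.0 or 1.0; rows is assumed non-empty. */
  static DepthLimitedTree build(List<double[]> rows, int depth, int maxDepth) {
    int ones = 0;
    double sum = 0.0;
    for (double[] row : rows) {
      sum += row[0];
      if (row[1] == 1.0) {
        ones++;
      }
    }
    int majority = 2 * ones >= rows.size() ? 1 : 0;
    boolean pure = ones == 0 || ones == rows.size();
    // Stop on pure nodes, or when the depth cap is reached: the cap alone
    // guarantees termination, whatever the data looks like.
    if (pure || depth >= maxDepth) {
      return new DepthLimitedTree(majority, Double.NaN, null, null);
    }
    double threshold = sum / rows.size(); // toy split: mean of the feature
    List<double[]> lo = new ArrayList<>();
    List<double[]> hi = new ArrayList<>();
    for (double[] row : rows) {
      (row[0] < threshold ? lo : hi).add(row);
    }
    if (lo.isEmpty() || hi.isEmpty()) { // the split separated nothing: make a leaf
      return new DepthLimitedTree(majority, Double.NaN, null, null);
    }
    return new DepthLimitedTree(majority, threshold,
        build(lo, depth + 1, maxDepth), build(hi, depth + 1, maxDepth));
  }

  /** Number of nodes in the tree; a binary tree of depth d has at most 2^(d+1) - 1. */
  int size() {
    return left == null ? 1 : 1 + left.size() + right.size();
  }
}
```

With the root at depth 0, a cap of maxDepth bounds the node count of a binary tree by 2^(maxDepth+1) - 1, which is why limiting depth also limits the size of the model.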



Re: confidence value for bagged decision trees

Posted by Andrey Gusev <an...@gmail.com>.
Hey Deneche, here is yet another nice-to-have for DT :) It would be
helpful to have a toDot method on the node class that creates a dot file,
which can then be visualized with Graphviz. I generated some of those, and
while these graphs can sometimes be very large, they are also helpful.

Andrey
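
A toDot method along these lines might look like the sketch below. The DotNode class is hypothetical and stands in for a real tree node; it is not Mahout's node class, and only double quotes in labels are escaped here.

```java
/** Hypothetical tree node (not Mahout's node class) that renders itself as Graphviz DOT. */
final class DotNode {
  final String text;   // e.g. a split test for internal nodes, a label for leaves
  final DotNode left;  // may be null
  final DotNode right; // may be null

  DotNode(String text, DotNode left, DotNode right) {
    this.text = text;
    this.left = left;
    this.right = right;
  }

  /** Returns the subtree rooted at this node as a DOT digraph. */
  String toDot() {
    StringBuilder sb = new StringBuilder("digraph tree {\n");
    emit(sb, new int[] {0});
    return sb.append("}\n").toString();
  }

  /** Writes this node and its subtree in pre-order; returns the id assigned to this node. */
  private int emit(StringBuilder sb, int[] nextId) {
    int id = nextId[0]++;
    sb.append("  n").append(id)
        .append(" [label=\"").append(text.replace("\"", "\\\"")).append("\"];\n");
    for (DotNode child : new DotNode[] {left, right}) {
      if (child != null) {
        int childId = child.emit(sb, nextId);
        sb.append("  n").append(id).append(" -> n").append(childId).append(";\n");
      }
    }
    return id;
  }
}
```

Saving the returned string to a file such as tree.dot and running "dot -Tpng tree.dot -o tree.png" would render it. For very large trees, the output could be restricted to the first few levels.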

On Wed, Nov 3, 2010 at 8:28 AM, deneche abdelhakim <ad...@gmail.com> wrote:

> I will have to investigate this further, just to make sure I didn't
> introduce any new bugs. I will keep you informed. I shall add the depth
> limit as soon as possible.
>
> Deneche
>
> On Wed, Nov 3, 2010 at 2:22 AM, Andrey Gusev <an...@gmail.com>
> wrote:
> > Just a note: I am observing a slight change in the accuracy numbers, but
> > the differences are very small and probably just a result of slight
> > changes at the long branches of the trees. So overall, I think the fix
> > works. Thanks again!
> > Andrey

Re: confidence value for bagged decision trees

Posted by deneche abdelhakim <ad...@gmail.com>.
I will have to investigate this further, just to make sure I didn't
introduce any new bugs. I will keep you informed. I shall add the depth
limit as soon as possible.

Deneche

On Wed, Nov 3, 2010 at 2:22 AM, Andrey Gusev <an...@gmail.com> wrote:
> Just a note: I am observing a slight change in the accuracy numbers, but
> the differences are very small and probably just a result of slight
> changes at the long branches of the trees. So overall, I think the fix
> works. Thanks again!
> Andrey