Posted to user@spark.apache.org by Justin Yip <yi...@gmail.com> on 2014/06/14 05:55:03 UTC

MLLib : Decision Tree with minimum points per node

Hello,

I have been playing around with mllib's decision tree library. It is
working great, thanks.

I have a question regarding overfitting. It appears that the current
implementation doesn't allow users to specify the minimum number of samples
per node. As a result, some nodes contain very few samples, which can
lead to overfitting.

I would like to know if there is a workaround or any other way to prevent
overfitting. Will the decision tree support min-samples-per-node in a future
release?

Thanks.

Justin
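
For readers following the thread, the constraint being asked about can be
sketched in a few lines. This is a rough illustration in plain Python, not
MLlib code: the function names (gini, best_split) and the parameter name
min_samples_per_node are hypothetical, and it only considers a single
numeric feature. The idea is that a candidate split is rejected whenever
either child would receive fewer than the requested number of samples.

```python
# Illustrative sketch only; this is NOT how MLlib implements its trees.
# All names below (gini, best_split, min_samples_per_node) are hypothetical.
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(
    xs: List[float], ys: List[int], min_samples_per_node: int
) -> Optional[Tuple[float, float]]:
    """Find the best threshold on one feature under a minimum-size constraint.

    Returns (threshold, weighted_gini) for the lowest-impurity split of the
    form x <= threshold, or None if every candidate split would leave a
    child with fewer than min_samples_per_node samples.
    """
    n = len(xs)
    best: Optional[Tuple[float, float]] = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # The constraint: skip splits that create a too-small child node.
        if len(left) < min_samples_per_node or len(right) < min_samples_per_node:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [0, 0, 0, 0, 0, 1]
    # With no effective constraint the tree may isolate the single odd sample.
    print(best_split(xs, ys, min_samples_per_node=1))
    # A larger minimum forces a coarser split, or no split at all.
    print(best_split(xs, ys, min_samples_per_node=2))
    print(best_split(xs, ys, min_samples_per_node=4))
```

With a minimum of 1, the best split can carve off a single-sample node,
which is the behaviour described above. Raising the minimum trades some
training-set purity for larger, less noise-sensitive nodes.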

Re: MLLib : Decision Tree with minimum points per node

Posted by Manish Amde <ma...@gmail.com>.
Hi Justin,

I have created a JIRA ticket to keep track of your request. Thanks.
https://issues.apache.org/jira/browse/SPARK-2207

-Manish


On Thu, Jun 19, 2014 at 2:35 PM, Manish Amde <ma...@gmail.com> wrote:

> Hi Justin,
>
> I am glad to know that trees are working well for you.
>
> The trees will support minimum samples per node in a future release.
> Thanks for the feedback.
>
> -Manish

Re: MLLib : Decision Tree with minimum points per node

Posted by Manish Amde <ma...@gmail.com>.
Hi Justin,

I am glad to know that trees are working well for you.

The trees will support minimum samples per node in a future release. Thanks
for the feedback.

-Manish

