Posted to user@spark.apache.org by "张建鑫 (市场部)" <zh...@didichuxing.com> on 2016/10/17 11:18:29 UTC

Did anybody come across this random-forest issue with spark 2.0.1.

Did anybody encounter this problem before? Why does it happen, and how can it be solved?  The same training data and source code work in 1.6.1 but perform poorly in 2.0.1.

[inline image: screenshot of the RandomForest warning]

Re: Did anybody come across this random-forest issue with spark 2.0.1.

Posted by "张建鑫 (市场部)" <zh...@didichuxing.com>.
Hi Yanbo
Thank you very much.
You are totally correct!

I just looked up the Spark 2.0.1 documentation.  It says: "Maximum memory in MB allocated to histogram aggregation. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size. (default = 256 MB)"

Although this setting is unchanged in Spark 2.0, the warning did not occur with my ml source code in Spark 1.6.1.  It seems the random forest implementation in Spark 2.0 uses more memory per iteration, so the warning is now triggered even though the default value of maxMemoryInMB has not changed.
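
For anyone who hits the same warning, here is a minimal sketch of raising the limit via setMaxMemoryInMB (the column names, tree count and 512 MB value below are only placeholders, not my actual job):

import org.apache.spark.ml.classification.RandomForestClassifier

// Raise the histogram-aggregation budget from the default 256 MB to 512 MB,
// which lets more nodes be split in each iteration.
val rf = new RandomForestClassifier()
  .setLabelCol("label")          // placeholder column names
  .setFeaturesCol("features")
  .setNumTrees(100)              // placeholder tree count
  .setMaxMemoryInMB(512)

// val model = rf.fit(trainingData)   // trainingData: your training DataFrame

If I read the log right, the usage is only a few KB over the 256 MB default, so the warning itself looks mostly cosmetic; a larger budget mainly allows more nodes to be split per iteration, which should make training faster.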



From: Yanbo Liang <yb...@gmail.com>
Date: Tuesday, October 18, 2016, 11:55 AM
To: zhangjianxin <zh...@didichuxing.com>
Cc: Xi Shen <da...@gmail.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Please increase the value of "maxMemoryInMB" of your RandomForestClassifier or RandomForestRegressor.
It's a warning which will not affect the result but may make your training slower.

Thanks
Yanbo

On Mon, Oct 17, 2016 at 8:21 PM, 张建鑫(市场部) <zh...@didichuxing.com> wrote:
Hi Xi Shen

The warning message wasn’t  removed after I had upgraded my java to V8,
but  anyway I appreciate your kind help.

Since it’s just a WARN, I suppose I can bear with it and nothing bad would really happen. Am I right?


16/10/18 11:12:42 WARN RandomForest: Tree learning is using approximately 268437864 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80088 nodes in this iteration.
16/10/18 11:13:07 WARN RandomForest: Tree learning is using approximately 268436304 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80132 nodes in this iteration.
16/10/18 11:13:32 WARN RandomForest: Tree learning is using approximately 268437816 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80082 nodes in this iteration.



From: zhangjianxin <zh...@didichuxing.com>
Date: Monday, October 17, 2016, 8:16 PM
To: Xi Shen <da...@gmail.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Hi Xi Shen

Not yet.  For the moment my JDK for Spark is still v7. Thanks for the reminder; I will try upgrading Java.

From: Xi Shen <da...@gmail.com>
Date: Monday, October 17, 2016, 8:00 PM
To: zhangjianxin <zh...@didichuxing.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Did you also upgrade to Java from v7 to v8?

On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zh...@didichuxing.com> wrote:

Did anybody encounter this problem before? Why does it happen, and how can it be solved?  The same training data and source code work in 1.6.1 but perform poorly in 2.0.1.

--

Thanks,
David S.


Re: Did anybody come across this random-forest issue with spark 2.0.1.

Posted by Yanbo Liang <yb...@gmail.com>.
Please increase the value of "maxMemoryInMB" of your
RandomForestClassifier or RandomForestRegressor.
It's a warning which will not affect the result but may make your training
slower.

Thanks
Yanbo

On Mon, Oct 17, 2016 at 8:21 PM, 张建鑫(市场部) <zh...@didichuxing.com>
wrote:

> Hi Xi Shen
>
> The warning message wasn’t  removed after I had upgraded my java to V8,
> but  anyway I appreciate your kind help.
>
> Since it’s just a WARN, I suppose I can bear with it and nothing bad would
> really happen. Am I right?
>
>
> 16/10/18 11:12:42 WARN RandomForest: Tree learning is using approximately
> 268437864 bytes per iteration, which exceeds requested limit
> maxMemoryUsage=268435456. This allows splitting 80088 nodes in this
> iteration.
> 16/10/18 11:13:07 WARN RandomForest: Tree learning is using approximately
> 268436304 bytes per iteration, which exceeds requested limit
> maxMemoryUsage=268435456. This allows splitting 80132 nodes in this
> iteration.
> 16/10/18 11:13:32 WARN RandomForest: Tree learning is using approximately
> 268437816 bytes per iteration, which exceeds requested limit
> maxMemoryUsage=268435456. This allows splitting 80082 nodes in this
> iteration.
>
>
>
> From: zhangjianxin <zh...@didichuxing.com>
> Date: Monday, October 17, 2016, 8:16 PM
> To: Xi Shen <da...@gmail.com>
> Cc: "user@spark.apache.org" <us...@spark.apache.org>
> Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.
>
> Hi Xi Shen
>
> Not yet.  For the moment my JDK for Spark is still v7. Thanks for the
> reminder; I will try upgrading Java.
>
> From: Xi Shen <da...@gmail.com>
> Date: Monday, October 17, 2016, 8:00 PM
> To: zhangjianxin <zh...@didichuxing.com>, "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.
>
> Did you also upgrade to Java from v7 to v8?
>
> On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zh...@didichuxing.com>
> wrote:
>
>>
>> Did anybody encounter this problem before? Why does it happen, and how
>> can it be solved?  The same training data and source code work in 1.6.1
>> but perform poorly in 2.0.1.
>>
>> --
>
>
> Thanks,
> David S.
>

Re: Did anybody come across this random-forest issue with spark 2.0.1.

Posted by "张建鑫 (市场部)" <zh...@didichuxing.com>.
Hi Xi Shen

The warning message wasn’t  removed after I had upgraded my java to V8,
but  anyway I appreciate your kind help.

Since it’s just a WARN, I suppose I can bear with it and nothing bad would really happen. Am I right?


16/10/18 11:12:42 WARN RandomForest: Tree learning is using approximately 268437864 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80088 nodes in this iteration.
16/10/18 11:13:07 WARN RandomForest: Tree learning is using approximately 268436304 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80132 nodes in this iteration.
16/10/18 11:13:32 WARN RandomForest: Tree learning is using approximately 268437816 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80082 nodes in this iteration.



From: zhangjianxin <zh...@didichuxing.com>
Date: Monday, October 17, 2016, 8:16 PM
To: Xi Shen <da...@gmail.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Hi Xi Shen

Not yet.  For the moment my JDK for Spark is still v7. Thanks for the reminder; I will try upgrading Java.

From: Xi Shen <da...@gmail.com>
Date: Monday, October 17, 2016, 8:00 PM
To: zhangjianxin <zh...@didichuxing.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Did you also upgrade to Java from v7 to v8?

On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zh...@didichuxing.com> wrote:

Did anybody encounter this problem before? Why does it happen, and how can it be solved?  The same training data and source code work in 1.6.1 but perform poorly in 2.0.1.

[inline image: screenshot of the RandomForest warning]
--

Thanks,
David S.

Re: Did anybody come across this random-forest issue with spark 2.0.1.

Posted by "张建鑫 (市场部)" <zh...@didichuxing.com>.
Hi Xi Shen

Not yet.  For the moment my JDK for Spark is still v7. Thanks for the reminder; I will try upgrading Java.

From: Xi Shen <da...@gmail.com>
Date: Monday, October 17, 2016, 8:00 PM
To: zhangjianxin <zh...@didichuxing.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Did you also upgrade to Java from v7 to v8?

On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zh...@didichuxing.com> wrote:

Did anybody encounter this problem before? Why does it happen, and how can it be solved?  The same training data and source code work in 1.6.1 but perform poorly in 2.0.1.

[inline image: screenshot of the RandomForest warning]
--

Thanks,
David S.

Re: Did anybody come across this random-forest issue with spark 2.0.1.

Posted by Xi Shen <da...@gmail.com>.
Did you also upgrade to Java from v7 to v8?

On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zh...@didichuxing.com>
wrote:

>
> Did anybody encounter this problem before? Why does it happen, and how can
> it be solved?  The same training data and source code work in 1.6.1 but
> perform poorly in 2.0.1.
>
> --


Thanks,
David S.