Posted to issues@spark.apache.org by "Teng Jiang (JIRA)" <ji...@apache.org> on 2017/05/02 07:19:04 UTC
[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should
be setting by the User
[ https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992471#comment-15992471 ]
Teng Jiang commented on SPARK-20443:
------------------------------------
I did some tests on the blockSize.
The test environment is:
3 workers, each with 40 cores, 180G of memory, and 1 executor.
The data: 3,290,000 users and 208,000 items.
The results are:
blockSize   rank=10      rank=100
128         67.32 min    127.66 min
256         46.68 min    87.67 min
512         35.66 min    63.46 min
1024        28.49 min    41.61 min
2048        22.83 min    34.76 min
4096        22.39 min    54.43 min
8192        23.35 min    71.09 min
I also tested a second dataset with 480,000 users and 17,000 items. The rank was set to 10.
blockSize   128    256    512    1024   2048   4096   8192
time (s)    98.2   70.4   52.7   45.3   45.0   60.5   67.3
For both datasets, as the blockSize grows from 128 to 8192, the recommendation time first decreases and then increases.
The optimal blockSize therefore depends on the dataset, and in the first test also on the rank: 4096 was fastest at rank=10, but 2048 was fastest at rank=100.
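As a rough sketch (not part of the tests themselves), the timings in this thread can be compared with a few lines of Python. The names TIMINGS, best_block_size and summarize are made up for illustration; the numbers are copied from the tables above and from the issue description quoted below. The rank used for the run in the issue description is not stated, and its 8192 run hit OOM, so that entry is left out.

```python
# Timings copied from this thread. Units differ per series (minutes or
# seconds), so values are only ever compared within one series.
TIMINGS = {
    "3.29M users x 208K items, rank=10 (min)": {
        128: 67.32, 256: 46.68, 512: 35.66, 1024: 28.49,
        2048: 22.83, 4096: 22.39, 8192: 23.35,
    },
    "3.29M users x 208K items, rank=100 (min)": {
        128: 127.66, 256: 87.67, 512: 63.46, 1024: 41.61,
        2048: 34.76, 4096: 54.43, 8192: 71.09,
    },
    "480K users x 17K items, rank=10 (s)": {
        128: 98.2, 256: 70.4, 512: 52.7, 1024: 45.3,
        2048: 45.0, 4096: 60.5, 8192: 67.3,
    },
    # From the issue description; the 8192 run hit OOM and is omitted.
    "480K users x 17K items, issue description (s)": {
        128: 124, 256: 160, 512: 184, 1024: 244, 2048: 332, 4096: 488,
    },
}

DEFAULT_BLOCK_SIZE = 4096  # the default named in the issue description


def best_block_size(series):
    """Return the blockSize with the smallest measured time."""
    return min(series, key=series.get)


def summarize(timings, default=DEFAULT_BLOCK_SIZE):
    """Map each series to (best blockSize, speedup of best over the default)."""
    summary = {}
    for name, series in timings.items():
        best = best_block_size(series)
        summary[name] = (best, series[default] / series[best])
    return summary


if __name__ == "__main__":
    for name, (best, speedup) in summarize(TIMINGS).items():
        print(f"{name}: best blockSize={best}, "
              f"{speedup:.2f}x faster than blockSize={DEFAULT_BLOCK_SIZE}")
```

The fastest blockSize is not the same across the four series, which is the argument for letting the user set it.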
> The blockSize of MLLIB ALS should be setting by the User
> ---------------------------------------------------------
>
> Key: SPARK-20443
> URL: https://issues.apache.org/jira/browse/SPARK-20443
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.3.0
> Reporter: Peng Meng
> Priority: Minor
>
> The blockSize of MLLIB ALS is very important for ALS performance.
> In our test, a blockSize of 128 is about 4X faster than a blockSize of 4096 (the default value).
> The following are our test results:
> BlockSize(recommendationForAll time)
> 128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)
> The Test Environment:
> 3 workers, each with 10 cores, 30G of memory, and 1 executor.
> The Data: 480,000 users and 17,000 items