Posted to issues@spark.apache.org by "Teng Jiang (JIRA)" <ji...@apache.org> on 2017/05/02 07:19:04 UTC

[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

    [ https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992471#comment-15992471 ] 

Teng Jiang commented on SPARK-20443:
------------------------------------

I ran some tests on the blockSize.
The test environment is:
3 workers, each with 40 cores, 180G of memory, and 1 executor.
The data: 3,290,000 users and 208,000 items.
The results are:
blockSize   rank = 10    rank = 100
128         67.32 min    127.66 min
256         46.68 min     87.67 min
512         35.66 min     63.46 min
1024        28.49 min     41.61 min
2048        22.83 min     34.76 min
4096        22.39 min     54.43 min
8192        23.35 min     71.09 min

A second dataset has 480,000 users and 17,000 items. The rank was set to 10.
blockSize   128    256    512    1024   2048   4096   8192
time (s)    98.2   70.4   52.7   45.3   45.0   60.5   67.3

For both datasets, as the blockSize grows from 128 to 8192, the recommendation time first decreases and then increases.
The minimum falls at 2048 or 4096 depending on the dataset and the rank, so the optimal blockSize differs from one dataset to another.
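
To make the comparison easier to check, here is a small Python sketch that picks the fastest blockSize from the measurements above. The numbers are copied from the two tables in this comment; the names dataset1_min, dataset2_s, column, and fastest_block_size are made up for illustration and are not part of Spark.

```python
# Timings copied from the two tables in this comment.
# Dataset 1: time in minutes, keyed by blockSize, then by rank.
dataset1_min = {
    128: {10: 67.32, 100: 127.66},
    256: {10: 46.68, 100: 87.67},
    512: {10: 35.66, 100: 63.46},
    1024: {10: 28.49, 100: 41.61},
    2048: {10: 22.83, 100: 34.76},
    4096: {10: 22.39, 100: 54.43},
    8192: {10: 23.35, 100: 71.09},
}

# Dataset 2: time in seconds, rank = 10 only.
dataset2_s = {
    128: 98.2, 256: 70.4, 512: 52.7, 1024: 45.3,
    2048: 45.0, 4096: 60.5, 8192: 67.3,
}


def column(table, rank):
    """Extract one rank's timings as a {blockSize: time} dict (hypothetical helper)."""
    return {block_size: by_rank[rank] for block_size, by_rank in table.items()}


def fastest_block_size(times):
    """Return the blockSize with the smallest measured time (hypothetical helper)."""
    return min(times, key=times.get)


print(fastest_block_size(column(dataset1_min, 10)))   # 4096
print(fastest_block_size(column(dataset1_min, 100)))  # 2048
print(fastest_block_size(dataset2_s))                 # 2048
```

This only summarizes the runs reported here. It says nothing about other cluster sizes or datasets, which would need their own measurements.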


> The blockSize of MLLIB ALS should be setting  by the User
> ---------------------------------------------------------
>
>                 Key: SPARK-20443
>                 URL: https://issues.apache.org/jira/browse/SPARK-20443
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Priority: Minor
>
> The blockSize of MLLIB ALS is very important for ALS performance. 
> In our test, performance with a blockSize of 128 is about 4X that with a blockSize of 4096 (the default value).
> The following are our test results: 
> BlockSize(recommendationForAll time)
> 128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)
> The Test Environment:
> 3 workers, each with 10 cores, 30G of memory, and 1 executor.
> The Data: 480,000 users and 17,000 items



