Posted to user@spark.apache.org by 张志齐 <go...@126.com> on 2014/04/11 15:32:26 UTC

Too many tasks in reduceByKey() when do PageRank iteration

Hi all,

I am implementing a simple PageRank. Unlike the PageRank example shipped with Spark, I divide the matrix into blocks and the rank vector into slices.
Here is my code: https://github.com/gowithqi/PageRankOnSpark/blob/master/src/PageRank/PageRank.java


I expected the cost of each iteration to be the same. However, I found that the reduceByKey() (line 162) runs 6 tasks during the first iteration, 18 tasks during the second, 54 during the third, 162 during the fourth, and so on...


By the sixth iteration it has 1458 tasks, which takes more than 2 hours to complete.
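(Editor's note, not part of the original mail:) the reported counts 6, 18, 54, 162, ..., 1458 form a geometric progression in which each iteration triples the task count of the previous one. A minimal sketch checking that the sixth iteration indeed lands on 1458 under that assumption:

```java
// Sketch (not the author's code): the reported reduceByKey task counts
// follow a 3x-per-iteration pattern starting from 6 tasks.
public class TaskGrowth {
    // Task count at the given 1-based iteration, assuming the pattern
    // reported in the mail: 6 tasks in iteration 1, tripling each time.
    static long tasksAtIteration(int iteration) {
        long tasks = 6;
        for (int i = 1; i < iteration; i++) {
            tasks *= 3;
        }
        return tasks;
    }

    public static void main(String[] args) {
        for (int it = 1; it <= 6; it++) {
            System.out.println("iteration " + it + ": "
                    + tasksAtIteration(it) + " tasks");
        }
    }
}
```

This kind of multiplicative growth is consistent with the number of partitions (and the RDD lineage) growing each iteration rather than staying fixed.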


I don't know why this happens; I expected every iteration to take roughly the same time.


Thank you for your help.




--
张志齐
Computer Science and Technology

Shanghai Jiao Tong University