Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2011/10/25 14:42:10 UTC

Performance on large cluster.

Hi,

I've heard that task scheduling will be the most important factor for
performance on a large cluster, since the barrier waits for the
slowest task. What do you think about this?
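To make the point about the barrier concrete, here is a minimal cost-model sketch (the class and method names are hypothetical, not part of Hama). It assumes a superstep ends only when every task reaches the barrier, so its wall-clock time is the slowest task's compute time plus a fixed sync cost, and every faster task sits idle for the difference.

```java
import java.util.Arrays;

/** Hypothetical cost model: a BSP superstep ends only when every task reaches the barrier. */
final class BarrierCostModel {

    /** Wall-clock time of one superstep: the slowest task's compute time plus the sync cost. */
    static long superstepMillis(long[] taskComputeMillis, long syncMillis) {
        long slowest = Arrays.stream(taskComputeMillis).max().orElse(0L);
        return slowest + syncMillis;
    }

    /** Total time the faster tasks spend waiting at the barrier for the slowest one. */
    static long idleMillis(long[] taskComputeMillis) {
        long slowest = Arrays.stream(taskComputeMillis).max().orElse(0L);
        long idle = 0;
        for (long t : taskComputeMillis) {
            idle += slowest - t;
        }
        return idle;
    }

    public static void main(String[] args) {
        long[] balanced = {100, 100, 100, 100};
        long[] straggler = {100, 100, 100, 400};
        System.out.println(superstepMillis(balanced, 10));  // 110
        System.out.println(superstepMillis(straggler, 10)); // 410
        System.out.println(idleMillis(straggler));          // 900
    }
}
```

A single straggler nearly quadruples the superstep time in this toy model, even though three of the four tasks finish just as quickly as in the balanced case.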

P.S. If users run on a YARN cluster, BSP task scheduling will be handled
by YARN's resource management system.

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Performance on large cluster.

Posted by Thomas Jungblut <th...@googlemail.com>.
Hey Edward,

From what you describe, you are talking about some kind of speculative
task execution. This is also possible in YARN, if there are free
resources.
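As an illustration of what speculative execution could look like, the sketch below picks which running tasks deserve a backup copy. It is only an assumption-laden example, not Hama's or YARN's actual policy: the `TaskProgress` record, the `slowFactor` threshold, and the `freeSlots` parameter are all hypothetical. A task is treated as a straggler when its progress falls below `slowFactor` times the mean progress, and backups are launched only while free slots remain, slowest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical speculative-execution policy for one BSP superstep. */
final class SpeculationPolicy {

    /** Hypothetical progress snapshot: task id and fraction of its superstep work done (0..1). */
    record TaskProgress(String taskId, double progress) {}

    /** Returns ids of tasks to duplicate, slowest first, limited by the number of free slots. */
    static List<String> pickBackups(List<TaskProgress> tasks, double slowFactor, int freeSlots) {
        if (tasks.isEmpty() || freeSlots <= 0) {
            return List.of(); // nothing running, or no spare resources to speculate with
        }
        double mean = tasks.stream().mapToDouble(TaskProgress::progress).average().orElse(0.0);

        // Collect tasks that lag well behind the average.
        List<TaskProgress> slow = new ArrayList<>();
        for (TaskProgress t : tasks) {
            if (t.progress() < slowFactor * mean) {
                slow.add(t);
            }
        }
        slow.sort(Comparator.comparingDouble(TaskProgress::progress));

        List<String> backups = new ArrayList<>();
        for (int i = 0; i < Math.min(freeSlots, slow.size()); i++) {
            backups.add(slow.get(i).taskId());
        }
        return backups;
    }
}
```

Whichever copy of a task reaches the barrier first would win; the condition "if there are free resources" maps onto the `freeSlots` guard.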

Overall, the running time of a task depends heavily on how much work it
has to do, so splitting the data evenly across tasks should give good
performance in every task.
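A minimal sketch of an even split, assuming the input can be addressed as a contiguous range of record offsets (the `EvenSplitter` class is hypothetical). Split sizes differ by at most one record. Note that equal record counts only approximate equal work; in graph jobs, for example, vertex degree skew can still leave one task with far more messages to process.

```java
/** Hypothetical helper that divides numRecords records into numTasks contiguous ranges. */
final class EvenSplitter {

    /** Returns one [start, end) offset pair per task; sizes differ by at most one record. */
    static long[][] split(long numRecords, int numTasks) {
        if (numTasks <= 0 || numRecords < 0) {
            throw new IllegalArgumentException("numTasks must be > 0 and numRecords >= 0");
        }
        long base = numRecords / numTasks;      // every task gets at least this many
        long remainder = numRecords % numTasks; // the first 'remainder' tasks get one extra
        long[][] ranges = new long[numTasks][2];
        long start = 0;
        for (int i = 0; i < numTasks; i++) {
            long size = base + (i < remainder ? 1 : 0);
            ranges[i][0] = start;
            ranges[i][1] = start + size;
            start += size;
        }
        return ranges;
    }
}
```

For example, `split(10, 3)` yields the ranges [0, 4), [4, 7) and [7, 10).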
We can also reduce the time spent in synchronization by using
asynchronous messaging: messages are sent during the computation phase,
and only the remainder that has not yet been transferred is sent in the
sync phase. This should give a performance boost, especially in
messaging-heavy jobs like SSSP.
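The asynchronous scheme can be sketched as follows. This is a hypothetical illustration, not Hama's messaging API: `AsyncMessenger` hands each message to a background sender thread as soon as the computation produces it, and an in-memory `ConcurrentLinkedQueue` stands in for the remote peer's inbox (a real system would use RPC). At the sync phase, `flushRemaining()` only has to wait for transfers that have not finished yet, instead of sending the whole outgoing buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical peer-side messenger that overlaps message transfer with computation. */
final class AsyncMessenger implements AutoCloseable {
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    private final List<Future<?>> inFlight = new ArrayList<>();
    // Stand-in for a remote peer's inbox; a real implementation would send over the network.
    private final ConcurrentLinkedQueue<String> remoteInbox;

    AsyncMessenger(ConcurrentLinkedQueue<String> remoteInbox) {
        this.remoteInbox = remoteInbox;
    }

    /** Called during the computation phase: the transfer starts right away in the background. */
    void send(String msg) {
        inFlight.add(sender.submit(() -> { remoteInbox.add(msg); }));
    }

    /** Called at the sync phase: blocks only on transfers that are still outstanding. */
    void flushRemaining() {
        try {
            for (Future<?> f : inFlight) {
                f.get(); // returns immediately for messages already delivered
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while flushing messages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("message transfer failed", e.getCause());
        } finally {
            inFlight.clear();
        }
    }

    @Override
    public void close() {
        sender.shutdown();
    }
}
```

After `flushRemaining()` returns, every message sent during the superstep has been delivered, so the peer can enter the barrier. The single sender thread also keeps delivery in send order.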

-- 
Thomas Jungblut
Berlin <th...@gmail.com>