Posted to dev@hama.apache.org by ChiaHung Lin <ch...@nuk.edu.tw> on 2011/09/19 05:34:29 UTC

[Discussion] Refactor bsp() for recovery procedure

Currently we have bsp(), in which users write the code for their tasks. For instance,

... bsp() ...{
   ... // some computation
   sync();
   ... // some other computation
   sync();
   ...   
}

However, this makes recovery difficult. First, the checkpointed messages must be restored so that the computation can resume from where it failed; second, the recovery procedure needs to know from which superstep to restart. With the current bsp(), a common choice seems to be preprocessing, but this may not be good, because when something goes wrong internally it is hard to find the problem.
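A minimal sketch of the two pieces of state named above, assuming a single peer and plain String messages. `Checkpoint` and `restoreInto` are hypothetical names for illustration, not Hama API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical checkpoint record (not part of Hama): the superstep to restart
// from, plus the messages that had been delivered to it at the barrier.
final class Checkpoint {
  final int superstep;
  final List<String> messages;

  Checkpoint(int superstep, List<String> messages) {
    this.superstep = superstep;
    this.messages = Collections.unmodifiableList(new ArrayList<>(messages));
  }

  // Replace whatever is in the incoming queue with the checkpointed messages,
  // so the restarted superstep sees the same input as the one that failed.
  void restoreInto(Deque<String> incoming) {
    incoming.clear();
    incoming.addAll(messages);
  }
}
```

Even with such a record, a monolithic bsp() gives the framework no way to jump to `superstep` inside the user's method: a Java method cannot be re-entered at an arbitrary sync(), so restarting bsp() replays it from the top.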

I have come up with an alternative, but it would change our current procedure, so I think it would be good to discuss it first. The proposal is as follows:

1. Divide bsp() into smaller computation units, called e.g. step() or superstep(), within which users still write their own logic.

2. In main(), the user composes the order of the supersteps.

... class Superstep1 extends BSPSuperstep {
   ... superstep() ... {...}
}
... class Superstep2 extends BSPSuperstep {
   ... superstep() ... {...}
}

BSPJob bsp = new BSPJob(...);
bsp.compose(Superstep1.class).compose(Superstep2.class)...;
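The composition step above could look roughly like this. It is a sketch under assumptions: `BSPSuperstep` follows the proposal's name, while `SuperstepChain` is a hypothetical stand-in for the proposed `compose(...)` on the job (it is not an existing Hama method), and messages are plain Strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base class from the proposal: one unit of user logic that runs
// between two barriers. Messages are plain Strings to keep the sketch small.
abstract class BSPSuperstep {
  abstract void superstep(List<String> incoming, List<String> outgoing);
}

// Hypothetical stand-in for the proposed bsp.compose(...) chain: it only
// records the order in which the user lists superstep classes.
final class SuperstepChain {
  private final List<Class<? extends BSPSuperstep>> order = new ArrayList<>();

  SuperstepChain compose(Class<? extends BSPSuperstep> step) {
    order.add(step);
    return this; // returning this is what allows .compose(...).compose(...)
  }

  List<Class<? extends BSPSuperstep>> supersteps() {
    return Collections.unmodifiableList(order);
  }
}

class Superstep1 extends BSPSuperstep {
  @Override
  void superstep(List<String> incoming, List<String> outgoing) {
    outgoing.add("hello"); // first superstep: only sends
  }
}

class Superstep2 extends BSPSuperstep {
  @Override
  void superstep(List<String> incoming, List<String> outgoing) {
    for (String m : incoming) {
      outgoing.add(m + " again"); // second superstep: reacts to what arrived
    }
  }
}
```

Registering classes rather than instances means the task side could instantiate each step reflectively (for example with `getDeclaredConstructor().newInstance()`), which is one way a recovered task might rebuild the list.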

Then, during recovery, BSPTask's run() could do the following:

List<BSPSuperstep> steps = BSPJob.supersteps();

for(BSPSuperstep step: steps) {
   if(checkpointed) { 
     // restore checkpointed messages e.g. adding checkpointed msg (in hdfs) back to queues
   }
   step.superstep(...);
   step.sync();
}
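The recovery loop above can be modelled on a single peer as follows. This is a hedged sketch, not Hama code: `Step` and `StepRunner` are hypothetical, the in-memory map stands in for checkpoints in HDFS, and "sync()" is reduced to passing one step's output to the next step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-peer model of a superstep: consumes the messages
// delivered at the previous barrier and returns the messages to send.
interface Step {
  List<String> superstep(List<String> incoming);
}

// Hypothetical stand-in for the loop in BSPTask.run(); not Hama code.
final class StepRunner {
  // Stand-in for checkpoints in HDFS: superstep index -> its input messages.
  final Map<Integer, List<String>> checkpoints = new HashMap<>();

  List<String> run(List<Step> steps, int resumeFrom) {
    List<String> incoming = new ArrayList<>();
    if (resumeFrom > 0) {
      List<String> saved = checkpoints.get(resumeFrom);
      if (saved == null) {
        throw new IllegalStateException("no checkpoint for superstep " + resumeFrom);
      }
      incoming = new ArrayList<>(saved); // restore checkpointed messages
    }
    for (int i = resumeFrom; i < steps.size(); i++) {
      // Save step i's input before running it, so a failure inside step i
      // can be recovered later with run(steps, i).
      checkpoints.put(i, new ArrayList<>(incoming));
      incoming = steps.get(i).superstep(incoming); // "sync()": outputs become next inputs
    }
    return incoming;
  }
}
```

Because the framework owns the loop index, resuming is just a different starting value; the cost is that the list of steps must be known in advance.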

The advantage is that the recovery procedure becomes easier.
The disadvantage may be that the client program needs to state the order of the supersteps explicitly.

Any thoughts?

--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Re: [Discussion] Refactor bsp() for recovery procedure

Posted by ChiaHung Lin <ch...@nuk.edu.tw>.
I will try to provide a patch so that we can have a baseline for discussion. 

-----Original message-----
From:Edward J. Yoon <ed...@apache.org>
To:hama-dev@incubator.apache.org,chl501@nuk.edu.tw
Date:Tue, 20 Sep 2011 15:49:20 +0900
Subject:Re: [Discussion] Refactor bsp() for recovery procedure

> The disadvantage may be that the client program needs to state the order of the supersteps explicitly.

If a user wants to call sync() repeatedly in a loop, while or
until a condition is true, how would they program it?

bsp() {

  while (condition is true) {
    doLocalComputation();
    communicationWith(others);
    sync();
  }

}

I think the current BSP programming interface is very good. If this change
is only for recovery, we should find another way.

-- 
Best Regards, Edward J. Yoon
@eddieyoon


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Re: [Discussion] Refactor bsp() for recovery procedure

Posted by "Edward J. Yoon" <ed...@apache.org>.
> The disadvantage may be that the client program needs to state the order of the supersteps explicitly.

If a user wants to call sync() repeatedly in a loop, while or
until a condition is true, how would they program it?

bsp() {

  while (condition is true) {
    doLocalComputation();
    communicationWith(others);
    sync();
  }

}

I think the current BSP programming interface is very good. If this change
is only for recovery, we should find another way.
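One hedged way to express the while-loop above with superstep objects, assuming the framework re-runs a single step until that step votes to stop. `LoopStep` and `LoopDriver` are hypothetical names, not Hama API, and "sync()" is modelled as handing one round's outgoing messages to the next round.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical (not Hama API): a superstep object that the framework re-runs
// until it votes to stop, so a data-dependent loop needs no unrolling.
interface LoopStep {
  // Does one round of local computation and communication; returns true if
  // another superstep should follow (the "condition is true" of the loop).
  boolean superstep(int superstepNumber, List<String> incoming, List<String> outgoing);
}

final class LoopDriver {
  // Runs from 'startAt' (0 for a fresh run, or a checkpointed superstep number
  // on recovery) with 'restored' as the incoming messages. Returns how many
  // supersteps have completed in total when the step stops.
  static int drive(LoopStep step, int startAt, List<String> restored) {
    int n = startAt;
    List<String> incoming = new ArrayList<>(restored);
    boolean more = true;
    while (more) {
      List<String> outgoing = new ArrayList<>();
      more = step.superstep(n, incoming, outgoing);
      incoming = outgoing; // stand-in for sync(): deliver messages at the barrier
      n++;
    }
    return n;
  }
}
```

Since the driver, not user code, owns the loop and the superstep counter, a recovered task could call `drive` with the checkpointed number instead of re-entering bsp() mid-method.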

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: [Discussion] Refactor bsp() for recovery procedure

Posted by Thomas Jungblut <th...@googlemail.com>.
Hi ChiaHung,

I would not split this into several classes like Superstep1 or Superstep2,
and the chaining sounds a bit strange to me.
But I think the core of your idea is cool: a BSPSuperstep starts after a
sync phase and ends with one (easier for the user, because the workflow is
simpler).

Here is my proposal:

BSPSuperstep step;
int rollbackSuperStep = conf.getInt("bsp.rollback.superstep", -1);
if (rollbackSuperStep > -1) {
   step = BSPSuperstep.getSuperStep(rollbackSuperStep);
}
boolean halted = false;
while (!halted) {
   sync();
   step = new BSPSuperstep(CURRENT_NUMBER_OF_SUPERSTEP);
   step.compute(messages); // List<Message> received in the last sync()
   save(step);
   halted = checkHalted();
}

I know that diverges a lot from your idea. Maybe the sync has to go at the
tail of the loop.
What do you think?
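A runnable single-peer sketch of the rollback loop above, under stated assumptions: `java.util.Properties` stands in for the job configuration, the key `bsp.rollback.superstep` is taken from the proposal, and `SuperstepRecord`, `RollbackLoop` and the toy `compute` are hypothetical. Unlike the proposal's order (compute, then save), this sketch saves each superstep's input before computing, so a rolled-back superstep can be re-executed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical record of one superstep: its number and the messages it consumed.
final class SuperstepRecord {
  final int number;
  final List<String> messages;

  SuperstepRecord(int number, List<String> messages) {
    this.number = number;
    this.messages = new ArrayList<>(messages);
  }
}

// Hypothetical single-peer version of the proposed loop; not Hama code.
final class RollbackLoop {
  final Map<Integer, SuperstepRecord> saved = new HashMap<>(); // stand-in for save(step)
  int computeCalls = 0;

  // Runs until 'haltAfter' supersteps have completed. If the key
  // "bsp.rollback.superstep" is >= 0, resumes from that saved superstep.
  int run(Properties conf, int haltAfter) {
    int current = 0;
    List<String> messages = new ArrayList<>();
    int rollback = Integer.parseInt(conf.getProperty("bsp.rollback.superstep", "-1"));
    if (rollback > -1) {
      SuperstepRecord record = saved.get(rollback);
      if (record == null) {
        throw new IllegalStateException("no saved superstep " + rollback);
      }
      current = record.number;
      messages = new ArrayList<>(record.messages);
    }
    boolean halted = current >= haltAfter;
    while (!halted) {
      // In Hama, sync() would deliver this superstep's messages here.
      saved.put(current, new SuperstepRecord(current, messages)); // save input first
      messages = compute(current, messages);
      current++;
      halted = current >= haltAfter; // stand-in for checkHalted()
    }
    return current;
  }

  // Toy computation: append a marker for this superstep to the messages.
  private List<String> compute(int superstep, List<String> in) {
    computeCalls++;
    List<String> out = new ArrayList<>(in);
    out.add("s" + superstep);
    return out;
  }
}
```

With a rollback to superstep 1 after a three-superstep run, only supersteps 1 and 2 are recomputed.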

--
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: thomas.jungblut@testberichte.de
private: thomas.jungblut@gmail.com