Posted to dev@giraph.apache.org by Sergey Edunov <ed...@gmail.com> on 2014/08/12 02:54:00 UTC

Re: Review Request 23989: Improve checkpointing

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/
-----------------------------------------------------------

(Updated Aug. 12, 2014, 12:53 a.m.)


Review request for giraph.


Changes
-------

Addressing CR issues


Repository: giraph-git


Description
-------

We need to address some issues with checkpointing:
1) worker2worker messages are not saved
2) BspServiceWorker does not compile under the hadoop_0.23 profile
3) it would be nice to be able to manually checkpoint and stop any job at any point in time.

Changes:

1) worker2worker messages fixed by serializing currentworkertoworker messages (it is a list of Writables, so I had to write class information as well)
2) Compilation issues fixed
3) Checkpointing can now be triggered by creating the /_checkpointAndStop node in ZooKeeper (the same way _haltComputation works). After that, the behavior of the job is determined by the registered GiraphJobRetryChecker. By default, the job is checkpointed at the end of the current superstep and halted. You can override this behavior by making shouldRestartCheckpoint() return true; in that case the job is restarted immediately after being checkpointed.
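
To illustrate item 1 of the changes, here is a minimal sketch of serializing a heterogeneous list by writing each element's class name ahead of its payload. It is not the code from the diff: SimpleWritable is a hypothetical stand-in for Hadoop's Writable so the example needs only the JDK, and LongMessage, TextMessage, writeList and readList are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WritableListSketch {
  /** Hypothetical stand-in for Hadoop's Writable (assumption, JDK only). */
  public interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Illustrative element type carrying a long. */
  public static class LongMessage implements SimpleWritable {
    public long value;
    public LongMessage() { }
    public LongMessage(long value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException { out.writeLong(value); }
    @Override public void readFields(DataInput in) throws IOException { value = in.readLong(); }
  }

  /** Illustrative element type carrying a string. */
  public static class TextMessage implements SimpleWritable {
    public String text = "";
    public TextMessage() { }
    public TextMessage(String text) { this.text = text; }
    @Override public void write(DataOutput out) throws IOException { out.writeUTF(text); }
    @Override public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
  }

  /** Writes the size, then for each element its class name followed by its fields. */
  public static void writeList(List<? extends SimpleWritable> list, DataOutput out)
      throws IOException {
    out.writeInt(list.size());
    for (SimpleWritable element : list) {
      out.writeUTF(element.getClass().getName());
      element.write(out);
    }
  }

  /** Reads the list back, instantiating each element from its recorded class name. */
  public static List<SimpleWritable> readList(DataInput in) throws IOException {
    int size = in.readInt();
    List<SimpleWritable> list = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      String className = in.readUTF();
      try {
        SimpleWritable element = Class.forName(className)
            .asSubclass(SimpleWritable.class)
            .getDeclaredConstructor()
            .newInstance();
        element.readFields(in);
        list.add(element);
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
    }
    return list;
  }

  /** Serializes to a byte array and reads the list back. */
  public static List<SimpleWritable> roundTrip(List<? extends SimpleWritable> list) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeList(list, new DataOutputStream(bytes));
      return readList(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<SimpleWritable> messages = new ArrayList<>();
    messages.add(new LongMessage(42L));
    messages.add(new TextMessage("hello"));
    for (SimpleWritable restored : roundTrip(messages)) {
      System.out.println(restored.getClass().getSimpleName());
    }
  }
}
```

The class name is needed because the reader cannot know the concrete type of each element in advance; the cost is one string per element, which could be reduced if all elements were known to share a type.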


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 02577b9 
  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedService.java ff3e427 
  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java e5b7cf3 
  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java e5d0ae1 
  giraph-core/src/main/java/org/apache/giraph/bsp/CheckpointStatus.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/bsp/SuperstepState.java c384fbf 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java a92cd1c 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 0424a47 
  giraph-core/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java c351778 
  giraph-core/src/main/java/org/apache/giraph/graph/GlobalStats.java bc56c9c 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java 6ebb002 
  giraph-core/src/main/java/org/apache/giraph/job/DefaultGiraphJobRetryChecker.java 0cab86c 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 4a1f02e 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphJobRetryChecker.java 53a800e 
  giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java 9530fd6 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java e129390 
  giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 0635210 
  giraph-core/src/main/java/org/apache/giraph/utils/CheckpointingUtils.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java 763f59d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java d2d24ee 
  giraph-core/src/test/java/org/apache/giraph/utils/TestWritableUtils.java PRE-CREATION 
  giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java 2939af7 
  pom.xml ed2a98c 

Diff: https://reviews.apache.org/r/23989/diff/


Testing
-------

Ran PageRank; will keep testing with different jobs.


Thanks,

Sergey Edunov


Re: Review Request 23989: Improve checkpointing

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/#review50672
-----------------------------------------------------------


Thanks, looks good now! Did you try running a job with a GiraphJobRetryChecker that returns true for shouldRestartCheckpoint()?


giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java
<https://reviews.apache.org/r/23989/#comment88536>

    Why not write the ordinal?
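
One possible reading of this suggestion, assuming the value being serialized is a Java enum (the diff itself is not reproduced in this thread): write the constant's ordinal rather than its name. The sketch below uses only the JDK; Status, writeEnum and readEnum are hypothetical names for illustration, not the code under review.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EnumOrdinalSketch {
  /** Hypothetical enum used only for this example. */
  public enum Status { NONE, CHECKPOINT, CHECKPOINT_AND_HALT }

  /** Writes the enum constant as its ordinal (a single int). */
  public static <E extends Enum<E>> void writeEnum(E value, DataOutput out)
      throws IOException {
    out.writeInt(value.ordinal());
  }

  /** Reads an ordinal and maps it back to the constant of the given enum type. */
  public static <E extends Enum<E>> E readEnum(Class<E> type, DataInput in)
      throws IOException {
    int ordinal = in.readInt();
    E[] constants = type.getEnumConstants();
    if (ordinal < 0 || ordinal >= constants.length) {
      throw new IOException("Bad ordinal " + ordinal + " for " + type.getName());
    }
    return constants[ordinal];
  }

  /** Serializes a constant to bytes and reads it back. */
  public static <E extends Enum<E>> E roundTrip(E value, Class<E> type) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeEnum(value, new DataOutputStream(bytes));
      return readEnum(type, new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(Status.CHECKPOINT_AND_HALT, Status.class));
  }
}
```

The ordinal is more compact than the name, but it is only safe when the writer and the reader agree on the declaration order of the constants.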


- Maja Kabiljo




Re: Review Request 23989: Improve checkpointing

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/#review50747
-----------------------------------------------------------

Ship it!

- Maja Kabiljo




Re: Review Request 23989: Improve checkpointing

Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/
-----------------------------------------------------------

(Updated Aug. 15, 2014, 5:37 p.m.)


Review request for giraph.


Changes
-------

Fixing CR comments


Repository: giraph-git


Description (updated)
-------

We need to address some issues with checkpointing:
1) worker2worker messages are not saved
2) BspServiceWorker does not compile under the hadoop_0.23 profile
3) it would be nice to be able to manually checkpoint and stop any job at any point in time.

Changes:

1) worker2worker messages fixed by serializing currentworkertoworker messages (it is a list of Writables, so I had to write class information as well)
2) Compilation issues fixed
3) Checkpointing can now be triggered by creating the /_checkpointAndStop node in ZooKeeper (the same way _haltComputation works). After that, the behavior of the job is determined by the registered GiraphJobRetryChecker. By default, the job is checkpointed at the end of the current superstep and halted. You can override this behavior by making shouldRestartCheckpoint() return true; in that case the job is restarted immediately after being checkpointed.
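
The decision described in item 3 can be sketched as follows. This is a simplified model, not the Giraph classes: RetryChecker is a hypothetical stand-in that mirrors only the shouldRestartCheckpoint() method named above, and Outcome and afterCheckpointAndStop are illustrative names.

```java
public class CheckpointAndStopSketch {
  /** Hypothetical stand-in exposing only the method named in the description. */
  public interface RetryChecker {
    boolean shouldRestartCheckpoint();
  }

  /** Models the default behavior: checkpoint, then halt. */
  public static class DefaultChecker implements RetryChecker {
    @Override public boolean shouldRestartCheckpoint() {
      return false;
    }
  }

  /** Models the override: restart right after the checkpoint is written. */
  public static class RestartingChecker implements RetryChecker {
    @Override public boolean shouldRestartCheckpoint() {
      return true;
    }
  }

  /** What happens to the job once the checkpoint has been written. */
  public enum Outcome { HALTED, RESTARTED_FROM_CHECKPOINT }

  /** Picks the outcome after a checkpoint-and-stop request, based on the checker. */
  public static Outcome afterCheckpointAndStop(RetryChecker checker) {
    return checker.shouldRestartCheckpoint()
        ? Outcome.RESTARTED_FROM_CHECKPOINT
        : Outcome.HALTED;
  }

  public static void main(String[] args) {
    System.out.println(afterCheckpointAndStop(new DefaultChecker()));
    System.out.println(afterCheckpointAndStop(new RestartingChecker()));
  }
}
```

In the real job the checker is whatever GiraphJobRetryChecker is registered for it; the sketch only shows how a single boolean selects between halting and restarting.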


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 02577b9 
  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedService.java ff3e427 
  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java e5b7cf3 
  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java e5d0ae1 
  giraph-core/src/main/java/org/apache/giraph/bsp/CheckpointStatus.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/bsp/SuperstepState.java c384fbf 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java a92cd1c 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 0424a47 
  giraph-core/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java c351778 
  giraph-core/src/main/java/org/apache/giraph/graph/GlobalStats.java bc56c9c 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java 6ebb002 
  giraph-core/src/main/java/org/apache/giraph/job/DefaultGiraphJobRetryChecker.java 0cab86c 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 4a1f02e 
  giraph-core/src/main/java/org/apache/giraph/job/GiraphJobRetryChecker.java 53a800e 
  giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java 9530fd6 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java e129390 
  giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 0635210 
  giraph-core/src/main/java/org/apache/giraph/utils/CheckpointingUtils.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java 763f59d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java d2d24ee 
  giraph-core/src/test/java/org/apache/giraph/utils/TestWritableUtils.java PRE-CREATION 
  giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java 2939af7 
  pom.xml 672ec44 

Diff: https://reviews.apache.org/r/23989/diff/


Testing
-------

Ran PageRank; will keep testing with different jobs.


Thanks,

Sergey Edunov