Posted to dev@giraph.apache.org by Sergey Edunov <ed...@gmail.com> on 2014/08/12 02:54:00 UTC
Re: Review Request 23989: Improve checkpointing
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/
-----------------------------------------------------------
(Updated Aug. 12, 2014, 12:53 a.m.)
Review request for giraph.
Changes
-------
Addressing CR issues
Repository: giraph-git
Description
-------
We need to address some issues with checkpointing:
1) worker2worker messages are not saved
2) BspServiceWorker does not compile under hadoop_0.23 profile
3) It would be nice to be able to manually checkpoint and stop any job at any point in time.
Changes:
1) worker2worker messages: fixed by serializing the current worker-to-worker messages (they are a list of Writables, so I had to write the class information as well).
2) Compilation issues: fixed.
3) Checkpointing can now be triggered manually by creating the /_checkpointAndStop node in ZooKeeper (the same way _haltComputation works). After that, the job's behavior is determined by the registered GiraphJobRetryChecker. By default, the job is checkpointed at the end of the current superstep and then halted. You can override this behavior by making shouldRestartCheckpoint() return true; in that case the job is restarted immediately after the checkpoint is written.
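A minimal sketch of the idea in item 1, assuming each message implements a Writable-style write/readFields pair: because one list can hold different concrete message classes, each element is preceded by its class name so the reader can instantiate the right type before calling readFields. SimpleWritable, IntMessage, TextMessage and WritableListSketch are hypothetical names for illustration only; this is not the WritableUtils code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WritableListSketch {

  /** Hypothetical stand-in for Hadoop's Writable (write/readFields). */
  public interface SimpleWritable {
    void write(DataOutput out) throws IOException;

    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical message carrying an int. */
  public static class IntMessage implements SimpleWritable {
    public int value;

    public IntMessage() { }

    public IntMessage(int value) { this.value = value; }

    @Override public void write(DataOutput out) throws IOException { out.writeInt(value); }

    @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
  }

  /** Hypothetical message carrying a string. */
  public static class TextMessage implements SimpleWritable {
    public String text;

    public TextMessage() { }

    public TextMessage(String text) { this.text = text; }

    @Override public void write(DataOutput out) throws IOException { out.writeUTF(text); }

    @Override public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
  }

  /** Writes the list size, then each element's class name followed by its own fields. */
  public static void writeList(List<? extends SimpleWritable> list, DataOutput out)
      throws IOException {
    out.writeInt(list.size());
    for (SimpleWritable w : list) {
      out.writeUTF(w.getClass().getName());
      w.write(out);
    }
  }

  /** Reads a list written by writeList, instantiating each element by reflection. */
  public static List<SimpleWritable> readList(DataInput in) throws IOException {
    int size = in.readInt();
    List<SimpleWritable> result = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      String className = in.readUTF();
      SimpleWritable w;
      try {
        // Requires a public no-argument constructor on every message class
        w = (SimpleWritable) Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      w.readFields(in);
      result.add(w);
    }
    return result;
  }

  /** Serializes and deserializes a list in memory, as a checkpoint save/restore would. */
  public static List<SimpleWritable> roundTrip(List<? extends SimpleWritable> list) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeList(list, new DataOutputStream(bytes));
      return readList(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Writing the class name per element costs some space, but it is what allows a mixed list to be restored without knowing the element types in advance.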
Diffs (updated)
-----
giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 02577b9
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedService.java ff3e427
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java e5b7cf3
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java e5d0ae1
giraph-core/src/main/java/org/apache/giraph/bsp/CheckpointStatus.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/bsp/SuperstepState.java c384fbf
giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java a92cd1c
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 0424a47
giraph-core/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java c351778
giraph-core/src/main/java/org/apache/giraph/graph/GlobalStats.java bc56c9c
giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java 6ebb002
giraph-core/src/main/java/org/apache/giraph/job/DefaultGiraphJobRetryChecker.java 0cab86c
giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 4a1f02e
giraph-core/src/main/java/org/apache/giraph/job/GiraphJobRetryChecker.java 53a800e
giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java 9530fd6
giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java e129390
giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 0635210
giraph-core/src/main/java/org/apache/giraph/utils/CheckpointingUtils.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java 763f59d
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java d2d24ee
giraph-core/src/test/java/org/apache/giraph/utils/TestWritableUtils.java PRE-CREATION
giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java 2939af7
pom.xml ed2a98c
Diff: https://reviews.apache.org/r/23989/diff/
Testing
-------
Ran PageRank; will keep testing with different jobs.
Thanks,
Sergey Edunov
Re: Review Request 23989: Improve checkpointing
Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/#review50672
-----------------------------------------------------------
Thanks, looks good now! Did you try running a job with GiraphJobRetryChecker which returns true for shouldRestartCheckpoint?
giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java
<https://reviews.apache.org/r/23989/#comment88536>
Why not write the ordinal?
- Maja Kabiljo
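Assuming the comment above refers to how an enum value is serialized in WritableUtils (the diff itself is not reproduced in this thread), a sketch of the two options being weighed: writing the constant's ordinal versus writing its name. The Kind enum and EnumOrdinalSketch class are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EnumOrdinalSketch {

  /** Hypothetical enum used only for this illustration. */
  public enum Kind { REQUEST, RESPONSE, CONTROL }

  /** Writes the ordinal: always 4 bytes, but tied to the declaration order of the constants. */
  public static byte[] writeOrdinal(Kind kind) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(kind.ordinal());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Writes the name: a 2-byte length plus the encoded characters, robust to reordering. */
  public static byte[] writeName(Kind kind) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(kind.name());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads an ordinal and maps it back through values(). */
  public static Kind readOrdinal(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return Kind.values()[in.readInt()];
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads a name and looks the constant up with valueOf. */
  public static Kind readName(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return Kind.valueOf(in.readUTF());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The ordinal is smaller and cheaper to decode, but reordering or inserting constants silently changes the meaning of data already written (for example, an existing checkpoint); the name survives reordering but breaks if a constant is renamed.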
Re: Review Request 23989: Improve checkpointing
Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/#review50747
-----------------------------------------------------------
Ship it!
- Maja Kabiljo
Re: Review Request 23989: Improve checkpointing
Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23989/
-----------------------------------------------------------
(Updated Aug. 15, 2014, 5:37 p.m.)
Review request for giraph.
Changes
-------
Fixing CR comments
Repository: giraph-git
Description (updated)
-------
We need to address some issues with checkpointing:
1) worker2worker messages are not saved
2) BspServiceWorker does not compile under hadoop_0.23 profile
3) It would be nice to be able to manually checkpoint and stop any job at any point in time.
Changes:
1) worker2worker messages: fixed by serializing the current worker-to-worker messages (they are a list of Writables, so I had to write the class information as well).
2) Compilation issues: fixed.
3) Checkpointing can now be triggered manually by creating the /_checkpointAndStop node in ZooKeeper (the same way _haltComputation works). After that, the job's behavior is determined by the registered GiraphJobRetryChecker. By default, the job is checkpointed at the end of the current superstep and then halted. You can override this behavior by making shouldRestartCheckpoint() return true; in that case the job is restarted immediately after the checkpoint is written.
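The decision described in item 3 can be sketched as follows. This is a simplified model under stated assumptions, not Giraph's code: FakeZooKeeper is a hypothetical in-memory stand-in that only records which znode paths exist, RetryChecker is a hypothetical one-method reduction of GiraphJobRetryChecker, and jobRoot is a hypothetical base path, since the exact location of /_checkpointAndStop under the job's znodes is not spelled out here.

```java
import java.util.HashSet;
import java.util.Set;

public class CheckpointAndStopSketch {

  /** Hypothetical in-memory stand-in for ZooKeeper: only records which znode paths exist. */
  public static class FakeZooKeeper {
    private final Set<String> nodes = new HashSet<>();

    public void create(String path) { nodes.add(path); }

    public boolean exists(String path) { return nodes.contains(path); }
  }

  /** Hypothetical, simplified version of the GiraphJobRetryChecker hook. */
  public interface RetryChecker {
    boolean shouldRestartCheckpoint();
  }

  /** Mirrors the described default: checkpoint, then halt without restarting. */
  public static final RetryChecker DEFAULT_CHECKER = () -> false;

  /** What happens once the current superstep completes. */
  public enum Outcome { CONTINUE, CHECKPOINT_AND_HALT, CHECKPOINT_AND_RESTART }

  /** Node name from the review; its placement under jobRoot is an assumption. */
  public static final String CHECKPOINT_AND_STOP = "/_checkpointAndStop";

  /** Decides, at the end of a superstep, whether to keep computing or to checkpoint. */
  public static Outcome endOfSuperstep(FakeZooKeeper zk, String jobRoot, RetryChecker checker) {
    if (!zk.exists(jobRoot + CHECKPOINT_AND_STOP)) {
      return Outcome.CONTINUE;
    }
    // The node exists: the superstep is checkpointed, then the checker picks halt vs restart
    return checker.shouldRestartCheckpoint()
        ? Outcome.CHECKPOINT_AND_RESTART
        : Outcome.CHECKPOINT_AND_HALT;
  }
}
```

Keeping the halt-versus-restart choice in a pluggable checker means the trigger (the znode) stays the same while each job decides what a manual checkpoint should lead to.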
Diffs (updated)
-----
giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 02577b9
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedService.java ff3e427
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java e5b7cf3
giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java e5d0ae1
giraph-core/src/main/java/org/apache/giraph/bsp/CheckpointStatus.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/bsp/SuperstepState.java c384fbf
giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java a92cd1c
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 0424a47
giraph-core/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java c351778
giraph-core/src/main/java/org/apache/giraph/graph/GlobalStats.java bc56c9c
giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java 6ebb002
giraph-core/src/main/java/org/apache/giraph/job/DefaultGiraphJobRetryChecker.java 0cab86c
giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 4a1f02e
giraph-core/src/main/java/org/apache/giraph/job/GiraphJobRetryChecker.java 53a800e
giraph-core/src/main/java/org/apache/giraph/job/HadoopUtils.java 9530fd6
giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java e129390
giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 0635210
giraph-core/src/main/java/org/apache/giraph/utils/CheckpointingUtils.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java 763f59d
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java d2d24ee
giraph-core/src/test/java/org/apache/giraph/utils/TestWritableUtils.java PRE-CREATION
giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java 2939af7
pom.xml 672ec44
Diff: https://reviews.apache.org/r/23989/diff/
Testing
-------
Ran PageRank; will keep testing with different jobs.
Thanks,
Sergey Edunov