Posted to dev@giraph.apache.org by Sergey Edunov <ed...@gmail.com> on 2014/06/27 22:49:00 UTC

Review Request 23140: Fix checkpointing

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/
-----------------------------------------------------------

Review request for giraph.


Repository: giraph-git


Description
-------

This fix merely makes checkpointing work again. 


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/aggregators/Aggregator.java 514e470 
  giraph-core/src/main/java/org/apache/giraph/aggregators/AggregatorHandler.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/aggregators/AggregatorWrapper.java 9613805 
  giraph-core/src/main/java/org/apache/giraph/aggregators/BasicAggregator.java 07a4100 
  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 2e35373 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java f0ecca2 
  giraph-core/src/main/java/org/apache/giraph/comm/aggregators/AllAggregatorServerData.java 177e738 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 7d7ceb2 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java ad7e045 
  giraph-core/src/main/java/org/apache/giraph/master/DefaultMasterCompute.java bfb6f0e 
  giraph-core/src/main/java/org/apache/giraph/master/MasterAggregatorHandler.java 325d91f 
  giraph-core/src/main/java/org/apache/giraph/master/MasterCompute.java d77a9b5 
  giraph-core/src/main/java/org/apache/giraph/master/WritableMasterAggregatorUsage.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/partition/BasicPartitionOwner.java 545d1af 
  giraph-core/src/main/java/org/apache/giraph/partition/HashMasterPartitioner.java 240687e 
  giraph-core/src/main/java/org/apache/giraph/partition/HashWorkerPartitioner.java d833895 
  giraph-core/src/main/java/org/apache/giraph/partition/MasterGraphPartitioner.java 50c750a 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionBalancer.java 3454d62 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionOwner.java 0ac74da 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleMasterPartitioner.java f128f34 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleWorkerPartitioner.java 3c0de44 
  giraph-core/src/main/java/org/apache/giraph/partition/WorkerGraphPartitioner.java 004ea81 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 09dd46d 
  giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java af45426 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 8dcf19a 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerAggregatorHandler.java 9bfd7b5 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerContext.java 17347db 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerThreadAggregatorUsage.java 194127e 
  giraph-core/src/main/java/org/apache/giraph/worker/WritableWorkerAggregatorUsage.java PRE-CREATION 
  giraph-core/src/test/java/org/apache/giraph/partition/SimpleRangePartitionFactoryTest.java 96bd5d7 
  giraph-examples/src/test/java/org/apache/giraph/aggregators/TestAggregatorsHandling.java e2b611b 

Diff: https://reviews.apache.org/r/23140/diff/


Testing
-------

I tested it by running several different jobs. I ran PageRank on 2*10^9 vertices with 200 workers and it seems to work just fine. It takes only 2 minutes to save a checkpoint.


Thanks,

Sergey Edunov


Re: Review Request 23140: Fix checkpointing

Posted by Sergey Edunov <ed...@gmail.com>.

> On July 2, 2014, 1:53 a.m., Maja Kabiljo wrote:
> > giraph-examples/src/test/java/org/apache/giraph/master/TestAggregatorsHandling.java, line 19
> > <https://reviews.apache.org/r/23140/diff/2/?file=622266#file622266line19>
> >
> >     Why did you move this file?


> On July 2, 2014, 1:53 a.m., Maja Kabiljo wrote:
> > giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java, lines 817-818
> > <https://reviews.apache.org/r/23140/diff/2/?file=622249#file622249line817>
> >
> >     Interesting, where do we rely on this?

I don't remember right now; I'll run some experiments later.


- Sergey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/#review47169
-----------------------------------------------------------




Re: Review Request 23140: Fix checkpointing

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/#review47169
-----------------------------------------------------------


Thanks, much shorter now. Should we add some tests to make sure things don't get broken again?


giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java
<https://reviews.apache.org/r/23140/#comment82778>

    Why ignore superstep 0? For example, there might be a lot of filtering going on during the input superstep, in which case it's cheaper to restart from a checkpoint than to read all the data again.
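
To make the superstep 0 question concrete, here is a minimal sketch of a checkpoint-frequency check. The class CheckpointPolicy, its method names and the skipSuperstepZero flag are hypothetical and are not taken from the patch. The sketch assumes that checkpoints are taken every N supersteps, that a frequency of 0 disables them, and that negative supersteps are never checkpointed.

```java
// Hypothetical helper for illustration only; not part of the patch under review.
class CheckpointPolicy {
    // Checkpoint every N supersteps; 0 or less disables checkpointing (assumption).
    private final int frequency;
    // When true, superstep 0 is never checkpointed, which is the behaviour questioned above.
    private final boolean skipSuperstepZero;

    CheckpointPolicy(int frequency, boolean skipSuperstepZero) {
        this.frequency = frequency;
        this.skipSuperstepZero = skipSuperstepZero;
    }

    boolean shouldCheckpoint(long superstep) {
        if (frequency <= 0 || superstep < 0) {
            return false;
        }
        if (superstep == 0 && skipSuperstepZero) {
            return false;
        }
        return superstep % frequency == 0;
    }

    public static void main(String[] args) {
        CheckpointPolicy withZero = new CheckpointPolicy(2, false);
        CheckpointPolicy withoutZero = new CheckpointPolicy(2, true);
        for (long s = 0; s <= 4; s++) {
            System.out.println(s + ": " + withZero.shouldCheckpoint(s)
                + " / " + withoutZero.shouldCheckpoint(s));
        }
    }
}
```

With skipSuperstepZero set to false, a job that fails early can resume from the state saved right after input loading instead of reading and filtering the input again.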



giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java
<https://reviews.apache.org/r/23140/#comment82781>

    Interesting, where do we rely on this?



giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java
<https://reviews.apache.org/r/23140/#comment82777>

    Nice bug ;-)



giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java
<https://reviews.apache.org/r/23140/#comment82787>

    This is what the output threads are called; please name these differently.



giraph-core/src/main/java/org/apache/giraph/worker/WorkerContext.java
<https://reviews.apache.org/r/23140/#comment82775>

    We are not using Serializable - what's transient here for?
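
For context on the transient question: the keyword is only consulted by Java's built-in java.io.Serializable mechanism. A class that serializes itself through explicit write/readFields methods (the Hadoop Writable style) decides field by field what goes on the wire, so transient has no effect there. The sketch below uses a hypothetical SketchContext class, not the real WorkerContext, and depends only on java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical class for illustration; it mimics the write/readFields
// contract of Hadoop's Writable without depending on Hadoop.
class SketchContext {
    // 'transient' would be skipped by java.io.Serializable, but this class
    // never goes through ObjectOutputStream, so the keyword changes nothing.
    transient long processedVertices;
    int workerIndex;

    void write(DataOutput out) throws IOException {
        out.writeLong(processedVertices); // written explicitly despite 'transient'
        out.writeInt(workerIndex);
    }

    void readFields(DataInput in) throws IOException {
        processedVertices = in.readLong();
        workerIndex = in.readInt();
    }

    // Serializes to a byte array and reads the bytes back into a fresh instance.
    static SketchContext roundTrip(SketchContext original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            SketchContext copy = new SketchContext();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchContext ctx = new SketchContext();
        ctx.processedVertices = 42L;
        ctx.workerIndex = 7;
        SketchContext copy = roundTrip(ctx);
        System.out.println(copy.processedVertices + " " + copy.workerIndex);
    }
}
```

The transient field survives the round trip because write() emits it explicitly, which is why the modifier is at best documentation in a class that is never serialized via Serializable.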



giraph-examples/src/test/java/org/apache/giraph/master/TestAggregatorsHandling.java
<https://reviews.apache.org/r/23140/#comment82772>

    Why did you move this file?


- Maja Kabiljo




Re: Review Request 23140: Fix checkpointing

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/#review47838
-----------------------------------------------------------


Looks great, a few final comments about the test.


giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java
<https://reviews.apache.org/r/23140/#comment84071>

    I'm a bit concerned that this test would have passed even if the restart from checkpoint didn't actually happen and the app simply ran from the beginning. Can we somehow ensure that it did?
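
One way to address this concern, sketched under assumptions: have the computation record which supersteps it actually executes, then assert after the restarted run that nothing before the checkpointed superstep was executed again. SuperstepRecorder, its methods and the superstep numbers below are all hypothetical. The sketch simulates the idea with a plain loop and does not use Giraph's API. It also assumes the restarted run resumes at the checkpointed superstep itself; the thread does not say whether the real restart resumes there or at the following superstep.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation for illustration; not the real TestCheckpointing.
class SuperstepRecorder {
    // Runs supersteps firstSuperstep..lastSuperstep (inclusive) and
    // returns the supersteps that were actually executed, in order.
    static List<Long> run(long firstSuperstep, long lastSuperstep) {
        List<Long> executed = new ArrayList<>();
        for (long s = firstSuperstep; s <= lastSuperstep; s++) {
            executed.add(s);
        }
        return executed;
    }

    // True only if the run began at the checkpointed superstep rather than earlier.
    static boolean restartedFromCheckpoint(List<Long> executed, long checkpointSuperstep) {
        return !executed.isEmpty() && executed.get(0) == checkpointSuperstep;
    }

    public static void main(String[] args) {
        long checkpointSuperstep = 4;
        List<Long> fromCheckpoint = run(checkpointSuperstep, 6);
        List<Long> fromScratch = run(0, 6);
        // Both runs finish at superstep 6, so comparing final results alone
        // cannot tell them apart; the executed-superstep log can.
        System.out.println(restartedFromCheckpoint(fromCheckpoint, checkpointSuperstep));
        System.out.println(restartedFromCheckpoint(fromScratch, checkpointSuperstep));
    }
}
```

The point of the design is that the assertion looks at the execution history, not only at the final output, since a full rerun produces the same final output.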



giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java
<https://reviews.apache.org/r/23140/#comment84066>

    Can you reuse the same conf and just add one setting (or at least create a method that creates the conf)?
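
A minimal sketch of the suggested refactoring, assuming the test needs one configuration for the initial run and a second one that differs only by a restart setting. The class CheckpointTestConf and every key name below are placeholders invented for illustration; they are not Giraph option names, and a plain Map stands in for the real configuration class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; keys are placeholders, not Giraph options.
class CheckpointTestConf {
    // Builds the settings shared by the initial run and the restarted run.
    static Map<String, String> baseConf(String checkpointDir) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.checkpoint.frequency", "2");
        conf.put("example.checkpoint.dir", checkpointDir);
        return conf;
    }

    // Copies the base settings and adds the single restart-specific setting.
    static Map<String, String> restartConf(Map<String, String> base, String restartFrom) {
        Map<String, String> conf = new HashMap<>(base); // copy, so the first run's conf is untouched
        conf.put("example.restart.from", restartFrom);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> first = baseConf("/tmp/checkpoints");
        Map<String, String> second = restartConf(first, "job_0001");
        System.out.println(first.size() + " " + second.size());
    }
}
```

Keeping the shared settings in one method means the two runs cannot silently drift apart when a setting is changed later.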



giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java
<https://reviews.apache.org/r/23140/#comment84068>

    You can extend DefaultWorkerContext to avoid overriding empty methods


- Maja Kabiljo




Re: Review Request 23140: Fix checkpointing

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/#review47902
-----------------------------------------------------------

Ship it!


Thanks Sergey, +1, I'll commit it!

- Maja Kabiljo




Re: Review Request 23140: Fix checkpointing

Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/
-----------------------------------------------------------

(Updated July 16, 2014, 3:59 a.m.)


Review request for giraph.


Repository: giraph-git


Description
-------

This fix merely makes checkpointing work again. 


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/aggregators/AggregatorWrapper.java 9613805 
  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 2e35373 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 85bfe04 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java ab0570f 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 0275395 
  giraph-core/src/main/java/org/apache/giraph/partition/BasicPartitionOwner.java 545d1af 
  giraph-core/src/main/java/org/apache/giraph/partition/HashMasterPartitioner.java 240687e 
  giraph-core/src/main/java/org/apache/giraph/partition/HashWorkerPartitioner.java d833895 
  giraph-core/src/main/java/org/apache/giraph/partition/MasterGraphPartitioner.java 50c750a 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionBalancer.java 3454d62 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionOwner.java 0ac74da 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleMasterPartitioner.java f128f34 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleWorkerPartitioner.java 3c0de44 
  giraph-core/src/main/java/org/apache/giraph/partition/WorkerGraphPartitioner.java 004ea81 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 2c4606f 
  giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java af45426 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java de7af28 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerContext.java 29835c5 
  giraph-core/src/test/java/org/apache/giraph/partition/SimpleRangePartitionFactoryTest.java 96bd5d7 
  giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java PRE-CREATION 

Diff: https://reviews.apache.org/r/23140/diff/


Testing
-------

I tested it by running several different jobs. I ran PageRank on 2*10^9 vertices with 200 workers and it seems to work just fine. It takes only 2 minutes to save a checkpoint.


Thanks,

Sergey Edunov


Re: Review Request 23140: Fix checkpointing

Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/
-----------------------------------------------------------

(Updated July 15, 2014, 11:33 p.m.)


Review request for giraph.


Repository: giraph-git


Description
-------

This fix merely makes checkpointing work again. 


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/aggregators/AggregatorWrapper.java 9613805 
  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 2e35373 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 85bfe04 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java ab0570f 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 0275395 
  giraph-core/src/main/java/org/apache/giraph/partition/BasicPartitionOwner.java 545d1af 
  giraph-core/src/main/java/org/apache/giraph/partition/HashMasterPartitioner.java 240687e 
  giraph-core/src/main/java/org/apache/giraph/partition/HashWorkerPartitioner.java d833895 
  giraph-core/src/main/java/org/apache/giraph/partition/MasterGraphPartitioner.java 50c750a 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionBalancer.java 3454d62 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionOwner.java 0ac74da 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleMasterPartitioner.java f128f34 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleWorkerPartitioner.java 3c0de44 
  giraph-core/src/main/java/org/apache/giraph/partition/WorkerGraphPartitioner.java 004ea81 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 2c4606f 
  giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java af45426 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java de7af28 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerContext.java 29835c5 
  giraph-core/src/test/java/org/apache/giraph/partition/SimpleRangePartitionFactoryTest.java 96bd5d7 
  giraph-examples/src/test/java/org/apache/giraph/TestCheckpointing.java PRE-CREATION 

Diff: https://reviews.apache.org/r/23140/diff/


Testing
-------

I tested it by running several different jobs. I ran PageRank on 2*10^9 vertices with 200 workers and it seems to work just fine. It takes only 2 minutes to save a checkpoint.


Thanks,

Sergey Edunov


Re: Review Request 23140: Fix checkpointing

Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/
-----------------------------------------------------------

(Updated July 15, 2014, 9:08 p.m.)


Review request for giraph.


Changes
-------

Fixed CR issues


Repository: giraph-git


Description
-------

This fix merely makes checkpointing work again. 


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/aggregators/AggregatorWrapper.java 9613805 
  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 2e35373 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 85bfe04 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java ab0570f 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 0275395 
  giraph-core/src/main/java/org/apache/giraph/partition/BasicPartitionOwner.java 545d1af 
  giraph-core/src/main/java/org/apache/giraph/partition/HashMasterPartitioner.java 240687e 
  giraph-core/src/main/java/org/apache/giraph/partition/HashWorkerPartitioner.java d833895 
  giraph-core/src/main/java/org/apache/giraph/partition/MasterGraphPartitioner.java 50c750a 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionBalancer.java 3454d62 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionOwner.java 0ac74da 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleMasterPartitioner.java f128f34 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleWorkerPartitioner.java 3c0de44 
  giraph-core/src/main/java/org/apache/giraph/partition/WorkerGraphPartitioner.java 004ea81 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 2c4606f 
  giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java af45426 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java de7af28 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerContext.java 29835c5 
  giraph-core/src/test/java/org/apache/giraph/partition/SimpleRangePartitionFactoryTest.java 96bd5d7 

Diff: https://reviews.apache.org/r/23140/diff/


Testing
-------

I tested it by running several different jobs. I ran PageRank on 2*10^9 vertices with 200 workers and it seems to work just fine. It takes only 2 minutes to save a checkpoint.


Thanks,

Sergey Edunov


Re: Review Request 23140: Fix checkpointing

Posted by Sergey Edunov <ed...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/
-----------------------------------------------------------

(Updated July 2, 2014, 12:57 a.m.)


Review request for giraph.


Changes
-------

I removed aggregator serialization from MasterCompute and the workers.


Repository: giraph-git


Description
-------

This fix merely makes checkpointing work again. 


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/aggregators/AggregatorWrapper.java 9613805 
  giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 2e35373 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java f0ecca2 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 7d7ceb2 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java ad7e045 
  giraph-core/src/main/java/org/apache/giraph/master/MasterAggregatorHandler.java 325d91f 
  giraph-core/src/main/java/org/apache/giraph/partition/BasicPartitionOwner.java 545d1af 
  giraph-core/src/main/java/org/apache/giraph/partition/HashMasterPartitioner.java 240687e 
  giraph-core/src/main/java/org/apache/giraph/partition/HashWorkerPartitioner.java d833895 
  giraph-core/src/main/java/org/apache/giraph/partition/MasterGraphPartitioner.java 50c750a 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionBalancer.java 3454d62 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionOwner.java 0ac74da 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleMasterPartitioner.java f128f34 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleWorkerPartitioner.java 3c0de44 
  giraph-core/src/main/java/org/apache/giraph/partition/WorkerGraphPartitioner.java 004ea81 
  giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 09dd46d 
  giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java af45426 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 8dcf19a 
  giraph-core/src/main/java/org/apache/giraph/worker/WorkerContext.java 17347db 
  giraph-core/src/test/java/org/apache/giraph/partition/SimpleRangePartitionFactoryTest.java 96bd5d7 
  giraph-examples/src/test/java/org/apache/giraph/aggregators/TestAggregatorsHandling.java e2b611b 
  giraph-examples/src/test/java/org/apache/giraph/master/TestAggregatorsHandling.java PRE-CREATION 

Diff: https://reviews.apache.org/r/23140/diff/


Testing
-------

I tested it by running several different jobs. I ran PageRank on 2*10^9 vertices with 200 workers and it seems to work just fine. It takes only 2 minutes to save a checkpoint.


Thanks,

Sergey Edunov


Re: Review Request 23140: Fix checkpointing

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23140/#review47023
-----------------------------------------------------------


I see that a lot of the changes are related to aggregators, and that you now write them from the master, the workers and MasterCompute. Can't we write them just once and go through the normal path of distributing them at the beginning of the superstep?

- Maja Kabiljo

