Posted to dev@giraph.apache.org by Avery Ching <av...@gmail.com> on 2012/08/17 10:18:45 UTC

Review Request: GIRAPH-302: Thread safety issue with sending partitions around

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6676/
-----------------------------------------------------------

Review request for giraph.


Description
-------

After calling sendPartitionRequest(), we clear the vertex list that was just handed to the request. That is a race: the list can be emptied before the request has actually been sent.

I noticed this when I was running with 300 workers and the number of edges wasn't what I expected. Sometimes we get empty requests!

After digging into the code I found the issue and have fixed it.
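
To illustrate the kind of race described above, here is a minimal sketch. It is not the Giraph code and not the actual patch: the class PartitionSendRace, the sendAsync() helper, and the latch are hypothetical names invented for this example, and copying the list is only one possible way to avoid the problem. The latch forces the "send" to run after the caller has cleared the list, so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionSendRace {
    /**
     * Hypothetical stand-in for an asynchronous send (an assumption, not a
     * Giraph API): once released, it reports how many vertices it sees.
     */
    static Future<Integer> sendAsync(ExecutorService pool,
                                     List<Integer> vertices,
                                     CountDownLatch release) {
        return pool.submit(() -> {
            release.await();         // the real send happens some time later
            return vertices.size();  // "serialize" whatever the list holds now
        });
    }

    /** Sends a vertex list, clears it right away, and returns what was sent. */
    static int sendThenClear(boolean copyFirst) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Integer> vertices = new ArrayList<>(Arrays.asList(1, 2, 3));
            CountDownLatch release = new CountDownLatch(1);
            // Either share the caller's list with the request or hand over a copy.
            List<Integer> payload = copyFirst ? new ArrayList<>(vertices) : vertices;
            Future<Integer> sent = sendAsync(pool, payload, release);
            vertices.clear();        // the caller reuses its list immediately
            release.countDown();     // only now does the send actually run
            return sent.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared list, vertices sent: " + sendThenClear(false));
        System.out.println("copied list, vertices sent: " + sendThenClear(true));
    }
}
```

With the shared list the request goes out empty; with the copy all three vertices are sent. In a real run the ordering is not forced, which would explain why only some requests end up empty.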

Giraph Stats
  Aggregate edges                   99,971,220    0    99,971,220
  Superstep                                 11    0            11
  Current workers                          300    0           300
  Last checkpointed superstep                0    0             0
  Current master task partition              0    0             0
  Sent messages                              0    0             0
  Aggregate finished vertices       10,000,000    0    10,000,000
  Aggregate vertices                10,000,000    0    10,000,000

This is wrong! The run should report 100,000,000 edges, so 28,780 are missing.

Giraph Stats
  Aggregate edges                  100,000,000    0   100,000,000
  Superstep                                 11    0            11
  Last checkpointed superstep                0    0             0
  Current workers                          300    0           300
  Current master task partition              0    0             0
  Sent messages                              0    0             0
  Aggregate finished vertices       10,000,000    0    10,000,000
  Aggregate vertices                10,000,000    0    10,000,000

Fixed!

I also added a few log messages to make debugging easier.


This addresses bug GIRAPH-302.
    https://issues.apache.org/jira/browse/GIRAPH-302


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373682 

Diff: https://reviews.apache.org/r/6676/diff/


Testing
-------

Passed unit tests and verified on a real cluster using 300 machines.


Thanks,

Avery Ching