Posted to dev@giraph.apache.org by Avery Ching <av...@gmail.com> on 2012/08/17 10:18:45 UTC
Review Request: GIRAPH-302: Thread safety issue with sending partitions around
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6676/
-----------------------------------------------------------
Review request for giraph.
Description
-------
After calling sendPartitionRequest(), we clear the vertex list, which creates a race: the list can be emptied before the request has actually been sent.
I noticed this while running with 300 workers, when the number of edges wasn't what I expected. Sometimes we send empty requests!
After digging into the code, I found the issue and fixed it.
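To illustrate the kind of race described above, here is a minimal, deterministic sketch. It is not the Giraph code and not necessarily the fix in this patch: Request, DeferredSender, and both methods are hypothetical names, and the deferred flush() only stands in for a send that completes after the caller has moved on. It shows why clearing a list that a pending request still references loses data, and one common remedy, which is to hand the old list over to the request and start a fresh one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class PartitionSendSketch {
    /** Hypothetical request: keeps a reference to the caller's list, not a copy. */
    static final class Request {
        final List<Integer> vertices;
        Request(List<Integer> vertices) { this.vertices = vertices; }
    }

    /** Hypothetical sender: requests are only "put on the wire" when flush() runs. */
    static final class DeferredSender {
        private final Queue<Request> pending = new ArrayDeque<>();

        void send(Request request) { pending.add(request); }

        /** Returns the total number of vertices contained in the flushed requests. */
        int flush() {
            int verticesSent = 0;
            Request request;
            while ((request = pending.poll()) != null) {
                verticesSent += request.vertices.size();
            }
            return verticesSent;
        }
    }

    /** Problematic pattern: clear the shared list while the request is still pending. */
    static int sendThenClear() {
        DeferredSender sender = new DeferredSender();
        List<Integer> vertices = new ArrayList<>(Arrays.asList(1, 2, 3));
        sender.send(new Request(vertices));
        vertices.clear(); // the pending request now sees an empty list
        return sender.flush();
    }

    /** Safer pattern: the request keeps the old list; the caller starts a new one. */
    static int sendThenReplace() {
        DeferredSender sender = new DeferredSender();
        List<Integer> vertices = new ArrayList<>(Arrays.asList(1, 2, 3));
        sender.send(new Request(vertices));
        vertices = new ArrayList<>(); // old list is untouched by later additions
        vertices.add(4);
        return sender.flush();
    }

    public static void main(String[] args) {
        System.out.println("send then clear:   " + sendThenClear());   // prints 0
        System.out.println("send then replace: " + sendThenReplace()); // prints 3
    }
}
```

In a real multithreaded sender the loss would be intermittent rather than guaranteed, which would be consistent with only some requests arriving empty.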
Giraph Stats
  Aggregate edges                  99,971,220   0   99,971,220
  Superstep                                11   0           11
  Current workers                         300   0          300
  Last checkpointed superstep               0   0            0
  Current master task partition             0   0            0
  Sent messages                             0   0            0
  Aggregate finished vertices      10,000,000   0   10,000,000
  Aggregate vertices               10,000,000   0   10,000,000
This is wrong! The aggregate edge count should be 100,000,000.
Giraph Stats
  Aggregate edges                 100,000,000   0  100,000,000
  Superstep                                11   0           11
  Last checkpointed superstep               0   0            0
  Current workers                         300   0          300
  Current master task partition             0   0            0
  Sent messages                             0   0            0
  Aggregate finished vertices      10,000,000   0   10,000,000
  Aggregate vertices               10,000,000   0   10,000,000
Fixed!
Also added a few messages for better debugging.
This addresses bug GIRAPH-302.
https://issues.apache.org/jira/browse/GIRAPH-302
Diffs
-----
http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373682
Diff: https://reviews.apache.org/r/6676/diff/
Testing
-------
Passed unit tests and verified on a real cluster using 300 machines.
Thanks,
Avery Ching