Posted to dev@giraph.apache.org by "Avery Ching (Resolved) (JIRA)" <ji...@apache.org> on 2011/10/01 20:26:34 UTC
[jira] [Resolved] (GIRAPH-46) Race condition on superstep 1 with RPC servers not started by the time that requests are sent
[ https://issues.apache.org/jira/browse/GIRAPH-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching resolved GIRAPH-46.
-------------------------------
Resolution: Fixed
Committed, thanks Jakob for the review.
> Race condition on superstep 1 with RPC servers not started by the time that requests are sent
> ---------------------------------------------------------------------------------------------
>
> Key: GIRAPH-46
> URL: https://issues.apache.org/jira/browse/GIRAPH-46
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 0.70.0
> Reporter: Avery Ching
> Assignee: Avery Ching
> Priority: Minor
> Fix For: 0.70.0
>
> Attachments: diff.txt
>
>
> Hi,
> Occasionally (maybe one time in four), my Giraph run fails because of the RuntimeException below.
> According to the code, it should never happen:
> if (msgMap == null) { // should never happen after constructor
>     throw new RuntimeException(
>         "sendMessage: msgMap did not exist for " + addr +
>         " for vertex " + destVertex);
> }
> This happens during superstep 1 (second superstep). My application actually *adds* edges on superstep 1
> (to make every out-edge also an in-edge of the destination), but since I am running only on 3 workers,
> I would be surprised if every worker had not been registered in the RPC layer initially.
> One hypothesis is that Hadoop does something funny, because one of my servers was under heavy
> load. Maybe Hadoop launched another worker to replace a slow worker? Can that happen?
> java.lang.RuntimeException: sendMessage: msgMap did not exist for [hostname].ml.cmu.edu:30003 for vertex 875713
> at org.apache.giraph.comm.BasicRPCCommunications.sendMessageReq(BasicRPCCommunications.java:825)
> at org.apache.giraph.graph.BasicVertex.sendMsg(BasicVertex.java:179)
> at edu.cmu.selectlab.BP.BinaryBPVertex.compute(BinaryBPVertex.java:94)
> at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:624)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
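The failure reported above is a startup race: a sender looks up the per-destination message map before the destination's RPC server has been registered. The following is a minimal sketch of one general way to guard against such a race, assuming a startup barrier built on java.util.concurrent.CountDownLatch. It is not the patch in diff.txt and does not use Giraph's API; the class StartupBarrierSketch, its methods, and the address "worker-0:30003" are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not Giraph code and not the committed fix.
final class StartupBarrierSketch {
    // Stand-in for the per-destination outgoing message maps.
    private final Map<String, Map<Long, String>> outMessages = new ConcurrentHashMap<>();
    // Counts down once per registered server; senders wait until it reaches zero.
    private final CountDownLatch serversReady;

    StartupBarrierSketch(int expectedServers) {
        this.serversReady = new CountDownLatch(expectedServers);
    }

    // Called when a remote server is known to be up.
    void registerServer(String addr) {
        outMessages.put(addr, new ConcurrentHashMap<>());
        serversReady.countDown(); // publish the map before releasing senders
    }

    // Blocks until all expected servers are registered, then queues the message.
    void sendMessage(String addr, long destVertex, String msg) {
        try {
            if (!serversReady.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("sendMessage: servers not registered in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sendMessage: interrupted while waiting", e);
        }
        Map<Long, String> msgMap = outMessages.get(addr);
        if (msgMap == null) { // unknown destination, not a startup race
            throw new IllegalStateException(
                "sendMessage: msgMap did not exist for " + addr +
                " for vertex " + destVertex);
        }
        msgMap.put(destVertex, msg);
    }

    // Number of messages queued for the given destination.
    int pendingFor(String addr) {
        Map<Long, String> msgMap = outMessages.get(addr);
        return msgMap == null ? 0 : msgMap.size();
    }

    public static void main(String[] args) throws InterruptedException {
        StartupBarrierSketch comm = new StartupBarrierSketch(1);
        // Simulate a server that finishes starting after the sender is already running.
        Thread starter = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            comm.registerServer("worker-0:30003");
        });
        starter.start();
        comm.sendMessage("worker-0:30003", 875713L, "hello"); // waits for registration
        starter.join();
        System.out.println(comm.pendingFor("worker-0:30003")); // prints 1
    }
}
```

Without the await call, the send in main would usually run before registerServer and hit the "msgMap did not exist" branch, which mirrors the intermittent failure in the stack trace. The five-second timeout is an arbitrary choice for the sketch.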