Posted to dev@giraph.apache.org by "Eugene Koontz (JIRA)" <ji...@apache.org> on 2013/04/02 23:51:15 UTC

[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark with 6 or more workers on a pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest

    [ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620314#comment-13620314 ] 

Eugene Koontz commented on GIRAPH-601:
--------------------------------------

Hi Maja and Eli,
Thanks for helping with this. It seems to happen only with -w N where N > 5, so the following runs with no problem:

{code}
 $HADOOP_RUNTIME/bin/hadoop jar $JAR \
         org.apache.giraph.benchmark.PageRankBenchmark \
	  -e 10 -s 10 -v -V 10 -w 5
{code} 

Also, other MR jobs such as pi work fine. I've updated the description to reflect the workers > 5 condition.

I suspect the port increment used to retry binding after a bind failure is related. I'll try to dig into it more soon. There's also a small bug involving case sensitivity of hostnames, which I'll file separately.
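
To make the port-increment suspicion concrete, here is a minimal sketch of a bind-with-retry loop. It is not Giraph's actual server code: the class name, the method names, the base port of 30000, and the increment of 10 are all assumptions, chosen only because the log in the quoted description shows five workers at 30000 + MRtaskID and the task 0 worker at 30010. Under those assumptions, one failed bind for task 0 would land it on 30010, and any peer that computed the address as base + task ID instead of looking it up would reach whatever is listening on 30000.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical sketch of a port re-bind scheme; not Giraph's real code. */
class PortRebindSketch {
  /** Assumed base port, inferred from ports 30001..30005 in the log. */
  static final int BASE_PORT = 30000;

  /** First port a task would try: base port plus its task ID (assumption). */
  static int initialPort(int taskId) {
    return BASE_PORT + taskId;
  }

  /** Port a task ends up on after a number of failed bind attempts. */
  static int portAfterRebinds(int taskId, int failedAttempts, int increment) {
    return initialPort(taskId) + failedAttempts * increment;
  }

  /** Try to bind on loopback, adding the increment after each failure. */
  static ServerSocket bindWithRetry(int firstPort, int increment,
      int maxAttempts) throws IOException {
    int port = firstPort;
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      ServerSocket socket = new ServerSocket();
      try {
        socket.bind(
            new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return socket;
      } catch (IOException e) {
        // Port is taken (or otherwise unusable): release and move on.
        socket.close();
        last = e;
        port += increment;
      }
    }
    throw new IOException(
        "No free port after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    // Task 0 with one failed bind and an assumed increment of 10.
    System.out.println("task 0 expected at: " + initialPort(0));
    System.out.println("task 0 after one re-bind: "
        + portAfterRebinds(0, 1, 10));
  }
}
```

Whether the real increment is fixed or derived from the number of workers is something the sketch leaves open; it only shows that a re-bind moves a server away from the port that base + task ID would predict.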

                
> Exception when running pagerank benchmark with 6 or more workers on a pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-601
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-601
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Eugene Koontz
>         Attachments: instrumentation.patch, print_addresses.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>          org.apache.giraph.benchmark.PageRankBenchmark \
> 	  -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in  /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_000002/ :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=0, port=30010)], size = 6 from /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to org.apache.giraph.comm.requests.MasterRequest
> 	at org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
> 	at org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
> 	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> 	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
> 	at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
> 	at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 	at java.lang.Thread.run(Thread.java:680)
> {code}
