Posted to dev@giraph.apache.org by "Avery Ching (Commented) (JIRA)" <ji...@apache.org> on 2011/10/14 17:30:12 UTC

[jira] [Commented] (GIRAPH-53) Unable to read additional data from server session, likely server has closed socket

    [ https://issues.apache.org/jira/browse/GIRAPH-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127626#comment-13127626 ] 

Avery Ching commented on GIRAPH-53:
-----------------------------------

Thanks for reporting the issue.  A few questions:

1)  Is it always the 103rd superstep?

2)  It looks like the task lost its connection to the ZooKeeper service.  It would be good to see what happened to that task as well; most likely it crashed for some reason.
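To help answer question 1 across several runs, a small scanner can check each failed task's log. It reports the last superstep reached before the ZooKeeper connection dropped.

This is a minimal Java sketch, not part of Giraph. The class and method names are hypothetical. The match strings are taken from the log lines quoted in this issue, so other Giraph versions may need different patterns.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Giraph API): scans one task log and reports the
 * highest superstep mentioned before the first ZooKeeper disconnect message.
 */
final class SuperstepAtDisconnect {
  // Matches "superstep=103" as well as "_superstepDir/103/" inside znode paths.
  private static final Pattern SUPERSTEP =
      Pattern.compile("superstep=(\\d+)|_superstepDir/(\\d+)/");

  // Messages that signal lost ZooKeeper connectivity, as seen in this report's log.
  private static final String[] DISCONNECT_MARKERS = {
    "likely server has closed socket",
    "Disconnected from ZooKeeper",
    "ConnectionLoss"
  };

  /**
   * Returns the highest superstep seen before the first disconnect message,
   * or -1 if the log has no disconnect message or no superstep before it.
   */
  public static long lastSuperstepBeforeDisconnect(List<String> logLines) {
    long highest = -1;
    for (String line : logLines) {
      for (String marker : DISCONNECT_MARKERS) {
        if (line.contains(marker)) {
          return highest; // stop at the first sign of lost connectivity
        }
      }
      Matcher m = SUPERSTEP.matcher(line);
      while (m.find()) {
        String digits = m.group(1) != null ? m.group(1) : m.group(2);
        highest = Math.max(highest, Long.parseLong(digits));
      }
    }
    return -1; // no disconnect found in this log
  }
}
```

For example, `SuperstepAtDisconnect.lastSuperstepBeforeDisconnect(Files.readAllLines(Paths.get("syslog")))` over the excerpt quoted here would give 103. The `syslog` path is only a placeholder. If different runs give different numbers, the failure is probably not tied to superstep 103 itself.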
                
> Unable to read additional data from server session, likely server has closed socket
> -----------------------------------------------------------------------------------
>
>                 Key: GIRAPH-53
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-53
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: locker
>
> I've got an error recently. Everything goes well until the 103rd superstep.
> 2011-10-14 16:23:38,904 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep
> 2011-10-14 16:23:39,018 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_vertexRangeAssignments, type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,057 INFO org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=103 with /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_workerHealthyDir/locker-desktop_1 and hostnamePort = ["locker-desktop",30001]
> 2011-10-14 16:23:39,057 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_superstepFinished, type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,529 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1330186cff30001, likely server has closed socket, closing socket connection and attempting reconnect
> 2011-10-14 16:23:39,630 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher 
> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot recover.
> 	at org.apache.giraph.graph.BspService.process(BspService.java:995)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> 2011-10-14 16:23:41,098 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server locker-desktop/10.13.30.90:22181
> 2011-10-14 16:23:41,099 WARN org.apache.zookeeper.ClientCnxn: Session 0x1330186cff30001 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 2011-10-14 16:23:41,212 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2011-10-14 16:23:41,306 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2011-10-14 16:23:41,307 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName dic for UID 1001 from the native implementation
> 2011-10-14 16:23:41,318 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
> 	at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:836)
> 	at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:551)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
> 	at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:830)
> 	... 9 more
> I don't know whether this should be called a bug or not. Waiting for some help, thanks.
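The log above shows `Connection refused` from locker-desktop/10.13.30.90:22181. This suggests nothing was listening on the ZooKeeper port any more, which fits the guess that a task crashed.

One way to check whether a ZooKeeper server is alive is its `ruok` four-letter-word command; a healthy server replies `imok`. The Java sketch below is a hypothetical helper, not a Giraph or ZooKeeper API. Newer ZooKeeper releases (3.5.3 and later) only answer four-letter words listed in `4lw.commands.whitelist`, so `ruok` may need to be enabled there first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical diagnostic: sends ZooKeeper's "ruok" command and checks for "imok".
 * "Connection refused" means no process is listening on host:port at all.
 */
final class ZkRuokProbe {
  /** Returns true only if the server at host:port answered "imok" within timeoutMs. */
  public static boolean isOk(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      socket.setSoTimeout(timeoutMs); // bound each blocking read as well
      OutputStream out = socket.getOutputStream();
      out.write("ruok".getBytes(StandardCharsets.US_ASCII));
      out.flush();
      // Read up to 4 bytes, stopping early if the server closes the connection.
      InputStream in = socket.getInputStream();
      byte[] buf = new byte[4];
      int total = 0;
      int n;
      while (total < buf.length && (n = in.read(buf, total, buf.length - total)) != -1) {
        total += n;
      }
      return "imok".equals(new String(buf, 0, total, StandardCharsets.US_ASCII));
    } catch (IOException e) {
      // Covers java.net.ConnectException ("Connection refused"), timeouts and resets.
      return false;
    }
  }
}
```

For the setup in this report, the call would be `ZkRuokProbe.isOk("locker-desktop", 22181, 2000)`. A `false` result only says the server did not answer `imok`. The exception swallowed in the `catch` explains why, so log it if you need to tell a dead server from a slow one.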
