Posted to dev@giraph.apache.org by "Alessandro Presta (JIRA)" <ji...@apache.org> on 2012/10/19 03:14:03 UTC

[jira] [Commented] (GIRAPH-381) Ensure we get the original exception from GraphMapper#run()

    [ https://issues.apache.org/jira/browse/GIRAPH-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479547#comment-13479547 ] 

Alessandro Presta commented on GIRAPH-381:
------------------------------------------

Looks good, +1.
                
> Ensure we get the original exception from GraphMapper#run()
> -----------------------------------------------------------
>
>                 Key: GIRAPH-381
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-381
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-381.patch
>
>
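> We can lose the original exception if failureCleanup() itself throws: the cleanup exception propagates out of GraphMapper#run() and the failure that triggered the cleanup is never reported.
> For example, in the log below the only exception that reaches the task is the one from unregisterHealth():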
> INFO    2012-10-18 14:23:25,417 [main] org.apache.giraph.graph.WorkerAggregatorHandler  - marshalAggregatorValues: Finished assembling aggregator values
> INFO    2012-10-18 14:23:25,451 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Unable to read additional data from server sessionid 0x13a75baca440014, likely server has closed socket, closing socket connection and attempting reconnect
> ERROR   2012-10-18 14:23:25,552 [main] org.apache.giraph.graph.BspServiceWorker  - unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201209271814.8652_0001/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/xxx.machine.xxx_9 on superstep 1
> WARN    2012-10-18 14:23:25,554 [main-EventThread] org.apache.giraph.graph.BspService  - process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
> INFO    2012-10-18 14:23:26,916 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Opening socket connection to server xxx.machine.xxx/10.174.108.77:22181
> INFO    2012-10-18 14:23:26,917 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Socket connection established to xxx.machine.xxx/10.174.108.77:22181, initiating session
> WARN    2012-10-18 14:23:26,977 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Session 0x13a75baca440014 for server xxx.machine.xxx/10.174.108.77:22181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
> at sun.nio.ch.IOUtil.read(IOUtil.java:186)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
> WARN    2012-10-18 14:23:27,082 [main] org.apache.hadoop.mapred.Child  - Error running child
> java.lang.IllegalStateException: unregisterHealth: KeeperException - Couldn't delete /_hadoopBsp/job_201209271814.8652_0001/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/xxx.machine.xxx_9
> at org.apache.giraph.graph.BspServiceWorker.unregisterHealth(BspServiceWorker.java:582)
> at org.apache.giraph.graph.BspServiceWorker.failureCleanup(BspServiceWorker.java:590)
> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:608)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:632)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:171)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201209271814.8652_0001/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/xxx.machine.xxx_9
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
> at org.apache.giraph.graph.BspServiceWorker.unregisterHealth(BspServiceWorker.java:576)
> ... 5 more
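
The masking described above can be sketched in isolation. The following is a minimal illustration, not the contents of GIRAPH-381.patch: the class name PreserveOriginalException, the Cleanup interface, and runWithCleanup() are hypothetical stand-ins for GraphMapper#run() and failureCleanup(). It assumes Java 7 or later for Throwable#addSuppressed (and Java 8 for the lambdas in main); on older JVMs, logging the original exception before calling the cleanup would serve the same purpose.

```java
import java.io.IOException;

// Hypothetical illustration; none of these names come from Giraph itself.
class PreserveOriginalException {

    /** Stand-in for a cleanup hook such as failureCleanup(). */
    interface Cleanup {
        void failureCleanup() throws Exception;
    }

    /**
     * Runs the work; if it fails, attempts cleanup and always rethrows the
     * ORIGINAL exception. A failure during cleanup is attached to the
     * original as a suppressed exception instead of replacing it.
     */
    static void runWithCleanup(Runnable work, Cleanup cleanup) {
        try {
            work.run();
        } catch (RuntimeException original) {
            try {
                cleanup.failureCleanup();
            } catch (Exception cleanupFailure) {
                // Keep the cleanup failure visible without masking the cause.
                original.addSuppressed(cleanupFailure);
            }
            throw original;
        }
    }

    public static void main(String[] args) {
        try {
            runWithCleanup(
                () -> { throw new IllegalStateException("compute failed"); },
                () -> { throw new IOException("cleanup failed"); });
        } catch (RuntimeException e) {
            // The caller sees the original failure first.
            System.out.println("Caught: " + e.getMessage());
            for (Throwable s : e.getSuppressed()) {
                System.out.println("Suppressed: " + s.getMessage());
            }
        }
    }
}
```

With this shape, a stack trace printed by the caller starts with the failure that triggered the cleanup, and the cleanup failure appears under it as a suppressed exception rather than as the only reported error.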

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira