Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/04/09 22:35:12 UTC

[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

    [ https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488219#comment-14488219 ] 

Jason Lowe commented on TEZ-2303:
---------------------------------

{noformat}
2015-04-09 19:36:11,231 INFO [main] app.RecoveryParser: Recovering from event, eventType=VERTEX_INITIALIZED, event=vertexName=scope-1973, vertexId=vertex_1428329756093_168563_1_43, initRequestedTime=1428606011138, initedTime=1428606011166, numTasks=769, processorName=null, additionalInputsCount=0
2015-04-09 19:36:11,231 INFO [main] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1428329756093_168563_1_43 [scope-1973]
2015-04-09 19:36:11,242 INFO [main] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:false desiredTaskIput:104857600 minTasks:1
2015-04-09 19:36:11,251 WARN [IPC Server handler 0 on x] ipc.Server: IPC Server handler 0 on x, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from x Call#1965 Retry#0
java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
        at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.getRunningTasks(VertexImpl.java:892)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.getVertexProgress(VertexImpl.java:988)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:694)
        at org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:62)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:98)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
{noformat}

It looks like a client requesting status from the new attempt is sneaking in and iterating over the vertex's task list (VertexImpl.getRunningTasks in the trace above) while the recovery process on the main thread is still building that list.
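
To illustrate the kind of race suspected here, below is a minimal sketch, not the Tez code and not a proposed patch. TaskTableSketch, addTask, and unguardedWalkFails are hypothetical names; only getRunningTasks echoes a method seen in the stack trace, and its body here is invented. The sketch assumes the task table is a plain LinkedHashMap, whose iterators are fail-fast, and shows one possible way to keep a status reader from walking the map while a writer is adding to it, using a read/write lock.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a vertex's task table; not the Tez VertexImpl code. */
public class TaskTableSketch {
    // taskId -> whether the task is currently running (assumed layout)
    private final Map<String, Boolean> tasks = new LinkedHashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writer path, e.g. recovery registering tasks one by one. */
    public void addTask(String taskId, boolean running) {
        lock.writeLock().lock();
        try {
            tasks.put(taskId, running);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reader path, e.g. a status RPC counting running tasks. */
    public int getRunningTasks() {
        lock.readLock().lock();
        try {
            int count = 0;
            for (boolean running : tasks.values()) {
                if (running) {
                    count++;
                }
            }
            return count;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Reproduces the fail-fast iterator behaviour deterministically in a
     * single thread: a structural change between two next() calls makes
     * the second call throw.
     */
    public static boolean unguardedWalkFails() {
        Map<String, Boolean> unguarded = new LinkedHashMap<>();
        unguarded.put("task_0", true);
        unguarded.put("task_1", true);
        Iterator<Boolean> it = unguarded.values().iterator();
        it.next();
        unguarded.put("task_2", true); // new key: structural modification
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

In the real AM the modification would come from another thread, so the exception would only show up intermittently; the single-threaded helper just makes the iterator check visible. Whether locking, a concurrent collection, or rejecting status calls until recovery completes is the right fix for Tez is a separate question.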

> ConcurrentModificationException while processing recovery
> ---------------------------------------------------------
>
>                 Key: TEZ-2303
>                 URL: https://issues.apache.org/jira/browse/TEZ-2303
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Jason Lowe
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying to recover from a previous attempt that crashed.  Exception details to follow.


