Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/06/10 02:04:00 UTC
[jira] [Updated] (TEZ-2107) Recovery failure in the case of Auto-reduce parallelism
[ https://issues.apache.org/jira/browse/TEZ-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-2107:
----------------------------
Description:
The following error happens during recovery when auto-reduce parallelism has been applied. The number of tasks is reduced from 2 to 1, but the upstream vertex's DataMovementEvent is still routed to task 2 (destTaskIndex=1 in the log below), which was removed by auto-reduce parallelism.
{code}
2015-02-16 09:11:54,587 FATAL [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Unexpected null task. sourceVertex=vertex_1424048826974_0002_1_00 [scope-47] srcTaskIndex = 0 destVertex=vertex_1424048826974_0002_1_01 [scope-50] destTaskIndex=1 destNumTasks=1 edgeManager=org.apache.tez.dag.app.dag.impl.ScatterGatherEdgeManager
at org.apache.tez.dag.app.dag.impl.Edge.sendDmEventOrIfEventToTasks(Edge.java:358)
at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:422)
at org.apache.tez.dag.app.dag.impl.Edge.handleCompositeDataMovementEvent(Edge.java:310)
at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:378)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:3795)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3600(VertexImpl.java:187)
at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3708)
at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3700)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1575)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:186)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1802)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1788)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:745)
{code}
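The failure mode described above can be illustrated with a minimal, hypothetical Java sketch. It is not Tez code: StaleRoutingSketch, RoutedEvent, and DestVertex are made-up names, and the sketch simply assumes that a routing decision made while the destination had 2 tasks is replayed after the count has dropped to 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are illustrative and are not Tez classes.
public class StaleRoutingSketch {

    // A recorded routing decision: which destination task index an event targets.
    static final class RoutedEvent {
        final int srcTaskIndex;
        final int destTaskIndex;

        RoutedEvent(int srcTaskIndex, int destTaskIndex) {
            this.srcTaskIndex = srcTaskIndex;
            this.destTaskIndex = destTaskIndex;
        }
    }

    // Destination vertex holding its current tasks by index.
    static final class DestVertex {
        private final List<String> tasks = new ArrayList<>();

        DestVertex(int numTasks) {
            setParallelism(numTasks);
        }

        // Models auto-reduce: rebuild the task list with the new count.
        void setParallelism(int numTasks) {
            tasks.clear();
            for (int i = 0; i < numTasks; i++) {
                tasks.add("task_" + i);
            }
        }

        int numTasks() {
            return tasks.size();
        }

        // Returns null when the index no longer exists.
        String getTask(int index) {
            return index >= 0 && index < tasks.size() ? tasks.get(index) : null;
        }
    }

    // Delivers the event, failing loudly if the target index has been removed.
    static String route(RoutedEvent event, DestVertex dest) {
        String task = dest.getTask(event.destTaskIndex);
        if (task == null) {
            throw new IllegalStateException("Unexpected null task. srcTaskIndex="
                + event.srcTaskIndex + " destTaskIndex=" + event.destTaskIndex
                + " destNumTasks=" + dest.numTasks());
        }
        return task;
    }

    public static void main(String[] args) {
        DestVertex dest = new DestVertex(2);
        RoutedEvent recorded = new RoutedEvent(0, 1); // routed before the reduction
        System.out.println(route(recorded, dest));    // prints "task_1"

        dest.setParallelism(1);                       // auto-reduce: 2 -> 1
        try {
            route(recorded, dest);                    // replayed with the old index
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second call fails because the event still carries destTaskIndex=1 while only index 0 exists, which matches the destTaskIndex=1 destNumTasks=1 combination in the log above. Whether the real fix should re-route events on recovery or restore the reduced parallelism first is not something this sketch decides.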
The following exception also occurs sometimes:
{code}
2015-06-10 08:02:03,417 ERROR [Dispatcher thread: Central] impl.VertexImpl: Exception in VertexManager, vertex:vertex_1433894507873_0001_1_01 [Summation]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist, vertexName=Summation
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerCallback.onFailure(VertexManager.java:516)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:977)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:86)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:380)
at java.util.concurrent.FutureTask.setException(FutureTask.java:247)
at java.util.concurrent.FutureTask.run(FutureTask.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist, vertexName=Summation
at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:459)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:585)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:656)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:651)
at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:1)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
{code}
was:
The following error happens during recovery when auto-reduce parallelism has been applied. The number of tasks is reduced from 2 to 1, but the upstream vertex's DataMovementEvent is still routed to task 2 (destTaskIndex=1 in the log below), which was removed by auto-reduce parallelism.
{code}
2015-02-16 09:11:54,587 FATAL [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Unexpected null task. sourceVertex=vertex_1424048826974_0002_1_00 [scope-47] srcTaskIndex = 0 destVertex=vertex_1424048826974_0002_1_01 [scope-50] destTaskIndex=1 destNumTasks=1 edgeManager=org.apache.tez.dag.app.dag.impl.ScatterGatherEdgeManager
at org.apache.tez.dag.app.dag.impl.Edge.sendDmEventOrIfEventToTasks(Edge.java:358)
at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:422)
at org.apache.tez.dag.app.dag.impl.Edge.handleCompositeDataMovementEvent(Edge.java:310)
at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:378)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:3795)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3600(VertexImpl.java:187)
at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3708)
at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3700)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1575)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:186)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1802)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1788)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:745)
{code}
{code}
2015-06-10 08:02:03,417 ERROR [Dispatcher thread: Central] impl.VertexImpl: Exception in VertexManager, vertex:vertex_1433894507873_0001_1_01 [Summation]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist, vertexName=Summation
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerCallback.onFailure(VertexManager.java:516)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:977)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:86)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:380)
at java.util.concurrent.FutureTask.setException(FutureTask.java:247)
at java.util.concurrent.FutureTask.run(FutureTask.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist, vertexName=Summation
at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:459)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:585)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:656)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:651)
at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:1)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
{code}
> Recovery failure in the case of Auto-reduce parallelism
> -------------------------------------------------------
>
> Key: TEZ-2107
> URL: https://issues.apache.org/jira/browse/TEZ-2107
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)