You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tez.apache.org by "Jonathan Turner Eagles (Jira)" <ji...@apache.org> on 2021/08/12 14:26:00 UTC

[jira] [Resolved] (TEZ-2262) DAG/Tasks should not fail if counter limits are exceeded.

     [ https://issues.apache.org/jira/browse/TEZ-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Turner Eagles resolved TEZ-2262.
-----------------------------------------
    Resolution: Incomplete

Creating an option to make counters limits exceeded non-fatal is valid. This can be achieved by setting a very high limit as a work around. Closing as incomplete

> DAG/Tasks should not fail if counter limits are exceeded.
> ---------------------------------------------------------
>
>                 Key: TEZ-2262
>                 URL: https://issues.apache.org/jira/browse/TEZ-2262
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>
> Running TPC-DS Q64 failed due to exceeding the max number of counters.
> DAG should succeed and include a warning in the diagnostics stating that the error got truncated.
> {code}
> 18043560327-2015-04-01 16:23:08,509 INFO [AsyncDispatcher event handler] impl.DAGImpl: No output committers for vertex: Reducer 9
> 18043560445-2015-04-01 16:23:08,857 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher: Error in dispatcher thread
> 18043560557:org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200
> 18043560645-	at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87)
> 18043560717-	at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94)
> 18043560788-	at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:75)
> 18043560885-	at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:92)
> 18043560986-	at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:103)
> 18043561085-	at org.apache.tez.common.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:198)
> 18043561188-	at org.apache.tez.common.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:363)
> 18043561283-	at org.apache.tez.dag.app.dag.impl.DAGImpl.incrTaskCounters(DAGImpl.java:598)
> 18043561362-	at org.apache.tez.dag.app.dag.impl.DAGImpl.getAllCounters(DAGImpl.java:588)
> 18043561439-	at org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryFinishedEvent(DAGImpl.java:994)
> 18043561528-	at org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1135)
> 18043561600-	at org.apache.tez.dag.app.dag.impl.DAGImpl.checkDAGForCompletion(DAGImpl.java:1048)
> 18043561685-	at org.apache.tez.dag.app.dag.impl.DAGImpl$VertexCompletedTransition.transition(DAGImpl.java:1708)
> 18043561785-	at org.apache.tez.dag.app.dag.impl.DAGImpl$VertexCompletedTransition.transition(DAGImpl.java:1665)
> 18043561885-	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 18043562001-	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 18043562097-	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 18043562190-	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 18043562307-	at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:944)
> 18043562376-	at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:126)
> 18043562445-	at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1686)
> 18043562535-	at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1677)
> 18043562625-	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> 18043562709-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> 18043562790-	at java.lang.Thread.run(Thread.java:745)
> 18043562832-2015-04-01 16:23:08,882 INFO [AsyncDispatcher event handler] event.AsyncDispatcher: Exiting, bbye..
> 18043562932-2015-04-01 16:23:08,885 INFO [Thread-1] app.DAGAppMaster: DAGAppMasterShutdownHook invoked
> 18043563023-2015-04-01 16:23:08,885 INFO [Thread-1] app.DAGAppMaster: DAGAppMaster received a signal. Signaling TaskScheduler
> 18043563137-2015-04-01 16:23:08,885 INFO [Thread-1] rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)