You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/10/09 23:31:33 UTC
[jira] [Resolved] (TEZ-1562) DAGImpl commitOrAbortOutputs takes
long time (300+ seconds) for reducer vertex with 4000+ tasks
[ https://issues.apache.org/jira/browse/TEZ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved TEZ-1562.
-----------------------------
Resolution: Invalid
This is not a Tez issue. Its a MapReduce issue.
> DAGImpl commitOrAbortOutputs takes long time (300+ seconds) for reducer vertex with 4000+ tasks
> -----------------------------------------------------------------------------------------------
>
> Key: TEZ-1562
> URL: https://issues.apache.org/jira/browse/TEZ-1562
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Labels: performance
>
> M7 --> R1
> M2 --> R1
> I was running a job with 4000 reducers. This is in non-session mode.
> At the end of the job, it took 300+ seconds just to commit().
> It appears that its doing some cleanup work. But in session mode, this could lead to the following issue
> 1. Open session
> 2. Run small job1
> 3. Run large job1
> 4. Run small job2
> In this case, small job2 wouldn't even start running until large-job1's commit is over which would be in the order of 300+ seconds.
> Please refer to the logs here,
> 2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Vertex vertex_1409722953518_0143_1_02 completed., numCompletedVertices=3, numSuccessfulVertices=3, numFailedVertices=0, numKilledVertices=0, numVertices=3
> 2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Calling DAG commit/abort for dag: dag_1409722953518_0143_1
> 2014-09-09 17:58:20,426 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Committing output: Reducer_6_sink for vertex: vertex_1409722953518_0143_1_02
> 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: Map_7
> 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: Map_2
> 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to commitOrAbortOutputs : 304784 <==== In milliseconds
> 2014-09-09 18:03:25,361 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to log jobhistory : 304938 <==== Cumulative number
> 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to handle DAGAppMasterEventDAGFinished : 304940 <==== Cumulative number
> 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: DAG: dag_1409722953518_0143_1 finished with state: SUCCEEDED
> 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409722953518_0143_1 transitioned from RUNNING to SUCCEEDED
> Should this operation be moved to a separate thread as the DAG is already marked as SUCCEEDED?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)