You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/09/10 06:06:28 UTC

[jira] [Created] (TEZ-1562) DAGImpl commitOrAbortOutputs takes long time (300+ seconds) for reducer vertex with 4000+ tasks

Rajesh Balamohan created TEZ-1562:
-------------------------------------

             Summary: DAGImpl commitOrAbortOutputs takes long time (300+ seconds) for reducer vertex with 4000+ tasks
                 Key: TEZ-1562
                 URL: https://issues.apache.org/jira/browse/TEZ-1562
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan


M7===\
	 R1
	 /
M2===

I was running a job with 4000 reducers.  This is in non-session mode.

At the end of the job, it took 300+ seconds just to commit().  

It appears that its doing some cleanup work. But in session mode, this could lead to the following issue

1. Open session
2. Run small job1
3. Run large job1
4. Run small job2

In this case, small job2 wouldn't even start running until large-job1's commit is over which would be in the order of 300+ seconds.  

Please refer to the logs here, 

2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Vertex vertex_1409722953518_0143_1_02 completed., numCompletedVertices=3, numSuccessfulVertices=3, numFailedVertices=0, numKilledVertices=0, numVertices=3
2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Calling DAG commit/abort for dag: dag_1409722953518_0143_1
2014-09-09 17:58:20,426 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Committing output: Reducer_6_sink for vertex: vertex_1409722953518_0143_1_02
2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: Map_7
2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: Map_2
2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to commitOrAbortOutputs : 304784  <==== In milliseconds
2014-09-09 18:03:25,361 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to log jobhistory : 304938  <==== Cumulative number
2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to handle DAGAppMasterEventDAGFinished : 304940 <==== Cumulative number
2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: DAG: dag_1409722953518_0143_1 finished with state: SUCCEEDED
2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409722953518_0143_1 transitioned from RUNNING to SUCCEEDED

Should this operation be moved to a separate thread as the DAG is already marked as SUCCEEDED?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)