You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/04/06 03:07:04 UTC

[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

    [ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14446581#comment-14446581 ] 

Bikas Saha commented on TEZ-714:
--------------------------------

bq. Both task finishing and commit finishing may be the last step to the finished state (if no commit then task finishing is the last step, and committing finish is the last step if there's commit), so they would both trigger checkForCompletion.
Today, the semantics of checkForCompletion() are actually checkForTasksCompletion(). It checks if all tasks have succeeded. On success, it would complete all commits synchronously and return the final state SUCCEEDED. This patch is overloading checkForCompletion() to do both checkForTasksCompletion() and checkForCommitsCompletion() resulting in logic overloading that is error-prone and makes future changes harder because the method is doing 2 things.
{code}
+        LOG.info("All tasks are succeeded, vertex:" + vertex.logIdentifier);
+        if (!vertex.commitVertexOutputs) {
+          // just finish because no vertex committing needed
+          return vertex.finished(VertexState.SUCCEEDED);
+        } else if (!vertex.committed.getAndSet(true)) {
+          // start commit if there're commits or just finish if no commits
+          return commitOrFinish(vertex);
// This part belongs to checkForCommitsCompletion()
+        } else if (vertex.commitFutures.isEmpty()) {
+          // move from COMMITTING to SUCCEEDED
+          return vertex.finished(VertexState.SUCCEEDED);
+        } else {
+          return VertexState.COMMITTING;
         }{code}

bq. What i am doing is not exactly this way. I won't ignore failed commits, stead I will cancel pending commits and wait for them to complete and then move to failed state. 
Will calling cancel on the future object is going to result in VertexCommitCallback#onFailure() to be invoked. If not, then the vertex will hang on waiting for the commitFutures to be empty because no CommitCompleted event will come.

bq. This behavior is consistent with other cases when there's any fail event happens (commit fail, vertex termination event and etc ), all the cases would cancel pending commits and wait for them to complete and then move to finished state.
Is this existing behavior or behavior in the patch? In either case, this logic is not consistent from a users point of view. Depending on which async commit operation ran first on the threadpool and failed, the user will see anywhere between 1 to N-1 committed outputs. Is that observation correct? If yes, is that a better choice than saying - User will see either no outputs or all successfully committed outputs?



> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch, TEZ-714-7.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)