You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/04/08 23:07:12 UTC

[jira] [Comment Edited] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

    [ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486038#comment-14486038 ] 

Bikas Saha edited comment on TEZ-714 at 4/8/15 9:06 PM:
--------------------------------------------------------


abortVertex() should probably be called for all non-succeeded states? Similar to DAGImpl. Otherwise TaskCompletedAfterVertexSuccessTransition will have a regression.
{code}      case ERROR:
        addDiagnostic("Vertex: " + logIdentifier + " error due to:" + terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        eventHandler.handle(new DAGEvent(getDAGId(),
            DAGEventType.INTERNAL_ERROR));
        try {
          logJobHistoryVertexFailedEvent(finalState);
        } catch (IOException e) {
          LOG.error("Failed to send vertex finished event to recovery", e);
        }
        break;
      case KILLED:
      case FAILED:
        addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" + terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}

testCommitOutputOnVertexSuccess() still has
{code}    // both committers fail
    DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}

Mis-written comment? v3 is killed due to...
{code}   // killed is failed due to the commit failure of the vertex group (v1,v2)
    Assert.assertEquals(VertexState.KILLED, v3.getState());
    Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
 {code}

The test is not checking/verifying that one of the commits was indeed not scheduled on the threadpool and cancelled before that. Maybe set a flag when the callable is scheduled and check that the flag was not set. Also perhaps set a flag when interrupted exception is caught when the scheduled event got canceled.
{code}
 // test commit will be canceled no matter it is started or still in the threadpool
  @Test(timeout = 5000)
  public void testCommitCanceled() throws Exception {
{code}

Please cleanup the findbugs warnings.

Spoke offline with [~hitesh] about aborting all output after any failure and it seems like the right thing to do.


was (Author: bikassaha):


abortVertex() should probably be called for all non-succeeded states? Other TaskCompletedAfterVertexSuccessTransition will have a regression.
{code}      case ERROR:
        addDiagnostic("Vertex: " + logIdentifier + " error due to:" + terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        eventHandler.handle(new DAGEvent(getDAGId(),
            DAGEventType.INTERNAL_ERROR));
        try {
          logJobHistoryVertexFailedEvent(finalState);
        } catch (IOException e) {
          LOG.error("Failed to send vertex finished event to recovery", e);
        }
        break;
      case KILLED:
      case FAILED:
        addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" + terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}

testCommitOutputOnVertexSuccess() still has
{code}    // both committers fail
    DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}

Mis-written comment? v3 is killed due to...
{code}   // killed is failed due to the commit failure of the vertex group (v1,v2)
    Assert.assertEquals(VertexState.KILLED, v3.getState());
    Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
 {code}

The test is not checking/verifying that one of the commits was indeed not scheduled on the threadpool and cancelled before that. Maybe set a flag when the callable is scheduled and check that the flag was not set. Also perhaps set a flag when interrupted exception is caught when the scheduled event got canceled.
{code}
 // test commit will be canceled no matter it is started or still in the threadpool
  @Test(timeout = 5000)
  public void testCommitCanceled() throws Exception {
{code}

Please cleanup the findbugs warnings.

Spoke offline with [~hitesh] about aborting all output after any failure and it seems like the right thing to do.

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-10.patch, TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch, TEZ-714-7.patch, TEZ-714-8.patch, TEZ-714-9.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)