You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/04/08 23:07:12 UTC
[jira] [Comment Edited] (TEZ-714) OutputCommitters should not run
in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486038#comment-14486038 ]
Bikas Saha edited comment on TEZ-714 at 4/8/15 9:06 PM:
--------------------------------------------------------
abortVertex() should probably be called for all non-succeeded states? Similar to DAGImpl. Otherwise TaskCompletedAfterVertexSuccessTransition will have a regression.
{code} case ERROR:
addDiagnostic("Vertex: " + logIdentifier + " error due to:" + terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
eventHandler.handle(new DAGEvent(getDAGId(),
DAGEventType.INTERNAL_ERROR));
try {
logJobHistoryVertexFailedEvent(finalState);
} catch (IOException e) {
LOG.error("Failed to send vertex finished event to recovery", e);
}
break;
case KILLED:
case FAILED:
addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" + terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}
testCommitOutputOnVertexSuccess() still has
{code} // both committers fail
DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}
Mis-written comment? v3 is killed due to...
{code} // killed is failed due to the commit failure of the vertex group (v1,v2)
Assert.assertEquals(VertexState.KILLED, v3.getState());
Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
{code}
The test is not checking/verifying that one of the commits was indeed not scheduled on the threadpool and cancelled before that. Maybe set a flag when the callable is scheduled and check that the flag was not set. Also perhaps set a flag when interrupted exception is caught when the scheduled event got canceled.
{code}
// test commit will be canceled no matter it is started or still in the threadpool
@Test(timeout = 5000)
public void testCommitCanceled() throws Exception {
{code}
Please cleanup the findbugs warnings.
Spoke offline with [~hitesh] about aborting all output after any failure and it seems like the right thing to do.
was (Author: bikassaha):
abortVertex() should probably be called for all non-succeeded states? Other TaskCompletedAfterVertexSuccessTransition will have a regression.
{code} case ERROR:
addDiagnostic("Vertex: " + logIdentifier + " error due to:" + terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
eventHandler.handle(new DAGEvent(getDAGId(),
DAGEventType.INTERNAL_ERROR));
try {
logJobHistoryVertexFailedEvent(finalState);
} catch (IOException e) {
LOG.error("Failed to send vertex finished event to recovery", e);
}
break;
case KILLED:
case FAILED:
addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" + terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}
testCommitOutputOnVertexSuccess() still has
{code} // both committers fail
DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}
Mis-written comment? v3 is killed due to...
{code} // killed is failed due to the commit failure of the vertex group (v1,v2)
Assert.assertEquals(VertexState.KILLED, v3.getState());
Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
{code}
The test is not checking/verifying that one of the commits was indeed not scheduled on the threadpool and cancelled before that. Maybe set a flag when the callable is scheduled and check that the flag was not set. Also perhaps set a flag when interrupted exception is caught when the scheduled event got canceled.
{code}
// test commit will be canceled no matter it is started or still in the threadpool
@Test(timeout = 5000)
public void testCommitCanceled() throws Exception {
{code}
Please cleanup the findbugs warnings.
Spoke offline with [~hitesh] about aborting all output after any failure and it seems like the right thing to do.
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-10.patch, TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch, TEZ-714-7.patch, TEZ-714-8.patch, TEZ-714-9.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)