Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/09/23 07:13:33 UTC

[jira] [Comment Edited] (TEZ-992) Recovery data should not be written on AsyncDispatcher thread

    [ https://issues.apache.org/jira/browse/TEZ-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144357#comment-14144357 ] 

Bikas Saha edited comment on TEZ-992 at 9/23/14 5:12 AM:
---------------------------------------------------------

In this case, the flow looks like RUNNING->WAIT_FOR_COMMIT_START_SAVED->DO_COMMIT->WAIT_FOR_VERTEX_FINISHED_SAVED->FINISH.
So it does look like we need 3 states. However, there was an open item to do the commit on a separate thread because the commit itself can take a long time. Given that need for a separate thread, maybe the easier thing to do would be RUNNING->COMMITTING->FINISHED, where the RUNNING->COMMITTING transition enqueues the commit operation on a thread or thread pool. After the commit completes on that thread, it sends an event to its vertex, which moves COMMITTING->FINISHED. Given that the operation happens on a separate thread, this thread could do the following. If a committer is present, save commit_start (blocking), then commit. In both cases (committer present or not), it will save finished (blocking) and then send an event to its vertex that changes the vertex from COMMITTING to FINISHED.
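
A minimal sketch of the RUNNING->COMMITTING->FINISHED idea, to make the thread hand-off concrete. Everything here is hypothetical: VertexCommitSketch, RecoveryLog, saveBlocking and onAllTasksDone are illustrative names, not Tez classes or methods. The completion "event" is also simplified to a direct call, whereas in the proposal it would go back to the vertex through the dispatcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these types are Tez classes.
class VertexCommitSketch {
    enum State { RUNNING, COMMITTING, FINISHED }

    /** Stand-in for a blocking recovery store; a real one would flush to durable storage. */
    static class RecoveryLog {
        final List<String> records = Collections.synchronizedList(new ArrayList<>());
        void saveBlocking(String record) { records.add(record); }
    }

    private final RecoveryLog log;
    private final Runnable committer;          // null when no committer is present
    private final ExecutorService commitPool;  // the separate thread or thread pool
    private volatile State state = State.RUNNING;

    VertexCommitSketch(RecoveryLog log, Runnable committer, ExecutorService commitPool) {
        this.log = log;
        this.committer = committer;
        this.commitPool = commitPool;
    }

    State getState() { return state; }

    /** Called on the dispatcher thread: moves RUNNING->COMMITTING and returns without blocking. */
    void onAllTasksDone() {
        if (state != State.RUNNING) {
            throw new IllegalStateException("unexpected state " + state);
        }
        state = State.COMMITTING;
        commitPool.execute(this::commitAndSave);
    }

    /** Runs on the commit thread, so blocking saves and a slow commit are acceptable here. */
    private void commitAndSave() {
        if (committer != null) {
            log.saveBlocking("COMMIT_START");
            committer.run();
        }
        log.saveBlocking("VERTEX_FINISHED");
        onCommitCompleted();
    }

    /** Simplification: stands in for the event that moves the vertex COMMITTING->FINISHED. */
    private void onCommitCompleted() { state = State.FINISHED; }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        RecoveryLog log = new RecoveryLog();
        VertexCommitSketch vertex =
            new VertexCommitSketch(log, () -> System.out.println("committing output"), pool);
        vertex.onAllTasksDone();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(vertex.getState() + " " + log.records);
    }
}
```

The sketch does not handle a failing commit or a failing save; a real state machine would need an error transition out of COMMITTING.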

Are there any other blocking operations?


> Recovery data should not be written on AsyncDispatcher thread
> -------------------------------------------------------------
>
>                 Key: TEZ-992
>                 URL: https://issues.apache.org/jira/browse/TEZ-992
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Jeff Zhang
>
> Writing recovery data on the AsyncDispatcher thread may block DAG operations when the data needs to be stored synchronously. Operations that require a synchronous store should change their state machines to wait for the store operation before moving ahead. They will move ahead after they receive notification from the RecoveryService that their operation has completed.
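
The wait-for-notification pattern described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the Tez RecoveryService API: RecoveryWaitSketch, requestSave, the WAITING_FOR_SAVE state and the queue standing in for the dispatcher are all hypothetical. The point is that the dispatcher thread only enqueues the store request and later reacts to a completion event, so it never performs the write itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the names below are not Tez classes or methods.
class RecoveryWaitSketch {
    enum State { RUNNING, WAITING_FOR_SAVE, DONE }

    // Stand-in for the dispatcher's event queue.
    private final BlockingQueue<Runnable> dispatcherQueue = new LinkedBlockingQueue<>();
    // Stand-in for the recovery service's own writer thread.
    private final ExecutorService recoveryThread = Executors.newSingleThreadExecutor();
    final List<String> stored = Collections.synchronizedList(new ArrayList<>());

    private State state = State.RUNNING; // read and written only on the dispatcher thread

    /** Dispatcher thread: hand the write to the recovery thread and wait in a distinct state. */
    void requestSave(String record) {
        state = State.WAITING_FOR_SAVE;
        recoveryThread.execute(() -> {
            stored.add(record); // the blocking write would happen here
            // Notify the state machine by posting an event back to the dispatcher.
            dispatcherQueue.add(() -> state = State.DONE);
        });
    }

    /** Plays the role of the dispatcher loop until the completion event arrives. */
    State runDispatcherUntilDone() throws InterruptedException {
        requestSave("VERTEX_FINISHED");
        while (state != State.DONE) {
            Runnable event = dispatcherQueue.poll(5, TimeUnit.SECONDS);
            if (event == null) {
                throw new IllegalStateException("no notification from recovery thread");
            }
            event.run();
        }
        recoveryThread.shutdown();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        RecoveryWaitSketch sketch = new RecoveryWaitSketch();
        System.out.println(sketch.runDispatcherUntilDone() + " " + sketch.stored);
    }
}
```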


