Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/12/01 12:37:12 UTC

[jira] [Created] (TAJO-1216) Output commit should be two phase commit

Hyunsik Choi created TAJO-1216:
----------------------------------

             Summary: Output commit should be two phase commit
                 Key: TAJO-1216
                 URL: https://issues.apache.org/jira/browse/TAJO-1216
             Project: Tajo
          Issue Type: Improvement
            Reporter: Hyunsik Choi


*Problem*

The output data of each query is first stored in a temporary staging directory. It is then moved to the final output directory once all tasks have completed successfully. We call this step *output commit*.

Currently, we simply move the output data set to the final directory. However, this approach makes failure handling very hard.
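A minimal sketch of the current one-step output commit, assuming each task has already written its output files directly under a shared staging directory. The function name `simple_output_commit` and the `part-N` file names are hypothetical, not Tajo APIs. Nothing records which files were already moved, so a failure partway through leaves the final directory partially populated.

```python
import os
import shutil
import tempfile


def simple_output_commit(staging_dir: str, final_dir: str) -> list:
    """One-step output commit (hypothetical helper): move every staged file
    into the final output directory and return the moved file names."""
    os.makedirs(final_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(staging_dir)):
        # If the process dies between iterations, final_dir holds only some
        # of the files and no record says which ones were already moved.
        shutil.move(os.path.join(staging_dir, name), os.path.join(final_dir, name))
        moved.append(name)
    return moved


# Usage: three tasks have each staged one output file.
root = tempfile.mkdtemp()
staging = os.path.join(root, "staging")
final = os.path.join(root, "final")
os.makedirs(staging)
for i in range(3):
    with open(os.path.join(staging, f"part-{i}"), "w") as f:
        f.write(f"rows from task {i}\n")

moved = simple_output_commit(staging, final)
print(moved)  # ['part-0', 'part-1', 'part-2']
```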

*Solution*

To solve this problem, we need a two-phase commit, which works as follows:
 * When a task completes, it sends a *commit pending* request to the QueryMaster.
 * The QueryMaster chooses exactly one among possibly multiple *commit pending* requests and responds with *commit* to the corresponding task.
 * Only the task that receives *commit* moves its result data to the final output directory. The others cancel their commit work.
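The commit pending / commit handshake above can be sketched as follows, under several assumptions: the QueryMaster is modeled as an in-process object rather than an RPC service, each task may run as several attempts identified by hypothetical attempt IDs, and the QueryMaster grants *commit* to the first attempt that asks (the issue only says that one request is chosen, not how). `QueryMasterCommitArbiter` and `run_attempt` are illustrative names, not Tajo classes.

```python
import os
import shutil
import tempfile
import threading


class QueryMasterCommitArbiter:
    """Hypothetical stand-in for the QueryMaster: grants *commit* to at most
    one attempt per task, choosing the first attempt that asks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._winners = {}  # task id -> attempt id that was granted commit

    def request_commit(self, task_id: str, attempt_id: str) -> bool:
        """Phase 1: an attempt reports *commit pending*; True means *commit*."""
        with self._lock:
            winner = self._winners.setdefault(task_id, attempt_id)
            return winner == attempt_id  # repeated requests by the winner stay True


def run_attempt(arbiter, task_id, attempt_id, attempt_dir, final_dir) -> bool:
    """Phase 2: only the attempt granted *commit* moves its staged output."""
    if arbiter.request_commit(task_id, attempt_id):
        os.makedirs(final_dir, exist_ok=True)
        for name in sorted(os.listdir(attempt_dir)):
            shutil.move(os.path.join(attempt_dir, name), os.path.join(final_dir, name))
        return True
    # Not chosen: cancel the commit by discarding this attempt's staged output.
    shutil.rmtree(attempt_dir)
    return False


# Usage: two attempts of the same task both finish and ask to commit.
root = tempfile.mkdtemp()
final = os.path.join(root, "final")
arbiter = QueryMasterCommitArbiter()
results = {}
for attempt in ("attempt_0", "attempt_1"):
    attempt_dir = os.path.join(root, "staging", attempt)
    os.makedirs(attempt_dir)
    with open(os.path.join(attempt_dir, "part-0"), "w") as f:
        f.write(f"written by {attempt}\n")
    results[attempt] = run_attempt(arbiter, "task_0", attempt, attempt_dir, final)

print(results)  # {'attempt_0': True, 'attempt_1': False}
```

Because the arbiter decides under a lock and remembers its choice per task, a second attempt of the same task can never overwrite the committed output, and a retried request from the chosen attempt is answered consistently.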



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)