Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/12/01 12:37:12 UTC
[jira] [Created] (TAJO-1216) Output commit should be two phase commit
Hyunsik Choi created TAJO-1216:
----------------------------------
Summary: Output commit should be two phase commit
Key: TAJO-1216
URL: https://issues.apache.org/jira/browse/TAJO-1216
Project: Tajo
Issue Type: Improvement
Reporter: Hyunsik Choi
*Problem*
The output data of each query is first stored in a temporary staging directory. It is then moved to the final output directory once all tasks have completed successfully. We call this step *output commit*.
Currently, we simply move the output data set to the final directory. However, this approach makes failure handling very hard.
*Solution*
To solve this problem, we need a two-phase commit. The approach is as follows:
* When a task completes, it sends a *commit pending* request to QueryMaster.
* QueryMaster chooses exactly one request among possibly multiple *commit pending* requests, and responds with *commit* to the corresponding task.
* Only the task that receives *commit* moves its result data to the final output directory. The others cancel their commit work.
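
The three steps above can be sketched roughly as follows. This is a minimal illustration, not Tajo's actual implementation: the class {{CommitArbiter}}, the method {{requestCommit}}, and the string IDs are hypothetical names, and the sketch assumes that the competing *commit pending* requests come from several attempts of the same task.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical arbiter that would live in QueryMaster.
 * All names here are illustrative assumptions, not part of Tajo's API.
 */
class CommitArbiter {
    // Maps a task ID to the single attempt that has been granted the commit.
    private final ConcurrentMap<String, String> granted = new ConcurrentHashMap<>();

    /**
     * Phase 1: an attempt reports "commit pending".
     * Returns true if this attempt is chosen and may commit,
     * false if another attempt was already chosen and this one must cancel.
     */
    boolean requestCommit(String taskId, String attemptId) {
        // putIfAbsent is atomic, so only the first attempt per task is recorded.
        String winner = granted.putIfAbsent(taskId, attemptId);
        // A repeated request from the chosen attempt is still answered with "commit".
        return winner == null || winner.equals(attemptId);
    }
}
```

In phase 2, an attempt that receives {{true}} would move its staged output to the final output directory, while an attempt that receives {{false}} would discard its staged output. Recording the decision per task keeps the answer stable if the chosen attempt repeats its request.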