Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/02/08 15:06:19 UTC

[jira] [Commented] (TAJO-504) when inserting to a column partitioned table, if a queryunit attempt fails, an AlreadyExistsStorageException will throw

    [ https://issues.apache.org/jira/browse/TAJO-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895578#comment-13895578 ] 

Hyunsik Choi commented on TAJO-504:
-----------------------------------

This issue has two main causes.

The first is that there is no output committer. In Tajo, when a task fails, QueryMaster retries it on another node. However, if the task belongs to the final step, which generates output files in HDFS, this error occurs because the final output file already exists. To solve this problem, we have to implement some kind of output committer similar to that of MR.
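
As a rough illustration of the output committer idea above, here is a minimal sketch in Python against a local filesystem instead of HDFS. Everything in it is an assumption made for illustration: the class AttemptOutputCommitter, the helper write_partition, the "_staging" directory name, and the attempt ids are hypothetical and are not Tajo or MapReduce APIs. Each attempt writes into its own private staging directory, a failed attempt is discarded, and only a successful attempt has its files moved into the final partition directories.

```python
import os
import shutil
import tempfile


class AttemptOutputCommitter:
    """Hypothetical committer sketch; not a Tajo or MapReduce class."""

    def __init__(self, final_dir):
        self.final_dir = final_dir
        # Assumed layout: staging data lives under <final_dir>/_staging/<attempt_id>.
        self.staging_root = os.path.join(final_dir, "_staging")

    def attempt_dir(self, attempt_id):
        return os.path.join(self.staging_root, attempt_id)

    def setup_attempt(self, attempt_id):
        # Every attempt gets a private directory, so leftovers from a
        # failed attempt can never collide with a retry.
        path = self.attempt_dir(attempt_id)
        os.makedirs(path)
        return path

    def abort_attempt(self, attempt_id):
        # Discard whatever the failed attempt managed to write.
        shutil.rmtree(self.attempt_dir(attempt_id), ignore_errors=True)

    def commit_attempt(self, attempt_id):
        # Move finished files into the final partition directories.
        src_root = self.attempt_dir(attempt_id)
        for partition in sorted(os.listdir(src_root)):
            dst = os.path.join(self.final_dir, partition)
            os.makedirs(dst, exist_ok=True)
            src = os.path.join(src_root, partition)
            for name in os.listdir(src):
                os.rename(os.path.join(src, name), os.path.join(dst, name))
        shutil.rmtree(src_root)


def write_partition(attempt_path, partition, filename, rows, fail=False):
    """Hypothetical stand-in for a store executor writing one partition file."""
    part_dir = os.path.join(attempt_path, partition)
    os.makedirs(part_dir, exist_ok=True)
    if fail:
        # Simulate a crash after the partition directory was created.
        raise RuntimeError("simulated task failure")
    with open(os.path.join(part_dir, filename), "w") as f:
        f.writelines(row + "\n" for row in rows)


def run_demo():
    root = tempfile.mkdtemp()
    committer = AttemptOutputCommitter(root)

    # First attempt fails after creating 'col1=2' in its staging area.
    first = committer.setup_attempt("task_0_attempt_0")
    try:
        write_partition(first, "col1=2", "part-0", ["a"], fail=True)
    except RuntimeError:
        committer.abort_attempt("task_0_attempt_0")

    # The retry uses a fresh staging directory and then commits.
    second = committer.setup_attempt("task_0_attempt_1")
    write_partition(second, "col1=2", "part-0", ["a", "b"])
    committer.commit_attempt("task_0_attempt_1")
    return root
```

Calling run_demo() leaves a single committed file under the 'col1=2' directory of the returned temporary root, even though the first attempt failed after creating its partition directory. A real committer on HDFS would also have to handle concurrent attempts of the same task and job-level cleanup, which this sketch leaves out.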

The second is a bug in ColumnPartitionedTableStoreExec. Over the past few weeks, I have added a sort-based column partitioned table executor, which is now the default executor for writing column partitions. It eliminates many problems and works well, so I think the second cause has been resolved.

So, the only remaining issue is the output committer. I'll create a separate issue to clarify what we should do, and I'll close this issue as not a problem.

Thank you Min for this report!

> when inserting to a column partitioned table, if a queryunit attempt fails, an AlreadyExistsStorageException will throw
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: TAJO-504
>                 URL: https://issues.apache.org/jira/browse/TAJO-504
>             Project: Tajo
>          Issue Type: Bug
>          Components: distributed query plan
>            Reporter: Min Zhou
>             Fix For: 0.8-incubating
>
>
> I came across this exception recently.
> ColumnPartitionedTableStoreExec creates HDFS directories based on partition values. For example, if one of the partitions is 'col1=2', a directory named 'col1=2' will be created. If a query unit attempt fails, that directory still exists. After that, TajoQueryMaster starts another query unit attempt to rerun the logic in ColumnPartitionedTableStoreExec. This physical executor tries to create 'col1=2' again, but before creating it, it finds that the target directory already exists and throws an AlreadyExistsStorageException. Eventually, the query fails.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)