Posted to issues@hive.apache.org by "zhengchenyu (Jira)" <ji...@apache.org> on 2021/09/29 08:16:00 UTC

[jira] [Comment Edited] (HIVE-25561) Killed task should not commit file.

    [ https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421994#comment-17421994 ] 

zhengchenyu edited comment on HIVE-25561 at 9/29/21, 8:15 AM:
--------------------------------------------------------------

[~zabetak] When the bug is reproduced, the partition contains duplicate files: 000002_0 and 000002_1. The two files are created by two different task attempts that belong to the same task. One is the normal task attempt; the other is a speculative task attempt. As a result, queries return duplicated rows.

One file is a subset of the other. Because the speculative task attempt was killed, the file it created is a subset of the file written by the attempt that completed.

I found that a file created by a killed task can still be committed.
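
As a rough illustration of how such a partition could be checked, the sketch below groups output file names by task and reports tasks that have more than one attempt file. It assumes the names follow a `<taskId>_<attemptId>` pattern, which is inferred from the two file names above; `find_duplicate_attempts` is a hypothetical helper, not part of Hive.

```python
from collections import defaultdict


def find_duplicate_attempts(file_names):
    """Return {task_id: [attempt_ids]} for tasks with more than one output file.

    Assumption: each name looks like <taskId>_<attemptId>, e.g. 000002_0.
    """
    attempts_by_task = defaultdict(list)
    for name in file_names:
        # Split on the first underscore: left part is the task, right part the attempt.
        task_id, _, attempt_id = name.partition("_")
        attempts_by_task[task_id].append(int(attempt_id))
    return {
        task_id: sorted(attempts)
        for task_id, attempts in attempts_by_task.items()
        if len(attempts) > 1
    }


# Hypothetical listing of one partition directory.
partition_listing = ["000000_0", "000001_0", "000002_0", "000002_1"]
print(find_duplicate_attempts(partition_listing))  # {'000002': [0, 1]}
```

Any task reported here has output from more than one attempt, so its rows may be read twice.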


was (Author: zhengchenyu):
[~zabetak] When the bug is reproduced, the partition contains duplicate files: 000002_0 and 000002_1. The two files are created by two different task attempts that belong to the same task. One is the normal task attempt; the other is a speculative task attempt. As a result, queries return duplicated rows.

> Killed task should not commit file.
> -----------------------------------
>
>                 Key: HIVE-25561
>                 URL: https://issues.apache.org/jira/browse/HIVE-25561
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 1.2.1, 2.3.8, 2.4.0
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With the Tez engine in our cluster, I found some duplicate rows, especially when Tez speculation is enabled. In the partition directory, I found that both 000002_0 and 000002_1 exist.
> It is a very low-probability event. HIVE-10429 fixed some bugs related to interrupts, but some exceptions are still not caught.
> In our cluster, the task receives SIGTERM, then ClientFinalizer (a Hadoop class) is called and the HDFS client is closed. An exception is then raised, but abort may not be set to true.
> Then removeTempOrDuplicateFiles may fail because of an inconsistency, and the duplicate file is retained.
> (Note: the Driver first lists the directory, then the Task commits its file, then the Driver removes duplicate files. This ordering is the inconsistent case.)
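
The ordering in the note can be reproduced with a small simulation. This is only a sketch under stated assumptions: the directory is modelled as a dict of file name to size, the cleanup keeps the largest file per task, and `remove_duplicates` and `run` are hypothetical names. It is not Hive's removeTempOrDuplicateFiles.

```python
def remove_duplicates(directory, listing):
    """Keep the largest file per task among `listing`; delete the others.

    Assumption: names look like <taskId>_<attemptId>, and "largest wins" is
    the cleanup policy. Only files present in `listing` are considered.
    """
    best = {}
    for name in listing:
        task_id = name.split("_")[0]
        if task_id not in best or directory[name] > directory[best[task_id]]:
            best[task_id] = name
    for name in listing:
        if name != best[name.split("_")[0]]:
            del directory[name]


def run(commit_before_listing):
    """Simulate the driver cleanup with the killed attempt committing early or late."""
    directory = {"000002_0": 100}  # file name -> size; written by the normal attempt
    if commit_before_listing:
        directory["000002_1"] = 40  # killed attempt commits before the driver lists
    listing = list(directory)  # driver takes its snapshot of the directory
    if not commit_before_listing:
        directory["000002_1"] = 40  # killed attempt commits after the snapshot
    remove_duplicates(directory, listing)
    return sorted(directory)


print(run(commit_before_listing=True))   # ['000002_0']
print(run(commit_before_listing=False))  # ['000002_0', '000002_1']
```

In the second run the late file never appears in the driver's snapshot, so the cleanup cannot remove it. This is why the issue proposes preventing the killed task from committing at all, rather than relying on the later cleanup.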



--
This message was sent by Atlassian Jira
(v8.3.4#803005)