Posted to dev@hive.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2017/07/17 22:07:01 UTC

[jira] [Created] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

Jason Dere created HIVE-17113:
---------------------------------

             Summary: Duplicate bucket files can get written to table by runaway task
                 Key: HIVE-17113
                 URL: https://issues.apache.org/jira/browse/HIVE-17113
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Jason Dere
            Assignee: Jason Dere


Saw a table get a duplicate bucket file from a Hive query. It looks like the following happened:

1. Task attempt A_0 starts, but then stops making progress
2. The job was running with speculative execution on, and task attempt A_1 is started
3. Task attempt A_1 finishes execution and saves its output to the temp directory.
5. A task kill is sent to A_0, though this does not appear to actually kill A_0
6. The job for the query finishes and Utilities.mvFileToFinalPath() calls Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
7. A_0 (still running) finally finishes and saves its file to the temp directory. At this point we now have duplicate bucket files - oops!
8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the final location, where it is later moved to the partition directory.
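
The ordering problem in the steps above can be replayed with a small, self-contained sketch. This is not Hive's code: the class BucketDedupSketch, its methods, the "<bucket>_<attempt>" file names, and the rule for which duplicate to keep are all assumptions made for illustration. It only shows why a duplicate check that runs before a late writer finishes cannot catch that writer's file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy model of the temp directory as a list of file names.
final class BucketDedupSketch {

    // Assumed naming scheme: "<bucketId>_<attemptId>", e.g. "000000_1".
    static String bucketOf(String fileName) {
        return fileName.substring(0, fileName.indexOf('_'));
    }

    // Keeps exactly one file per bucket. The tie-break (lexicographically
    // smallest name) is arbitrary and is not claimed to match Hive's rule.
    static List<String> dedupe(List<String> files) {
        Map<String, String> keep = new TreeMap<>();
        for (String f : files) {
            keep.merge(bucketOf(f), f, (a, b) -> a.compareTo(b) <= 0 ? a : b);
        }
        return new ArrayList<>(keep.values());
    }

    // Replays the timeline from the report against the toy temp directory.
    static List<String> replay() {
        List<String> tmpDir = new ArrayList<>();
        tmpDir.add("000000_1");      // step 3: A_1 saves its output
        tmpDir = dedupe(tmpDir);     // step 6: duplicate check sees a single file
        tmpDir.add("000000_0");      // step 7: runaway A_0 saves its output late
        return tmpDir;               // step 8: directory is moved as it stands
    }

    // Counts how many files in the listing belong to the given bucket.
    static long filesForBucket(List<String> files, String bucket) {
        return files.stream().filter(f -> bucketOf(f).equals(bucket)).count();
    }

    public static void main(String[] args) {
        List<String> finalDir = replay();
        System.out.println(finalDir + " files for bucket 000000: "
                + filesForBucket(finalDir, "000000"));
    }
}
```

In this model the duplicate check itself behaves correctly when both files are present at the time it runs; the duplicate only survives because the second file arrives between the check and the move.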


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)