Posted to issues@hive.apache.org by "Lefty Leverenz (JIRA)" <ji...@apache.org> on 2017/08/03 06:43:00 UTC
[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
[ https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112279#comment-16112279 ]
Lefty Leverenz commented on HIVE-17113:
---------------------------------------
Doc note: This adds *hive.exec.move.files.from.source.dir* to HiveConf.java, so it needs to be documented in the wiki.
* [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]
Added a TODOC3.0 label.
> Duplicate bucket files can get written to table by runaway task
> ---------------------------------------------------------------
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Jason Dere
> Assignee: Jason Dere
> Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, HIVE-17113.3.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is started
> 3. Task attempt A_1 finishes execution and saves its output to the temp directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the final location, where it is later moved to the partition directory.
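
The sequence above can be replayed as a small single-process simulation. This is only an illustrative sketch, not Hive code: remove_duplicates() and simulate() are hypothetical helpers, the file names 000000_0 and 000000_1 are assumed stand-ins for the outputs of attempts A_0 and A_1, and the move_checked_files_only branch is a guess at the kind of per-file move that the name hive.exec.move.files.from.source.dir suggests, which this message does not describe.

```python
import os
import shutil
import tempfile


def remove_duplicates(tmp_dir):
    """Keep one file per bucket id and return the surviving file names."""
    seen = {}
    for name in sorted(os.listdir(tmp_dir)):
        bucket = name.split("_")[0]  # "000000" from "000000_1" (assumed naming)
        if bucket in seen:
            os.remove(os.path.join(tmp_dir, name))
        else:
            seen[bucket] = name
    return sorted(seen.values())


def simulate(move_checked_files_only):
    """Replay steps 3 to 8 in order and return the files in the final dir."""
    root = tempfile.mkdtemp()
    tmp_dir = os.path.join(root, "tmp")
    final_dir = os.path.join(root, "final")
    os.mkdir(tmp_dir)
    try:
        # Step 3: speculative attempt A_1 finishes and writes its bucket file.
        open(os.path.join(tmp_dir, "000000_1"), "w").close()

        # Step 6: the job finishes and the duplicate check runs. Only one
        # file exists at this point, so nothing is removed.
        checked = remove_duplicates(tmp_dir)

        # Step 7: runaway attempt A_0 finishes late and writes the same bucket.
        open(os.path.join(tmp_dir, "000000_0"), "w").close()

        # Step 8: publish the results.
        if move_checked_files_only:
            # Hypothetical mitigation: move only the files that were checked.
            os.mkdir(final_dir)
            for name in checked:
                shutil.move(os.path.join(tmp_dir, name),
                            os.path.join(final_dir, name))
        else:
            # Renaming the whole directory carries the late file along.
            os.rename(tmp_dir, final_dir)
        return sorted(os.listdir(final_dir))
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(simulate(move_checked_files_only=False))
    print(simulate(move_checked_files_only=True))
```

With the whole-directory rename, the final directory ends up with both 000000_0 and 000000_1, which is the duplicate bucket file described in step 7. Moving only the files that passed the check leaves the late file behind in the temp directory.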
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)