Posted to issues@hive.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2017/11/02 21:15:00 UTC

[jira] [Updated] (HIVE-17963) Fix for HIVE-17113 can be improved for non-blobstore filesystems

     [ https://issues.apache.org/jira/browse/HIVE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-17963:
------------------------------
    Attachment: HIVE-17963.1.patch

> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>
>                 Key: HIVE-17963
>                 URL: https://issues.apache.org/jira/browse/HIVE-17963
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-17963.1.patch
>
>
> HIVE-17113/HIVE-17813 fix the duplicate file issue by performing file moves on a file-by-file basis. For non-blobstore filesystems this results in many more filesystem/namenode operations compared to the previous Utilities.mvFileToFinalPath() behavior (dedup files in src dir, rename src dir to final dir).
> For non-blobstore filesystems, a better solution would be the one described [here|https://issues.apache.org/jira/browse/HIVE-17113?focusedCommentId=16100564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16100564]:
> 1) Move the temp directory to a new directory name, to prevent additional files from being added by any runaway processes.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory.
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final location.
> This results in only one additional file operation in non-blobstore FSes compared to the original Utilities.mvFileToFinalPath() behavior.
> The proposal is to remove the config setting hive.exec.move.files.from.source.dir and always use behavior that avoids the duplicate file issue described in HIVE-17113. Non-blobstore filesystems will follow steps 1-3 described above. Blobstore filesystems will keep the file-by-file copy introduced in HIVE-17113/HIVE-17813; this should need the same number of file operations as a directory rename on a blobstore, because such a rename is effectively carried out as file-by-file moves.
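
For illustration only, steps 1-3 for a non-blobstore filesystem can be sketched as below. This is not the attached patch: it uses java.nio.file on a local filesystem instead of the Hadoop FileSystem API, and the class name, the commit() and removeDuplicates() helpers, the ".moved" suffix, the "<taskId>_<attempt>" file naming, and the keep-the-largest-file rule are all assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RenameThenDedupSketch {

    // Hypothetical stand-in for removeTempOrDuplicateFiles(): assumes files are
    // named "<taskId>_<attempt>" and keeps only the largest file per task ID.
    // Returns the number of files deleted.
    static int removeDuplicates(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.collect(Collectors.toList());
        }
        Map<String, Path> keptByTask = new HashMap<>();
        int removed = 0;
        for (Path file : files) {
            String name = file.getFileName().toString();
            int sep = name.lastIndexOf('_');
            String taskId = sep > 0 ? name.substring(0, sep) : name;
            Path kept = keptByTask.get(taskId);
            if (kept == null) {
                keptByTask.put(taskId, file);
            } else if (Files.size(file) > Files.size(kept)) {
                Files.delete(kept);          // the newcomer is larger, drop the old one
                keptByTask.put(taskId, file);
                removed++;
            } else {
                Files.delete(file);          // the kept file is at least as large
                removed++;
            }
        }
        return removed;
    }

    // Steps 1-3 from the description; finalDir must not exist yet.
    static int commit(Path tmpDir, Path finalDir) throws IOException {
        // Step 1: rename the temp directory so that runaway writers that still
        // target the old path can no longer add files to it.
        Path sealed = tmpDir.resolveSibling(tmpDir.getFileName() + ".moved");
        Files.move(tmpDir, sealed, StandardCopyOption.ATOMIC_MOVE);

        // Step 2: dedup inside the renamed directory.
        int removed = removeDuplicates(sealed);

        // Step 3: a single directory rename to the final location.
        Files.move(sealed, finalDir);
        return removed;
    }
}
```

Compared with renaming the source directory directly, the only extra filesystem call in this sketch is the rename in step 1, which matches the "one additional file operation" estimate in the description.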



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)