Posted to issues@hive.apache.org by "Goden Yao (Jira)" <ji...@apache.org> on 2022/12/07 07:26:00 UTC

[jira] [Comment Edited] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

    [ https://issues.apache.org/jira/browse/HIVE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644177#comment-17644177 ] 

Goden Yao edited comment on HIVE-25277 at 12/7/22 7:25 AM:
-----------------------------------------------------------

Is this going to the 2.3.9 and 3.1.3 code lines as well? The fix version only indicates 4.0. [~chaosun]


was (Author: godenyao):
Is this going to the 2.3.9 and 3.1.3 code lines as well? The fix version only indicates 4.0.

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-25277
>                 URL: https://issues.apache.org/jira/browse/HIVE-25277
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: All Versions
>            Reporter: Zhou Fang
>            Assignee: Zhou Fang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when the warehouse is a cloud object store for which ListFiles is expensive. A root cause is that the recursive parent directory deletion is very inefficient: there are many duplicated calls to isEmpty (which ultimately calls ListFiles). This fix sorts the parent directories to delete by path length and always processes the longest one first (e.g., a/b/c is always processed before a/b). As a result, each parent path needs to be checked only once.
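
The longest-path-first idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual patch: the class ParentDirCleanup, the in-memory FakeStore, and the methods parentOf and deleteEmptyParents are all hypothetical names, and "path size" is taken here to mean the length of the path string (a child path is always longer than its parent, so either string length or depth gives the needed order).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Hive's metastore code.
class ParentDirCleanup {

    /** In-memory stand-in for an object store where each emptiness check costs one listing. */
    static class FakeStore {
        final Set<String> paths = new HashSet<>(); // every directory that currently exists
        int listCalls = 0;                         // counts simulated ListFiles calls

        boolean isEmpty(String dir) {
            listCalls++; // models the expensive ListFiles call
            String prefix = dir + "/";
            for (String p : paths) {
                if (p.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }

        void delete(String dir) {
            paths.remove(dir);
        }
    }

    /** Returns the parent of a slash-separated path, or null if there is none. */
    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    /**
     * Deletes the empty ancestors of partition directories that were already
     * removed, never touching root. Candidates are ordered longest path first,
     * so a directory is checked only after all of its candidate children.
     */
    static List<String> deleteEmptyParents(FakeStore store, List<String> partitionDirs, String root) {
        // Longest path first; ties broken alphabetically so the set keeps distinct paths.
        TreeSet<String> candidates = new TreeSet<>((a, b) -> {
            if (a.length() != b.length()) {
                return Integer.compare(b.length(), a.length());
            }
            return a.compareTo(b);
        });
        for (String dir : partitionDirs) {
            String parent = parentOf(dir);
            if (parent != null && !parent.equals(root)) {
                candidates.add(parent); // the set removes duplicate parents
            }
        }

        List<String> deleted = new ArrayList<>();
        while (!candidates.isEmpty()) {
            String dir = candidates.pollFirst(); // longest remaining path
            if (!store.isEmpty(dir)) {
                continue; // not empty, so its ancestors are not empty either
            }
            store.delete(dir);
            deleted.add(dir);
            String parent = parentOf(dir);
            if (parent != null && !parent.equals(root)) {
                candidates.add(parent); // always shorter than anything polled so far
            }
        }
        return deleted;
    }
}
```

Because a parent is only ever queued after a longer path has been handled, and the sorted set drops duplicates, no directory is polled (and therefore listed) more than once in this sketch, regardless of how many deleted partitions share it.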



--
This message was sent by Atlassian Jira
(v8.20.10#820010)