Posted to issues@hive.apache.org by "Yao Guangdong (Jira)" <ji...@apache.org> on 2021/12/30 03:15:00 UTC

[jira] [Updated] (HIVE-25837) Hive merge file operation may cost too long time

     [ https://issues.apache.org/jira/browse/HIVE-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yao Guangdong updated HIVE-25837:
---------------------------------
    Summary: Hive merge file operation may cost too long time  (was: Hive merge file operation may consume long time)

> Hive merge file operation may cost too long time
> ------------------------------------------------
>
>                 Key: HIVE-25837
>                 URL: https://issues.apache.org/jira/browse/HIVE-25837
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: All Versions
>            Reporter: Yao Guangdong
>            Assignee: Yao Guangdong
>            Priority: Major
>         Attachments: HIVE-25837.0001.patch
>
>
>   Merging files in Hive can take a very long time in some cases. This happens when there are thousands, tens of thousands, or even more small files, most of which are only a few KB each. The current merge implementation only considers the target size (default 256M), so a single map task may end up merging thousands, tens of thousands, or even more small files, which takes far too long.
>   To address this, we changed the code to consider not only the target size but also the number of files merged per map task (default 1024 per map). This may produce output files smaller than the user's configured target size, but given the time saved on merging, I think users can accept that.
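
The grouping rule described above can be illustrated with a small sketch. This is not the code from HIVE-25837.0001.patch; the class MergeGrouping, the method group, and the parameters targetSize and maxFilesPerTask are hypothetical names chosen for illustration, and each inner list simply stands for the files one map task would merge. Files are taken in input order, and a task is closed when adding the next file would push it past the target size or when it already holds the maximum number of files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the actual Hive patch): split small files into merge
 * tasks bounded by both a target size and a maximum number of files per task.
 */
public class MergeGrouping {

    /** Groups file sizes (in bytes) in input order; each inner list is one merge task. */
    public static List<List<Long>> group(List<Long> fileSizes, long targetSize, int maxFilesPerTask) {
        if (targetSize <= 0 || maxFilesPerTask <= 0) {
            throw new IllegalArgumentException("targetSize and maxFilesPerTask must be positive");
        }
        List<List<Long>> tasks = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Size limit: start a new task if this file would push the current one past the target.
            boolean sizeFull = !current.isEmpty() && currentBytes + size > targetSize;
            // Count limit: start a new task if the current one already holds the maximum number of files.
            boolean countFull = current.size() >= maxFilesPerTask;
            if (sizeFull || countFull) {
                tasks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // 5000 files of 4 KB each: about 20 MB in total, far below a 256 MB target.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            sizes.add(4L * 1024);
        }
        long target = 256L * 1024 * 1024;
        // Size-only grouping puts all 5000 files into a single task.
        System.out.println("size only:       " + group(sizes, target, Integer.MAX_VALUE).size() + " task(s)");
        // Adding a 1024-file cap splits the work into 5 tasks (4 x 1024 files + 904 files).
        System.out.println("size + 1024 cap: " + group(sizes, target, 1024).size() + " task(s)");
    }
}
```

With size as the only limit, all 5000 small files land in one task. With the 1024-file cap they are spread across five tasks, each producing an output file much smaller than the target. That is the trade-off the description accepts in exchange for shorter merge times.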



--
This message was sent by Atlassian Jira
(v8.20.1#820001)