Posted to dev@hive.apache.org by "Yao Guangdong (Jira)" <ji...@apache.org> on 2021/12/30 03:05:00 UTC

[jira] [Created] (HIVE-25837) Hive merge file operation may consume long time

Yao Guangdong created HIVE-25837:
------------------------------------

             Summary: Hive merge file operation may consume long time
                 Key: HIVE-25837
                 URL: https://issues.apache.org/jira/browse/HIVE-25837
             Project: Hive
          Issue Type: Improvement
          Components: Hive
    Affects Versions: All Versions
            Reporter: Yao Guangdong


  Merging files with Hive can take a very long time in some cases. This happens when there are thousands, tens of thousands, or even more small files, most of which are only a few KB each. The current merge implementation considers only the target size (default 256M), so a single map task may end up merging thousands, tens of thousands, or more of these small files, which takes far too long.
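To make the problem concrete, here is a minimal Python sketch of size-only grouping. It assumes a simple greedy packer that closes a group once the group reaches the target size, a 256 MiB target, and 100,000 input files of 4 KiB each. The function name `group_by_size` and all of the numbers are hypothetical illustrations, not Hive's actual merge code.

```python
def group_by_size(file_sizes, target_size):
    """Hypothetical greedy packer: close a group once it reaches target_size bytes.

    This only illustrates size-only grouping; it is not Hive's implementation.
    """
    groups, current, current_bytes = [], [], 0
    for size in file_sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= target_size:
            groups.append(current)
            current, current_bytes = [], 0
    if current:  # leftover files form a final, smaller group
        groups.append(current)
    return groups


TARGET_SIZE = 256 * 1024 * 1024  # assumed: 256 MiB target per merge task
FILE_SIZE = 4 * 1024             # assumed: every small file is 4 KiB

files = [FILE_SIZE] * 100_000
groups = group_by_size(files, TARGET_SIZE)
print([len(g) for g in groups])  # [65536, 34464]
```

Under these assumptions a single map task would have to open and merge 65,536 files before it reaches the target size.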

  To address this, we change the code to consider not only the target size but also the number of files merged per map task (default 1024 per map). This may produce merged files smaller than the user's configured target size, but given how much time it saves on merging, I think users can accept this trade-off.
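A minimal sketch of the proposed rule, again assuming a hypothetical greedy packer (`group_by_size_and_count` is illustrative, not Hive's API, and no configuration property name is implied): a group is closed when it reaches the target size or when it holds `max_files` files, whichever comes first. The 1024 default comes from the proposal; the 256 MiB target and 4 KiB file size are assumptions.

```python
def group_by_size_and_count(file_sizes, target_size, max_files=1024):
    """Hypothetical greedy packer that caps both bytes and file count per group.

    A group closes as soon as it reaches target_size bytes OR holds max_files
    files. Illustrative only; not Hive's actual merge code.
    """
    groups, current, current_bytes = [], [], 0
    for size in file_sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= target_size or len(current) >= max_files:
            groups.append(current)
            current, current_bytes = [], 0
    if current:  # leftover files form a final, smaller group
        groups.append(current)
    return groups


TARGET_SIZE = 256 * 1024 * 1024  # assumed: 256 MiB target per merge task
FILE_SIZE = 4 * 1024             # assumed: every small file is 4 KiB

files = [FILE_SIZE] * 100_000
groups = group_by_size_and_count(files, TARGET_SIZE, max_files=1024)
print(len(groups), len(groups[0]), len(groups[-1]))  # 98 1024 672
```

With 4 KiB files each full group is only 4 MiB, far below the 256 MiB target. That is the trade-off the proposal accepts: more, smaller output files in exchange for bounding the work done by any single map task.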



--
This message was sent by Atlassian Jira
(v8.20.1#820001)