Posted to issues@hive.apache.org by "Wang Haihua (JIRA)" <ji...@apache.org> on 2017/12/03 16:40:00 UTC

[jira] [Work started] (HIVE-18206) Merge of RC/ORC file should follow other fileformate which use merge configuration parameter

     [ https://issues.apache.org/jira/browse/HIVE-18206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-18206 started by Wang Haihua.
------------------------------------------
> Merge of RC/ORC file should follow other fileformate which use merge configuration parameter
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18206
>                 URL: https://issues.apache.org/jira/browse/HIVE-18206
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Wang Haihua
>            Assignee: Wang Haihua
>
> Merge configuration parameters such as {{hive.merge.size.per.task}} determine the average file size after the merge stage.
> However, we found that they only take effect for file formats such as {{Textfile/SequenceFile}}. With the {{RC/ORC}} file formats they do not work.
> For the {{RC/ORC}} file formats, the file size after the merge stage instead depends on parameters such as {{mapreduce.input.fileinputformat.split.maxsize}}.
> It would be better to use {{hive.merge.size.per.task}} to determine the average file size for the RC/ORC file formats as well, so that the behavior is unified across file formats.
> The root cause is that for the RC/ORC file formats the merge class is {{MergeFileTask}}, whereas for Textfile/SequenceFile it is {{MapRedTask}}. {{MergeFileTask}} does not accept the configuration value in MergeFileWork, so the solution is to pass it into {{MergeFileTask}}.
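
To illustrate why the size parameter matters, here is a minimal sketch of how a target size could drive the grouping of small files during a merge. It is not Hive's implementation: the function name plan_merge_groups, the greedy packing strategy, and the sample file sizes are all assumptions made for illustration. The target_size argument stands in for whichever setting the merge task actually reads ({{hive.merge.size.per.task}} in the proposed behavior).

```python
def plan_merge_groups(file_sizes, target_size):
    """Greedily pack consecutive files into groups of at most target_size bytes.

    Hypothetical helper: each returned group represents one merged output file.
    A single file larger than target_size still gets a group of its own.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    groups, current, current_bytes = [], [], 0
    for size in file_sizes:
        # Close the current group once adding this file would exceed the target.
        if current and current_bytes + size > target_size:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
small_files = [64 * MB] * 8  # eight hypothetical 64 MB input files

# The same inputs yield different output file counts depending on the target.
merged_256 = plan_merge_groups(small_files, 256 * MB)
merged_128 = plan_merge_groups(small_files, 128 * MB)
print(len(merged_256), len(merged_128))
```

In this sketch the number and size of the merged files follow directly from the target passed in, which is why a merge path that reads a different parameter produces differently sized output for the same input.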



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)