Posted to issues@hive.apache.org by "Wang Haihua (JIRA)" <ji...@apache.org> on 2017/12/04 03:47:00 UTC
[jira] [Comment Edited] (HIVE-18206) Merge of RC/ORC file should
follow other fileformate which use merge configuration parameter
[ https://issues.apache.org/jira/browse/HIVE-18206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276257#comment-16276257 ]
Wang Haihua edited comment on HIVE-18206 at 12/4/17 3:46 AM:
-------------------------------------------------------------
[~prasanth_j] For MergeFileTask, {{mapred.max.split.size}} does take effect, but the value that is supposed to supply it, {{hive.merge.size.per.task}}, does not.
I see that {{hive.merge.size.per.task}} is set as the MaxSplitSize of the {{MergeFileWork}} in ConditionalResolverMergeFiles.java:
{code}
private void setupMapRedWork(HiveConf conf, MapWork mWork, long targetSize, long totalSize) {
  mWork.setMaxSplitSize(targetSize);
  mWork.setMinSplitSize(targetSize);
  mWork.setMinSplitSizePerNode(targetSize);
  mWork.setMinSplitSizePerRack(targetSize);
}
{code}
But the value of {{MaxSplitSize}} does not take effect at execution time; the split size that is actually used just comes from the Hadoop configuration.
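To make the behaviour concrete, here is a minimal, self-contained sketch. It does not use any Hive classes: {{Work}} and the {{Map}}-based job configuration are hypothetical stand-ins for {{MergeFileWork}} and the Hadoop job conf, and both method names are made up for illustration. It only shows the difference between ignoring the value stored on the work object and copying it into the job configuration before the job is submitted.

```java
import java.util.HashMap;
import java.util.Map;

class MergeSplitSizeSketch {
    static final String MAX_SPLIT_KEY = "mapred.max.split.size";

    // Hypothetical stand-in for the work object that setupMapRedWork populates.
    static class Work {
        private final Long maxSplitSize; // null means "never set"

        Work(Long maxSplitSize) {
            this.maxSplitSize = maxSplitSize;
        }

        Long getMaxSplitSize() {
            return maxSplitSize;
        }
    }

    // Models the reported behaviour: the work object is ignored and the
    // split size is read straight from the job configuration.
    static long effectiveSplitSizeIgnoringWork(Work work, Map<String, String> jobConf) {
        return Long.parseLong(jobConf.get(MAX_SPLIT_KEY));
    }

    // Models the proposed direction: if the work object carries a split size,
    // copy it into the job configuration first, then read it back.
    static long effectiveSplitSizeHonoringWork(Work work, Map<String, String> jobConf) {
        if (work.getMaxSplitSize() != null) {
            jobConf.put(MAX_SPLIT_KEY, Long.toString(work.getMaxSplitSize()));
        }
        return Long.parseLong(jobConf.get(MAX_SPLIT_KEY));
    }

    public static void main(String[] args) {
        // Illustrative values only: 64,000,000 from the cluster configuration,
        // 256,000,000 as the merge target size stored on the work object.
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(MAX_SPLIT_KEY, "64000000");
        Work work = new Work(256000000L);

        System.out.println(effectiveSplitSizeIgnoringWork(work, jobConf)); // prints 64000000
        System.out.println(effectiveSplitSizeHonoringWork(work, jobConf)); // prints 256000000
    }
}
```

In this toy model the second method is the only one where the merge target size reaches the job; where exactly such a copy belongs in the real {{MergeFileTask}} is for the patch to decide.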
> Merge of RC/ORC file should follow other fileformate which use merge configuration parameter
> --------------------------------------------------------------------------------------------
>
> Key: HIVE-18206
> URL: https://issues.apache.org/jira/browse/HIVE-18206
> Project: Hive
> Issue Type: New Feature
> Affects Versions: 1.2.1, 2.1.1, 2.2.0, 3.0.0
> Reporter: Wang Haihua
> Assignee: Wang Haihua
> Attachments: HIVE-18206.1.patch, HIVE-18206.2.patch
>
>
> Merge configuration parameters such as {{hive.merge.size.per.task}} determine the average file size after the merge stage.
> However, we found that this only works for file formats like {{Textfile/SequenceFile}}. With the {{RC/ORC}} file formats, it {{does not work}}.
> For the {{RC/ORC}} file formats, we found that the file size after the merge stage depends on parameters like {{mapreduce.input.fileinputformat.split.maxsize}}.
> It would be better to use {{hive.merge.size.per.task}} to determine the average file size for the RC/ORC file formats as well, so that the behavior is consistent across formats.
> The root cause is that for the RC/ORC file formats the merge class is {{MergeFileTask}}, rather than the {{MapRedTask}} used for Textfile/SequenceFile, and {{MergeFileTask}} does not pick up the configuration value stored in MergeFileWork. The solution is to pass that value into {{MergeFileTask}}.
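As a rough illustration of why the effective split size matters for the merged file size, the sketch below estimates the number of output files under a simplifying assumption that is not stated in the issue: each merge map task processes one combined split of at most the maximum split size and writes one output file. The class and method names are hypothetical, and the byte values are examples only.

```java
class MergedFileCountSketch {
    // Estimated number of merged files: ceil(totalBytes / maxSplitBytes),
    // assuming one output file per split (a simplification).
    static long estimateOutputFiles(long totalBytes, long maxSplitBytes) {
        if (maxSplitBytes <= 0) {
            throw new IllegalArgumentException("maxSplitBytes must be positive");
        }
        if (totalBytes <= 0) {
            return 0;
        }
        return (totalBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    public static void main(String[] args) {
        long totalBytes = 1024L * 1024 * 1024;  // 1 GiB of small files to merge
        long mergeTarget = 256L * 1024 * 1024;  // example merge target size
        long clusterSplit = 64L * 1024 * 1024;  // example cluster split max size

        System.out.println(estimateOutputFiles(totalBytes, mergeTarget));  // prints 4
        System.out.println(estimateOutputFiles(totalBytes, clusterSplit)); // prints 16
    }
}
```

Under that assumption, a merge that falls back to the smaller cluster-wide split size produces four times as many, and correspondingly smaller, files than one that honors the merge target size.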
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)