Posted to issues@hive.apache.org by "Syed Shameerur Rahman (Jira)" <ji...@apache.org> on 2020/02/17 06:27:00 UTC
[jira] [Comment Edited] (HIVE-22891) Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
[ https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037675#comment-17037675 ]
Syed Shameerur Rahman edited comment on HIVE-22891 at 2/17/20 6:26 AM:
-----------------------------------------------------------------------
[~sershe] [~ashutoshc] [~prasanth_j] [~gopalv] Please review the patch.
was (Author: srahman):
[~ashutoshc] [~prasanth_j] [~gopalv] Please review the patch.
> Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
> -----------------------------------------------------------------------------
>
> Key: HIVE-22891
> URL: https://issues.apache.org/jira/browse/HIVE-22891
> Project: Hive
> Issue Type: Task
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22891.01.patch
>
>
> {code:java}
> try {
>   // TODO: refactor this out
>   if (pathToPartInfo == null) {
>     MapWork mrwork;
>     if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
>       mrwork = (MapWork) Utilities.getMergeWork(jobConf);
>       if (mrwork == null) {
>         mrwork = Utilities.getMapWork(jobConf);
>       }
>     } else {
>       mrwork = Utilities.getMapWork(jobConf);
>     }
>     pathToPartInfo = mrwork.getPathToPartitionInfo();
>   }
>   PartitionDesc part = extractSinglePartSpec(hsplit);
>   inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
> } catch (HiveException e) {
>   throw new IOException(e);
> }
> {code}
> The above code in CombineHiveRecordReader.java was introduced in HIVE-15147. It overwrites inputFormat based on the PartitionDesc, which is unnecessary in non-LLAP execution mode because HiveInputFormat.wrapForLlap() simply returns the previously defined inputFormat in that case. The call to extractSinglePartSpec() has serious performance implications: when there is a large number of small files, each call to extractSinglePartSpec() takes roughly 2 to 3 seconds. As a result, the same query runs much faster on Hive 1.x / Hive 2 than on the latest Hive. The log excerpt below shows the delay between the ORC reader message and the CombineHiveRecordReader messages.
> {code:java}
> 2020-02-11 07:15:04,701 INFO [main] org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from
> 2020-02-11 07:15:06,468 WARN [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}
> 2020-02-11 07:15:06,468 INFO [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting org.apache.hadoop.mapred.FileSplit
> {code}
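
The idea described in the issue, skipping the partition lookup when its result would be discarded anyway, can be illustrated with a small standalone sketch. This is not the attached patch and it does not use Hive classes: the class LlapGuardSketch, the method resolveInputFormat, the boolean llapIoEnabled, and the string stand-ins for the input format and PartitionDesc are all hypothetical names chosen for illustration. How Hive actually decides whether LLAP is in use is not shown in the description, so it is reduced here to a plain boolean.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: hypothetical stand-ins, not Hive's real classes or the HIVE-22891 patch.
final class LlapGuardSketch {
    // Counts how often the (expensive, per the issue) partition lookup runs.
    static final AtomicInteger extractCalls = new AtomicInteger();

    // Stand-in for extractSinglePartSpec(); the real method reportedly takes seconds per call.
    static String extractSinglePartSpec(String split) {
        extractCalls.incrementAndGet();
        return "partition-of-" + split;
    }

    // Stand-in for wrapForLlap(): returns the input format unchanged when LLAP is not in use.
    static String wrapForLlap(String inputFormat, boolean llapIoEnabled, String part) {
        return llapIoEnabled ? "LlapWrapped(" + inputFormat + ", " + part + ")" : inputFormat;
    }

    // The guard: only pay for the partition lookup when the wrapper would actually use it.
    static String resolveInputFormat(String inputFormat, String split, boolean llapIoEnabled) {
        if (!llapIoEnabled) {
            return inputFormat; // same result wrapForLlap() would give, without the lookup
        }
        String part = extractSinglePartSpec(split);
        return wrapForLlap(inputFormat, llapIoEnabled, part);
    }

    public static void main(String[] args) {
        System.out.println(resolveInputFormat("OrcInputFormat", "split-0", false));
        System.out.println(resolveInputFormat("OrcInputFormat", "split-0", true));
        System.out.println("extract calls: " + extractCalls.get());
    }
}
```

In this sketch the non-LLAP call returns the original format without touching the lookup, so the counter only advances for the LLAP call. Whether the real change should test the execution mode in the caller or inside extractSinglePartSpec() is a design choice the description leaves open.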