Posted to issues@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2017/02/07 01:45:41 UTC

[jira] [Comment Edited] (HIVE-15682) Eliminate the dummy iterator and optimize the per row based reducer-side processing

    [ https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855133#comment-15855133 ] 

Xuefu Zhang edited comment on HIVE-15682 at 2/7/17 1:45 AM:
------------------------------------------------------------

I ran some performance measurements on order-by queries using our non-dedicated cluster, and here is the perf diff without (Origin) vs. with (patch) HIVE-15580:
{code}
select count(*) from (select request_lat from dwh.fact_trip where datestr > '2017-01-27' order by request_lat) x;
Origin: 246.56, 342.78, 216.40, 216.587, 270.805, 449.232, 233.406 AVG: 282.25
patch: 125.21, 123.22, 166.31, 168.30, 120.428, 119.21, 120.385    AVG: 134.72
{code}

I used static allocation to avoid further environment variation in the test. The perf numbers are in seconds. The conclusion, which is tentative because our cluster is non-dedicated, is that HIVE-15580 boosts performance by about 2.1X.
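
As a sanity check on the arithmetic, here is a minimal Java sketch that recomputes the averages and the speedup from the run times listed above. The class and method names are made up for illustration. The median comparison is an extra that was not part of the original measurement, included only because a shared cluster tends to produce outliers such as the 449.232 s run.

```java
import java.util.Arrays;

// Hypothetical helper for re-checking the numbers in this comment.
final class SpeedupCheck {
    // Wall-clock times in seconds, copied from the measurements above.
    static final double[] ORIGIN = {246.56, 342.78, 216.40, 216.587, 270.805, 449.232, 233.406};
    static final double[] PATCH = {125.21, 123.22, 166.31, 168.30, 120.428, 119.21, 120.385};

    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Median of the samples; less sensitive to a single slow run than the mean.
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Speedup expressed as "old time divided by new time".
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        System.out.printf("mean:   %.2f vs %.2f, speedup %.2fX%n",
                mean(ORIGIN), mean(PATCH), speedup(mean(ORIGIN), mean(PATCH)));
        System.out.printf("median: %.2f vs %.2f, speedup %.2fX%n",
                median(ORIGIN), median(PATCH), speedup(median(ORIGIN), median(PATCH)));
    }
}
```

With only seven runs per configuration on a non-dedicated cluster, either ratio should be read as a rough indication rather than a precise figure.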




> Eliminate the dummy iterator and optimize the per row based reducer-side processing
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-15682
>                 URL: https://issues.apache.org/jira/browse/HIVE-15682
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 2.2.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>
> HIVE-15580 introduced a dummy iterator per input row, which can be eliminated because {{SparkReduceRecordHandler}} is able to handle single key-value pairs. We can refactor this part of the code (1) to remove the need for an iterator and (2) to optimize the code path for per-(key, value) processing instead of (key, value iterator) processing. It would also be great to measure the performance after the optimizations and compare it with the performance prior to HIVE-15580.
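
To illustrate the idea in the description, here is a minimal, self-contained Java sketch contrasting the two code paths. It is not Hive's actual code: `RowHandler`, `RecordingHandler`, `processValues`, and `processPair` are hypothetical stand-ins, and the real `SparkReduceRecordHandler` interface may look different. The sketch only shows that wrapping each value in a one-element iterator yields the same result as handing the pair to the handler directly, while allocating an extra wrapper object per row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a reducer-side handler; not Hive's actual API.
interface RowHandler {
    // (key, value iterator) based processing.
    void processValues(String key, Iterator<String> values);

    // Per (key, value) based processing.
    void processPair(String key, String value);
}

// Records every pair it sees so the two feeding strategies can be compared.
final class RecordingHandler implements RowHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void processValues(String key, Iterator<String> values) {
        while (values.hasNext()) {
            processPair(key, values.next());
        }
    }

    @Override
    public void processPair(String key, String value) {
        seen.add(key + "=" + value);
    }
}

final class DummyIteratorSketch {
    // "Before": every input pair is wrapped in a one-element (dummy) iterator.
    static void feedWithDummyIterator(RowHandler handler, List<String[]> pairs) {
        for (String[] kv : pairs) {
            handler.processValues(kv[0], Collections.singletonList(kv[1]).iterator());
        }
    }

    // "After": pairs are passed straight to the handler, with no wrapper per row.
    static void feedDirect(RowHandler handler, List<String[]> pairs) {
        for (String[] kv : pairs) {
            handler.processPair(kv[0], kv[1]);
        }
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"a", "1"});
        pairs.add(new String[] {"b", "2"});

        RecordingHandler viaIterator = new RecordingHandler();
        RecordingHandler direct = new RecordingHandler();
        feedWithDummyIterator(viaIterator, pairs);
        feedDirect(direct, pairs);

        // Both strategies should deliver the same pairs in the same order.
        System.out.println(viaIterator.seen.equals(direct.seen));
    }
}
```

Whether removing the wrapper gives a measurable gain in Hive itself is exactly what the description proposes to measure. The sketch makes no claim about that.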



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)