You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2014/12/03 06:54:12 UTC

[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

    [ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232631#comment-14232631 ] 

Hive QA commented on HIVE-8975:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12684771/HIVE-8975.1.patch

{color:red}ERROR:{color} -1 due to 122 failed/errored test(s), 6695 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_noskew_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby4_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby6_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby6_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_insert_common_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_extended
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notexists_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_json_tuple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.hcatalog.listener.TestNotificationListener.testAMQListener
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1957/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1957/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1957/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 122 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12684771 - PreCommit-HIVE-TRUNK-Build

> Possible performance regression on bucket_map_join_tez2.q
> ---------------------------------------------------------
>
>                 Key: HIVE-8975
>                 URL: https://issues.apache.org/jira/browse/HIVE-8975
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Statistics
>    Affects Versions: 0.15.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-8975.1.patch
>
>
> After introducing the identity project removal optimization in HIVE-8435, plan in bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join.
> The query is the following one:
> {noformat}
> select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key
> {noformat}
> The plan before removing the projections is:
> {noformat}
> TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
> TS[6]-FIL[17]-RS[10]-JOIN[11]
> {noformat}
> And after removing identity projections:
> {noformat}
> TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
> TS[6]-FIL[17]-RS[10]-JOIN[11]
> {noformat}
> After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. 
> The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by a RS operator or not, and calculate stats differently depending on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)