You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2015/06/26 06:24:04 UTC

[jira] [Commented] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO

    [ https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602375#comment-14602375 ] 

Hive QA commented on HIVE-11110:
--------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12741867/HIVE-11110.patch

{color:red}ERROR:{color} -1 due to 152 failed/errored test(s), 8846 tests executed
*Failed tests:*
{noformat}
TestCliDriver-authorization_create_table_owner_privs.q-create_func1.q-partition_wise_fileformat.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-avro_add_column.q-orc_wide_table.q-query_with_semi.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-groupby_grouping_sets5.q-temp_table_display_colstats_tbllvl.q-nonreserved_keywords_input37.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-skewjoinopt16.q-udf_in_file.q-mapjoin_filter_on_outerjoin.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-orc_vectorization_ppd.q-vector_left_outer_join2.q-vector_outer_join5.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-update_after_multiple_inserts.q-update_all_partitioned.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_grouping_sets.q-tez_union_dynamic_partition.q-scriptfile1.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_interval_2.q-vectorization_7.q-union8.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby4.q-timestamp_null.q-auto_join23.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-groupby10.q-timestamp_comparison.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-skewjoinopt16.q-scriptfile1.q-udf_in_file.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_alt_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nestedvirtual
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_random
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regex_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_alt_syntax
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_cond_pushdown_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_cond_pushdown_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_cond_pushdown_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_cond_pushdown_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4386/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4386/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4386/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 152 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12741867 - PreCommit-HIVE-TRUNK-Build

> Enable HiveJoinAddNotNullRule in CBO
> ------------------------------------
>
>                 Key: HIVE-11110
>                 URL: https://issues.apache.org/jira/browse/HIVE-11110
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>      ,store_returns
>      ,date_dim d1
>      ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>    and d1.d_date_sk = ss_sold_date_sk
>    and ss_customer_sk = sr_customer_sk
>    and ss_item_sk = sr_item_sk
>    and ss_ticket_number = sr_ticket_number
>    and sr_returned_date_sk = d2.d_date_sk
>    and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3’);
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join clause. The join clause should add a filter “filterExpr: ss_sold_date_sk is not null”, which should get pushed the MetaStore when fetching the stats. Currently this is not done in CBO planning, which results in the stats from __HIVE_DEFAULT_PARTITION__ to be fetched and considered in the optimization phase. In particular, this increases the NDV for the join columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)