You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/04/21 03:17:00 UTC

[jira] [Created] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost

Mostafa Mokhtar created HIVE-10412:
--------------------------------------

             Summary: CBO : Calculate join selectivity when computing HiveJoin cost
                 Key: HIVE-10412
                 URL: https://issues.apache.org/jira/browse/HIVE-10412
             Project: Hive
          Issue Type: Sub-task
          Components: CBO
            Reporter: Mostafa Mokhtar
            Assignee: Laljo John Pullokkaran


This is from TPC-DS Q7
Because we don't compute the selectivity of sub-expression in a HiveJoin we assume that selective and non-selective joins have the similar cost.

{code}
select  i_item_id, 
        avg(ss_quantity) agg1,
        avg(ss_list_price) agg2,
        avg(ss_coupon_amt) agg3,
        avg(ss_sales_price) agg4 
 from store_sales, customer_demographics, item
 where store_sales.ss_item_sk = item.i_item_sk and
       store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
       cd_gender = 'F' and 
       cd_marital_status = 'W' and
       cd_education_status = 'Primary'
 group by i_item_id
 order by i_item_id
 limit 100
{code}

Cardinality 
{code}
item 462,000
customer_demographics 1,920,800
store_sales 82,510,879,939
{code}

NDVs
{code}
item.i_item_sk 439501
customer_demographics.cd_demo_sk 1835839
store_sales.ss_cdemo_sk 1835839
{code}



>From the logs 
{code}
2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not available])
  HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 io}])
    HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
      HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
    HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3])
      HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
        HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
  HiveProject(i_item_sk=[$0], i_item_id=[$1])
    HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])

2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(78)) - MapJoin selected
2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available])
  HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.2511341939E10 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}])
    HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
      HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
    HiveProject(i_item_sk=[$0], i_item_id=[$1])
      HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
  HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3])
    HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
      HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])

2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.251089518344444E10 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 io}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)