You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/04/21 03:17:00 UTC
[jira] [Created] (HIVE-10412) CBO : Calculate join selectivity when
computing HiveJoin cost
Mostafa Mokhtar created HIVE-10412:
--------------------------------------
Summary: CBO : Calculate join selectivity when computing HiveJoin cost
Key: HIVE-10412
URL: https://issues.apache.org/jira/browse/HIVE-10412
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
This is from TPC-DS Q7
Because we don't compute the selectivity of sub-expression in a HiveJoin we assume that selective and non-selective joins have the similar cost.
{code}
select i_item_id,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
from store_sales, customer_demographics, item
where store_sales.ss_item_sk = item.i_item_sk and
store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
cd_gender = 'F' and
cd_marital_status = 'W' and
cd_education_status = 'Primary'
group by i_item_id
order by i_item_id
limit 100
{code}
Cardinality
{code}
item 462,000
customer_demographics 1,920,800
store_sales 82,510,879,939
{code}
NDVs
{code}
item.i_item_sk 439501
customer_demographics.cd_demo_sk 1835839
store_sales.ss_cdemo_sk 1835839
{code}
>From the logs
{code}
2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 io}])
HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3])
HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
HiveProject(i_item_sk=[$0], i_item_id=[$1])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(78)) - MapJoin selected
2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available])
HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.2511341939E10 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}])
HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]])
HiveProject(i_item_sk=[$0], i_item_id=[$1])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]])
HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3])
HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]])
2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.251089518344444E10 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 io}
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)