You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/11/06 23:15:33 UTC
[jira] [Created] (HIVE-8767) CBO : Join on inequality results in
in-correct join order
Mostafa Mokhtar created HIVE-8767:
-------------------------------------
Summary: CBO : Join on inequality results in in-correct join order
Key: HIVE-8767
URL: https://issues.apache.org/jira/browse/HIVE-8767
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
Fix For: 0.15.0
Queries with hybrid joins (inner join and in-equality join) produce inefficient join order.
CBO joins the two tables involved in the in-equality join first then the remaining joins are considered.
The problem with that selectivity of the other joins is not taken into consideration.
Queries that are affected by this are Q64 and Q72 from TPC-DS
Logical plan for Q72
{code}
2014-11-06 14:13:12,169 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631)) - Plan After Join Reordering:
HiveSortRel(fetch=[100]): rowcount = 139827.8175849229, cumulative cost = {7.195499709572942E13 rows, 279655.6351698458 cpu, 0.0 io}, id = 2037
HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC], dir3=[ASC]): rowcount = 139827.8175849229, cumulative cost = {7.195497058847592E13 rows, 139827.8175849229 cpu, 0.0 io}, id = 2035
HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]): rowcount = 139827.8175849229, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2033
HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 139827.8175849229, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2031
HiveProjectRel($f0=[$13], $f1=[$11], $f2=[$20]): rowcount = 106451.860966184, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2029
HiveFilterRel(condition=[>(CAST($25):DOUBLE, +(CAST($19):DOUBLE, CAST(5):DOUBLE))]): rowcount = 106451.860966184, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2027
HiveProjectRel(cs_ship_date_sk=[$2], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$5], cs_quantity=[$6], cs_sold_date_sk=[$7], inv_item_sk=[$8], inv_warehouse_sk=[$9], inv_quantity_on_hand=[$10], inv_date_sk=[$11], w_warehouse_sk=[$24], w_warehouse_name=[$25], i_item_sk=[$0], i_item_desc=[$1], cd_demo_sk=[$20], cd_marital_status=[$21], hd_demo_sk=[$18], hd_buy_potential=[$19], d_date_sk=[$12], d_date=[$13], d_week_seq=[$14], d_year=[$15], d_date_sk0=[$16], d_week_seq0=[$17], d_date_sk1=[$22], d_date0=[$23]): rowcount = 319355.582898552, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2451
HiveJoinRel(condition=[=($24, $9)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2449
HiveJoinRel(condition=[=($2, $22)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494376183984E13 rows, 0.0 cpu, 0.0 io}, id = 2447
HiveJoinRel(condition=[=($0, $5)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494336943527E13 rows, 0.0 cpu, 0.0 io}, id = 2445
HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1997
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 1645
HiveJoinRel(condition=[=($1, $18)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494258807969E13 rows, 0.0 cpu, 0.0 io}, id = 2443
HiveJoinRel(condition=[=($2, $16)], joinType=[inner]): rowcount = 2235489.080289864, cumulative cost = {7.195494007819061E13 rows, 0.0 cpu, 0.0 io}, id = 2441
HiveJoinRel(condition=[AND(=($9, $14), =($12, $15))], joinType=[inner]): rowcount = 5.588722700724658E7, cumulative cost = {7.195488419067561E13 rows, 0.0 cpu, 0.0 io}, id = 2439
HiveJoinRel(condition=[=($5, $10)], joinType=[inner]): rowcount = 5.732184228903609E9, cumulative cost = {7.19491519333977E13 rows, 0.0 cpu, 0.0 io}, id = 2080
HiveFilterRel(condition=[<($8, $4)]): rowcount = 7.190451896736688E13, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 1991
HiveProjectRel(cs_ship_date_sk=[$0], cs_bill_cdemo_sk=[$1], cs_bill_hdemo_sk=[$2], cs_item_sk=[$3], cs_quantity=[$4], cs_sold_date_sk=[$5], inv_item_sk=[$6], inv_warehouse_sk=[$7], inv_quantity_on_hand=[$8], inv_date_sk=[$9]): rowcount = 2.1571355690210062E14, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 2072
HiveJoinRel(condition=[=($3, $6)], joinType=[inner]): rowcount = 2.1571355690210062E14, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 2067
HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_quantity=[$17], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1925
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 1642
HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 1.627857E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1928
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]): rowcount = 1.627857E9, cumulative cost = {0}, id = 1644
HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2015
HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2013
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1640
HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2019
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1640
HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2009
HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2007
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]): rowcount = 7200.0, cumulative cost = {0}, id = 1639
HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2003
HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2001
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]): rowcount = 1920800.0, cumulative cost = {0}, id = 1646
HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2023
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1640
HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1993
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount = 27.0, cumulative cost = {0}, id = 1647
{code}
Logical plan for Q72 if the join order in the query is changed
{code}
2014-11-06 14:14:54,469 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631)) - Plan After Join Reordering:
HiveSortRel(fetch=[100]): rowcount = 139827.81758492283, cumulative cost = {4.472416182114529E10 rows, 279655.63516984566 cpu, 0.0 io}, id = 2860
HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC], dir3=[ASC]): rowcount = 139827.81758492283, cumulative cost = {4.469765456764418E10 rows, 139827.81758492283 cpu, 0.0 io}, id = 2858
HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]): rowcount = 139827.81758492283, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2856
HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 139827.81758492283, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2854
HiveProjectRel($f0=[$21], $f1=[$19], $f2=[$8]): rowcount = 106451.86096618396, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2852
HiveFilterRel(condition=[>(CAST($25):DOUBLE, +(CAST($7):DOUBLE, CAST(5):DOUBLE))]): rowcount = 106451.86096618396, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2850
HiveProjectRel(cs_ship_date_sk=[$2], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$5], cs_quantity=[$6], cs_sold_date_sk=[$7], d_date_sk=[$8], d_date=[$9], d_week_seq=[$10], d_year=[$11], cd_demo_sk=[$12], cd_marital_status=[$13], hd_demo_sk=[$14], hd_buy_potential=[$15], inv_item_sk=[$16], inv_warehouse_sk=[$17], inv_quantity_on_hand=[$18], inv_date_sk=[$19], w_warehouse_sk=[$24], w_warehouse_name=[$25], i_item_sk=[$0], i_item_desc=[$1], d_date_sk0=[$20], d_week_seq0=[$21], d_date_sk1=[$22], d_date0=[$23]): rowcount = 319355.5828985519, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 3081
HiveJoinRel(condition=[=($24, $17)], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 3079
HiveJoinRel(condition=[=($2, $22)], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.467082793156017E10 rows, 0.0 cpu, 0.0 io}, id = 3077
HiveJoinRel(condition=[=($0, $5)], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.467043552697727E10 rows, 0.0 cpu, 0.0 io}, id = 3075
HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2838
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 2462
HiveJoinRel(condition=[AND(=($17, $18), =($8, $19))], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.466965417139437E10 rows, 0.0 cpu, 0.0 io}, id = 2991
HiveFilterRel(condition=[<($16, $4)]): rowcount = 3.2755338450877763E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2832
HiveProjectRel(cs_ship_date_sk=[$6], cs_bill_cdemo_sk=[$7], cs_bill_hdemo_sk=[$8], cs_item_sk=[$9], cs_quantity=[$10], cs_sold_date_sk=[$11], d_date_sk=[$12], d_date=[$13], d_week_seq=[$14], d_year=[$15], cd_demo_sk=[$4], cd_marital_status=[$5], hd_demo_sk=[$16], hd_buy_potential=[$17], inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 9.82660153526333E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2987
HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 9.82660153526333E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2985
HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 1.627857E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2828
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]): rowcount = 1.627857E9, cumulative cost = {0}, id = 2461
HiveJoinRel(condition=[=($3, $0)], joinType=[inner]): rowcount = 19590.519781796647, cumulative cost = {4.300894919342371E10 rows, 0.0 cpu, 0.0 io}, id = 2983
HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2818
HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2816
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]): rowcount = 1920800.0, cumulative cost = {0}, id = 2463
HiveJoinRel(condition=[=($2, $10)], joinType=[inner]): rowcount = 137133.63847257654, cumulative cost = {4.300853765978524E10 rows, 0.0 cpu, 0.0 io}, id = 2981
HiveJoinRel(condition=[=($5, $6)], joinType=[inner]): rowcount = 3428340.961814413, cumulative cost = {4.3005109030823425E10 rows, 0.0 cpu, 0.0 io}, id = 2901
HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_quantity=[$17], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2808
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2459
HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2812
HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2810
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 2457
HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2824
HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2822
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]): rowcount = 7200.0, cumulative cost = {0}, id = 2456
HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2842
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 2457
HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2846
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 2457
HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2834
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount = 27.0, cumulative cost = {0}, id = 2464
{code}
Logical plan after replacing "inv_quantity_on_hand < cs_quantity" with inv_quantity_on_hand = 528
{code}
2014-11-06 14:18:17,149 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631)) - Plan After Join Reordering:
HiveSortRel(fetch=[100]): rowcount = 863.6667057213618, cumulative cost = {4.301228282853069E10 rows, 1727.3334114427237 cpu, 0.0 io}, id = 4382
HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC], dir3=[ASC]): rowcount = 863.6667057213618, cumulative cost = {4.301218939793791E10 rows, 863.6667057213618 cpu, 0.0 io}, id = 4380
HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]): rowcount = 863.6667057213618, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4378
HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 863.6667057213618, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4376
HiveProjectRel($f0=[$12], $f1=[$10], $f2=[$19]): rowcount = 462.4898665205019, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4374
HiveFilterRel(condition=[>(CAST($24):DOUBLE, +(CAST($18):DOUBLE, CAST(5):DOUBLE))]): rowcount = 462.4898665205019, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4372
HiveProjectRel(cs_ship_date_sk=[$12], cs_bill_cdemo_sk=[$13], cs_bill_hdemo_sk=[$14], cs_item_sk=[$15], cs_sold_date_sk=[$16], inv_item_sk=[$2], inv_warehouse_sk=[$3], inv_quantity_on_hand=[$4], inv_date_sk=[$5], w_warehouse_sk=[$6], w_warehouse_name=[$7], i_item_sk=[$8], i_item_desc=[$9], cd_demo_sk=[$10], cd_marital_status=[$11], hd_demo_sk=[$21], hd_buy_potential=[$22], d_date_sk=[$17], d_date=[$18], d_week_seq=[$19], d_year=[$20], d_date_sk0=[$23], d_week_seq0=[$24], d_date_sk1=[$0], d_date0=[$1]): rowcount = 1387.4695995615057, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4853
HiveJoinRel(condition=[=($12, $0)], joinType=[inner]): rowcount = 1387.4695995615057, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4851
HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4368
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 3975
HiveJoinRel(condition=[AND(=($3, $21), =($17, $22))], joinType=[inner]): rowcount = 1387.4695995615057, cumulative cost = {4.3012021530875534E10 rows, 0.0 cpu, 0.0 io}, id = 4849
HiveJoinRel(condition=[=($13, $0)], joinType=[inner]): rowcount = 142308.57000041206, cumulative cost = {4.3011806173305534E10 rows, 0.0 cpu, 0.0 io}, id = 4847
HiveJoinRel(condition=[=($4, $1)], joinType=[inner]): rowcount = 2355798.8422575975, cumulative cost = {2355825.8422575975 rows, 0.0 cpu, 0.0 io}, id = 4460
HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 2355798.8422575975, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4334
HiveFilterRel(condition=[=($2, 528)]): rowcount = 2355798.8422575975, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4332
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]): rowcount = 1.627857E9, cumulative cost = {0}, id = 3979
HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4338
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount = 27.0, cumulative cost = {0}, id = 3982
HiveJoinRel(condition=[=($0, $7)], joinType=[inner]): rowcount = 19590.519781796647, cumulative cost = {4.30094307839435E10 rows, 0.0 cpu, 0.0 io}, id = 4845
HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4342
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 3980
HiveJoinRel(condition=[=($3, $0)], joinType=[inner]): rowcount = 19590.519781796647, cumulative cost = {4.300894919342371E10 rows, 0.0 cpu, 0.0 io}, id = 4843
HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4348
HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4346
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]): rowcount = 1920800.0, cumulative cost = {0}, id = 3981
HiveJoinRel(condition=[=($2, $9)], joinType=[inner]): rowcount = 137133.63847257654, cumulative cost = {4.300853765978524E10 rows, 0.0 cpu, 0.0 io}, id = 4841
HiveJoinRel(condition=[=($4, $5)], joinType=[inner]): rowcount = 3428340.961814413, cumulative cost = {4.3005109030823425E10 rows, 0.0 cpu, 0.0 io}, id = 4428
HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4330
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 3977
HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4360
HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4358
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 3975
HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4354
HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4352
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]): rowcount = 7200.0, cumulative cost = {0}, id = 3974
HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4364
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 3975
{code}
Q72
{code}
select i_item_desc
,w_warehouse_name
,d1.d_week_seq
,count(case when p_promo_sk is null then 1 else 0 end) no_promo
,count(case when p_promo_sk is not null then 1 else 0 end) promo
,count(*) total_cnt
from catalog_sales
join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
join item on (item.i_item_sk = catalog_sales.cs_item_sk)
join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = customer_demographics.cd_demo_sk)
join household_demographics on (catalog_sales.cs_bill_hdemo_sk = household_demographics.hd_demo_sk)
join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
left outer join promotion on (catalog_sales.cs_promo_sk=promotion.p_promo_sk)
left outer join catalog_returns on (catalog_returns.cr_item_sk = catalog_sales.cs_item_sk and catalog_returns.cr_order_number = catalog_sales.cs_order_number)
where d1.d_week_seq = d2.d_week_seq
and inv_quantity_on_hand < cs_quantity
and d3.d_date > d1.d_date + 5
and hd_buy_potential = '1001-5000'
and d1.d_year = 2001
and hd_buy_potential = '1001-5000'
and cd_marital_status = 'M'
and d1.d_year = 2001
group by i_item_desc,w_warehouse_name,d1.d_week_seq
order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq
limit 100
{code}
Q64
{code}
select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name
,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name
,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number
,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip
,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt
,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3
FROM store_sales
JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number
JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk
JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
JOIN store ON store_sales.ss_store_sk = store.s_store_sk
JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk
JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk
JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk
JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk
JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk
JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
JOIN item ON store_sales.ss_item_sk = item.i_item_sk
JOIN
(select cs_item_sk
,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
from catalog_sales JOIN catalog_returns
ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
and catalog_sales.cs_order_number = catalog_returns.cr_order_number
group by cs_item_sk
having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui
ON store_sales.ss_item_sk = cs_ui.cs_item_sk
WHERE
cd1.cd_marital_status <> cd2.cd_marital_status and
i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and
i_current_price between 35 and 35 + 10 and
i_current_price between 35 + 1 and 35 + 15
group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)