You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/11/06 01:32:33 UTC
[jira] [Created] (HIVE-8752) Disjunction cardinality estimation has
selectivity of 1
Mostafa Mokhtar created HIVE-8752:
-------------------------------------
Summary: Disjunction cardinality estimation has selectivity of 1
Key: HIVE-8752
URL: https://issues.apache.org/jira/browse/HIVE-8752
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
Priority: Critical
Fix For: 0.14.0
TPC-DS Q89 has the wrong join order.
Store_sales should be joining with item first then date_dim.
The issue is that the predicate on item shows a selectivity of 1
{code}
((i_category in ('Home','Books','Electronics') and
i_class in ('wallpaper','parenting','musical')
)
or (i_category in ('Shoes','Jewelry','Men') and
i_class in ('womens','birdal','pants')
))
{code}
{code}
HiveProjectRel(i_item_sk=[$0], i_brand=[$8], i_class=[$10], i_category=[$12]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4052
HiveFilterRel(condition=[OR(AND(in($12, 'Home', 'Books', 'Electronics'), in($10, 'wallpaper', 'parenting', 'musical')), AND(in($12, 'Shoes', 'Jewelry', 'Men'), in($10, 'womens', 'birdal', 'pants')))]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4050
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 3818
{code}
Query
{code}
select *
from(
select i_category, i_class, i_brand,
s_store_name, s_company_name,
d_moy,
sum(ss_sales_price) sum_sales,
avg(sum(ss_sales_price)) over
(partition by i_category, i_brand, s_store_name, s_company_name)
avg_monthly_sales
from item, store_sales, date_dim, store
where store_sales.ss_item_sk = item.i_item_sk and
store_sales.ss_sold_date_sk = date_dim.d_date_sk and
store_sales.ss_store_sk = store.s_store_sk and
d_year in (2000) and
((i_category in ('Home','Books','Electronics') and
i_class in ('wallpaper','parenting','musical')
)
or (i_category in ('Shoes','Jewelry','Men') and
i_class in ('womens','birdal','pants')
))
group by i_category, i_class, i_brand,
s_store_name, s_company_name, d_moy) tmp1
where case when (avg_monthly_sales <> 0) then (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
order by sum_sales - avg_monthly_sales, s_store_name
limit 100
{code}
The result of the wrong join order is that the query runs in 335 seconds compared to 124 seconds with the correct join order.
Removing the disjunction in the item filter produces the correct plan
{code}
i_category in ('Home','Books','Electronics') and
i_class in ('wallpaper','parenting','musical')
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)