Posted to dev@hive.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2017/06/01 23:20:05 UTC

[jira] [Created] (HIVE-16811) Estimate statistics in absence of stats

Vineet Garg created HIVE-16811:
----------------------------------

             Summary: Estimate statistics in absence of stats
                 Key: HIVE-16811
                 URL: https://issues.apache.org/jira/browse/HIVE-16811
             Project: Hive
          Issue Type: Improvement
            Reporter: Vineet Garg
            Assignee: Vineet Garg


Currently, join ordering bails out completely in the absence of statistics, which can lead to bad join plans such as cross joins.
For example, the following SELECT query produces a cross join:

{code:sql}
CREATE TABLE supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY INT,
S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);

CREATE TABLE lineitem (
    L_ORDERKEY      INT,
    L_PARTKEY       INT,
    L_SUPPKEY       INT,
    L_LINENUMBER    INT,
    L_QUANTITY      DOUBLE,
    L_EXTENDEDPRICE DOUBLE,
    L_DISCOUNT      DOUBLE,
    L_TAX           DOUBLE,
    L_RETURNFLAG    STRING,
    L_LINESTATUS    STRING,
    l_shipdate      STRING,
    L_COMMITDATE    STRING,
    L_RECEIPTDATE   STRING,
    L_SHIPINSTRUCT  STRING,
    L_SHIPMODE      STRING,
    L_COMMENT       STRING) partitioned by (dl int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';


CREATE TABLE part(
    p_partkey INT,
    p_name STRING,
    p_mfgr STRING,
    p_brand STRING,
    p_type STRING,
    p_size INT,
    p_container STRING,
    p_retailprice DOUBLE,
    p_comment STRING
);

explain select count(1) from part,supplier,lineitem where p_partkey = l_partkey and s_suppkey = l_suppkey;

{code}
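To illustrate why the written table order matters here: the query has join predicates only between part and lineitem and between supplier and lineitem, so joining part with supplier first leaves nothing to join on. The minimal Python sketch below assumes, for illustration only, that without statistics the tables are joined left-deep in the order they appear in the FROM clause; `cross_join_steps` and `PREDICATES` are hypothetical names, not Hive APIs.

{code:python}
# Hypothetical sketch: walk a left-deep join in the given order and report
# which steps have no connecting join predicate (i.e. become cross joins).

def cross_join_steps(order, predicates):
    """order: table names in join order.
    predicates: set of frozensets, each naming the two tables a predicate links.
    Returns the tables that are joined without any connecting predicate."""
    joined = {order[0]}
    cross = []
    for table in order[1:]:
        # The step is a real join only if some predicate links the new table
        # to a table that is already part of the joined set.
        if not any(frozenset((t, table)) in predicates for t in joined):
            cross.append(table)
        joined.add(table)
    return cross

# Join predicates from the example query.
PREDICATES = {
    frozenset(("part", "lineitem")),      # p_partkey = l_partkey
    frozenset(("supplier", "lineitem")),  # s_suppkey = l_suppkey
}

print(cross_join_steps(["part", "supplier", "lineitem"], PREDICATES))
print(cross_join_steps(["lineitem", "part", "supplier"], PREDICATES))
{code}

Under that assumption, the written order part, supplier, lineitem joins supplier with no predicate (a cross join), while starting from lineitem gives every step a predicate.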

Estimating statistics would prevent the join ordering algorithm from bailing out and allow it to come up with a join order that is at least better than a cross join.
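One way estimated statistics could feed join ordering, sketched in Python under stated assumptions: approximate each table's row count as its raw data size divided by an estimated row width, then grow the join greedily, always adding the smallest estimated table that shares a predicate with the tables already joined. The per-type widths, the raw sizes, and the helper names `estimate_rows` and `greedy_join_order` are all hypothetical; this is not Hive's actual estimation logic or cost model.

{code:python}
# Hypothetical sketch (not Hive's implementation): estimate row counts from
# raw data size when no statistics exist, then order joins greedily so that
# a cross join is used only when no predicate-connected table remains.

# Assumed average widths in bytes per column type; illustrative values only.
TYPE_WIDTH = {"INT": 4, "DOUBLE": 8, "STRING": 20}

def estimate_rows(raw_size_bytes, column_types):
    """Estimate rows as raw data size divided by an estimated row width."""
    row_width = sum(TYPE_WIDTH[t] for t in column_types)
    return max(1, raw_size_bytes // row_width)

def greedy_join_order(row_counts, predicates):
    """Start from the smallest table, then repeatedly add the smallest table
    that shares a join predicate with the tables joined so far."""
    remaining = set(row_counts)
    first = min(remaining, key=lambda t: (row_counts[t], t))
    order, joined = [first], {first}
    remaining.discard(first)
    while remaining:
        connected = [t for t in remaining
                     if any(frozenset((t, j)) in predicates for j in joined)]
        # Fall back to an unconnected table (a cross join) only if forced to.
        pool = connected or list(remaining)
        nxt = min(pool, key=lambda t: (row_counts[t], t))
        order.append(nxt)
        joined.add(nxt)
        remaining.discard(nxt)
    return order

# Column types from the DDL above (partition column excluded);
# raw sizes are made-up inputs.
COLUMNS = {
    "supplier": ["INT", "STRING", "STRING", "INT", "STRING", "DOUBLE", "STRING"],
    "part": ["INT", "STRING", "STRING", "STRING", "STRING", "INT",
             "STRING", "DOUBLE", "STRING"],
    "lineitem": ["INT"] * 4 + ["DOUBLE"] * 4 + ["STRING"] * 8,
}
RAW_SIZE = {"supplier": 1_000_000, "part": 10_000_000, "lineitem": 1_000_000_000}
PREDICATES = {frozenset(("part", "lineitem")), frozenset(("supplier", "lineitem"))}

rows = {t: estimate_rows(RAW_SIZE[t], COLUMNS[t]) for t in COLUMNS}
print(rows)
print(greedy_join_order(rows, PREDICATES))
{code}

With these made-up sizes the order comes out as supplier, lineitem, part, so every step has a join predicate. The connectivity check is what rules out the cross join here; the estimates only decide which connected table to add next.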


