Posted to issues@hive.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2017/08/26 00:37:00 UTC
[jira] [Comment Edited] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142496#comment-16142496 ]
Vineet Garg edited comment on HIVE-16811 at 8/26/17 12:36 AM:
--------------------------------------------------------------
Latest patch addresses review comments
was (Author: vgarg):
Addresses review comments
> Estimate statistics in absence of stats
> ---------------------------------------
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
> Issue Type: Improvement
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY INT,
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY INT,
> L_PARTKEY INT,
> L_SUPPKEY INT,
> L_LINENUMBER INT,
> L_QUANTITY DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT DOUBLE,
> L_TAX DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE STRING,
> L_SHIPINSTRUCT STRING,
> L_SHIPMODE STRING,
> L_COMMENT STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating statistics will keep the join ordering algorithm from bailing out and let it come up with a join order that is at least better than a cross join.
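
To make the idea of estimating missing statistics concrete, here is a minimal sketch of one possible heuristic: guess the row count by dividing the table's raw data size by a row width derived from the column types. This is an illustration only, not the logic in the attached patches. The function name, the per-type widths, and the data size used below are all hypothetical.

```python
# Hypothetical per-type width guesses in bytes (assumptions, not Hive's values).
ASSUMED_TYPE_WIDTH = {"INT": 4, "DOUBLE": 8, "STRING": 100}


def estimate_row_count(column_types, total_bytes):
    """Guess a row count from raw data size and a schema-derived row width."""
    row_width = sum(ASSUMED_TYPE_WIDTH[t.upper()] for t in column_types)
    # Report at least one row so downstream cost formulas never see zero.
    return max(1, total_bytes // row_width)


# Column types of the `part` table from the DDL quoted above.
part_columns = ["INT", "STRING", "STRING", "STRING", "STRING",
                "INT", "STRING", "DOUBLE", "STRING"]

# The data size is a made-up figure for illustration.
part_rows = estimate_row_count(part_columns, total_bytes=616_000_000)
print(part_rows)
```

Even a rough figure like this gives a cost-based join ordering algorithm something to compare, so it can prefer joining on `p_partkey = l_partkey` and `s_suppkey = l_suppkey` over a cross join.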