Posted to dev@hive.apache.org by pengcheng xiong <px...@hortonworks.com> on 2015/02/19 06:03:49 UTC
Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Repository: hive-git
Description
-------
The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition, and therefore computes the table-level NDV as "select max(NUM_DISTINCTS) from PART_COL_STATS".
This is problematic for columns such as ticket number, whose values naturally increase along with the date partition column ss_sold_date_sk.
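To make the underestimate concrete, here is a small Python sketch. It is not Hive code and does not model the fix in this patch. It only compares the max(NUM_DISTINCTS) aggregation described above with the exact NDV over the union of all partitions. The partition names, the 1000 tickets per day, and the helper functions are hypothetical, chosen for illustration.

```python
from typing import Dict, List


def per_partition_ndv(partitions: Dict[str, List[int]]) -> Dict[str, int]:
    """Exact number of distinct values (NDV) of a column in each partition."""
    return {name: len(set(values)) for name, values in partitions.items()}


def table_ndv_max(partitions: Dict[str, List[int]]) -> int:
    """Table-level NDV as max(NUM_DISTINCTS) over partitions.

    Implicitly assumes every partition's distinct values are contained in
    those of the partition with the largest NDV.
    """
    return max(per_partition_ndv(partitions).values())


def table_ndv_exact(partitions: Dict[str, List[int]]) -> int:
    """True table-level NDV: distinct values over the union of all partitions."""
    distinct = set()
    for values in partitions.values():
        distinct.update(values)
    return len(distinct)


# Hypothetical data: 4 daily partitions keyed by a stand-in for ss_sold_date_sk.
# Ticket numbers keep increasing from day to day, so no ticket number appears
# in more than one partition.
TICKETS_PER_DAY = 1000
tickets = {
    f"ss_sold_date_sk={day}": list(range(day * TICKETS_PER_DAY, (day + 1) * TICKETS_PER_DAY))
    for day in range(4)
}

# Contrast: a hypothetical store-id column whose 50 values repeat in every partition.
stores = {f"ss_sold_date_sk={day}": list(range(50)) for day in range(4)}

if __name__ == "__main__":
    print("tickets max(NUM_DISTINCTS):", table_ndv_max(tickets))  # → 1000
    print("tickets true NDV:", table_ndv_exact(tickets))  # → 4000
    print("stores  max(NUM_DISTINCTS):", table_ndv_max(stores))  # → 50
    print("stores  true NDV:", table_ndv_exact(stores))  # → 50
```

For the repeating store-id column the max is exact, which is the case the original assumption was built for. For the increasing ticket number it is off by a factor equal to the number of partitions, which is the kind of gap between partitioned and un-partitioned estimates this request addresses.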
Diffs
-----
data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 574141c
metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q 00c9b53
ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 8ae9a90
ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 0f6b15d
ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 1fdeb90
ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/31178/diff/
Testing
-------
Thanks,
pengcheng xiong
Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
Posted by pengcheng xiong <px...@hortonworks.com>.
(Updated April 8, 2015, 12:40 a.m.)
Changes
-------
Address test failures.
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b8280e
data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java ba27f10
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 75005aa
metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/31178/diff/
Testing
-------
Thanks,
pengcheng xiong
Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
Posted by pengcheng xiong <px...@hortonworks.com>.
(Updated April 6, 2015, 9:21 p.m.)
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc16c38
data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java d404789
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6956e3b
metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION
ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/31178/diff/
Testing
-------
Thanks,
pengcheng xiong