Posted to dev@hive.apache.org by pengcheng xiong <px...@hortonworks.com> on 2017/07/10 21:29:56 UTC

Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan and Prasanth_J.


Repository: hive-git


Description
-------

HIVE-16966


Diffs
-----

  common/pom.xml e6722babd8 
  common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 7c9d72fbd2 
  common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimator.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5700fb9325 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig da48a7ccbd 
  metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
  pom.xml f9fae59a5d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
  ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
  ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/60753/diff/1/


Testing
-------


Thanks,

pengcheng xiong


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by j....@gmail.com.

> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field in metastore will be better as Ashutosh suggested.
> 
> pengcheng xiong wrote:
>     This config just gives the user an option to choose whether to use FM or HLL to COMPUTE the NDV. This patch does not change the fact that we do not store bit vectors for either FM or HLL in the object store.

I don't think it will be useful for users to tune this.
IMHO, most users won't care much about it. It could be used as a fallback, but that will only add more checks.


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.

> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field in metastore will be better as Ashutosh suggested.

This config just gives the user an option to choose whether to use FM or HLL to COMPUTE the NDV. This patch does not change the fact that we do not store bit vectors for either FM or HLL in the object store.


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Ashutosh Chauhan <ha...@apache.org>.

> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field in metastore will be better as Ashutosh suggested.
> 
> pengcheng xiong wrote:
>     This config just gives the user an option to choose whether to use FM or HLL to COMPUTE the NDV. This patch does not change the fact that we do not store bit vectors for either FM or HLL in the object store.
> 
> Prasanth_J wrote:
>     I don't think it will be useful for users to tune this.
>     IMHO, most users won't care much about it. It could be used as a fallback, but that will only add more checks.

I agree with Prasanth. Overloading this config seems hacky. It leads to confusing code; e.g., NDVEstimator uses the number of bit vectors to determine the NDV algorithm. It is confusing for the end user too: a positive value in [0,100] is meaningful for one algorithm, while a negative value only switches over to a different algorithm. I suggest that we leave this config as is and introduce a new config that dictates which algorithm is used for NDV computation. ColumnStatsSemanticAnalyzer then passes this config value to the compute_stats() UDAF, and GenericUDAFComputeStats uses it to determine which algorithm to use.
We can store this config in the metastore (either as is or mapped to an int) so that we can deserialize the bit vectors correctly when we retrieve them.
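
To make the suggestion concrete, here is a minimal sketch of what a dedicated algorithm setting could look like. Everything in it is hypothetical: the class name, the enum, the accepted values "fm" and "hll", and the int codes are illustrative assumptions, not code from the patch or from HiveConf.

```java
import java.util.Locale;

/** Hypothetical sketch: a dedicated setting selects the NDV algorithm. */
final class NdvAlgoConfigSketch {

  /** The two algorithms discussed in this review. */
  enum NdvAlgorithm {
    FM, HLL;

    /** Parses a config value such as "fm" or "hll" (case-insensitive, assumed spelling). */
    static NdvAlgorithm fromConfig(String value) {
      if (value == null) {
        throw new IllegalArgumentException("NDV algorithm must be set");
      }
      // valueOf throws IllegalArgumentException for anything other than FM or HLL.
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Explicit int mapping for storage, so reordering the enum cannot change stored values. */
    int code() {
      return this == FM ? 0 : 1;
    }

    /** Inverse of code(), used when reading the stored value back. */
    static NdvAlgorithm fromCode(int code) {
      switch (code) {
        case 0: return FM;
        case 1: return HLL;
        default: throw new IllegalArgumentException("Unknown NDV algorithm code: " + code);
      }
    }
  }
}
```

With something along these lines, the number-of-bit-vectors config keeps a single meaning, and the stored code tells the reader which deserializer to apply.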


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.

> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field in metastore will be better as Ashutosh suggested.
> 
> pengcheng xiong wrote:
>     This config just gives the user an option to choose whether to use FM or HLL to COMPUTE the NDV. This patch does not change the fact that we do not store bit vectors for either FM or HLL in the object store.
> 
> Prasanth_J wrote:
>     I don't think it will be useful for users to tune this.
>     IMHO, most users won't care much about it. It could be used as a fallback, but that will only add more checks.
> 
> Ashutosh Chauhan wrote:
>     I agree with Prasanth. Overloading this config seems hacky. It leads to confusing code; e.g., NDVEstimator uses the number of bit vectors to determine the NDV algorithm. It is confusing for the end user too: a positive value in [0,100] is meaningful for one algorithm, while a negative value only switches over to a different algorithm. I suggest that we leave this config as is and introduce a new config that dictates which algorithm is used for NDV computation. ColumnStatsSemanticAnalyzer then passes this config value to the compute_stats() UDAF, and GenericUDAFComputeStats uses it to determine which algorithm to use.
>     We can store this config in the metastore (either as is or mapped to an int) so that we can deserialize the bit vectors correctly when we retrieve them.

I agree it is a bit confusing, but then we need to store two extra pieces of information in the metastore: (1) the algorithm that we use, FMSketch or HLL, and (2) the number of bit vectors for FMSketch.


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Ashutosh Chauhan <ha...@apache.org>.

> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field in metastore will be better as Ashutosh suggested.
> 
> pengcheng xiong wrote:
>     This config just gives the user an option to choose whether to use FM or HLL to COMPUTE the NDV. This patch does not change the fact that we do not store bit vectors for either FM or HLL in the object store.
> 
> Prasanth_J wrote:
>     I don't think it will be useful for users to tune this.
>     IMHO, most users won't care much about it. It could be used as a fallback, but that will only add more checks.
> 
> Ashutosh Chauhan wrote:
>     I agree with Prasanth. Overloading this config seems hacky. It leads to confusing code; e.g., NDVEstimator uses the number of bit vectors to determine the NDV algorithm. It is confusing for the end user too: a positive value in [0,100] is meaningful for one algorithm, while a negative value only switches over to a different algorithm. I suggest that we leave this config as is and introduce a new config that dictates which algorithm is used for NDV computation. ColumnStatsSemanticAnalyzer then passes this config value to the compute_stats() UDAF, and GenericUDAFComputeStats uses it to determine which algorithm to use.
>     We can store this config in the metastore (either as is or mapped to an int) so that we can deserialize the bit vectors correctly when we retrieve them.
> 
> pengcheng xiong wrote:
>     I agree it is a bit confusing, but then we need to store two extra pieces of information in the metastore: (1) the algorithm that we use, FMSketch or HLL, and (2) the number of bit vectors for FMSketch.

We can just store the algorithm. For the number of bit vectors for FMSketch, we can use the existing logic of counting '{' in the serialized string of bit vectors.
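
As a rough illustration of the counting idea, assuming each serialized bit vector opens with exactly one '{' (the sample string "{0, 1}{2}{}" is a made-up format for this sketch, and the class and method names are hypothetical rather than the existing Hive code):

```java
/** Hypothetical sketch: recover the FMSketch bit-vector count from its serialized form. */
final class FmBitVectorCountSketch {

  /** Counts '{' characters, assuming one opening brace per serialized bit vector. */
  static int countBitVectors(String serialized) {
    int count = 0;
    for (int i = 0; i < serialized.length(); i++) {
      if (serialized.charAt(i) == '{') {
        count++;
      }
    }
    return count;
  }
}
```

Under that assumption, only the algorithm needs a new stored field; the count stays derivable from the data itself.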


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Line 1723 (original), 1723 (patched)
<https://reviews.apache.org/r/60753/#comment255142>

    I am not sure we need this config.
    Is there any reason to keep supporting storage of FM sketch bitvectors?
    
    If this is for the 3.0.0 release only, we could even remove this config.
    
    If we do need to support FM-sketch-based NDV, a separate field in the metastore would be better, as Ashutosh suggested.


- Prasanth_J




Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Lefty Leverenz <le...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180255
-----------------------------------------------------------



Typo:  The RB description says "HIVE-16966" but it should be HIVE-16996.

- Lefty Leverenz




Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.

> On July 10, 2017, 9:42 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     Does this mean any positive value will use FM sketch?
> >     If so, how will the bitvectors stored in the metastore indicate whether the serialized representation is HLL or FM sketch?

Yes. The second question is a very good one. Currently I am using a try-catch block: first try to deserialize as HLL, and if that fails, deserialize as FM sketch. Ashutosh suggested adding another field in the metastore to distinguish the two.
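
A minimal sketch of that try-then-fall-back deserialization, for illustration only. The class, the method names, and both payload layouts (an "HLL" prefix followed by comma-separated values, and a brace-delimited list for FM) are hypothetical stand-ins, not the classes or formats in the patch.

```java
import java.util.Arrays;

/** Toy stand-ins for the two sketch types; not Hive's actual classes. */
final class FallbackDeserializeSketch {

    /** Records which parser accepted the payload, plus the decoded values. */
    static final class Decoded {
        final String kind;
        final int[] values;

        Decoded(String kind, int[] values) {
            this.kind = kind;
            this.values = values;
        }
    }

    // Hypothetical HLL layout: literal prefix "HLL", then comma-separated register values.
    static Decoded parseHll(String s) {
        if (!s.startsWith("HLL")) {
            throw new IllegalArgumentException("not an HLL payload");
        }
        return new Decoded("HLL", parseInts(s.substring(3)));
    }

    // Hypothetical FM layout: "{", comma-separated bit positions, "}".
    static Decoded parseFm(String s) {
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("not an FM sketch payload");
        }
        return new Decoded("FM", parseInts(s.substring(1, s.length() - 1)));
    }

    static int[] parseInts(String csv) {
        if (csv.isEmpty()) {
            return new int[0];
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Try HLL first; only if that parser rejects the payload, try FM.
    static Decoded deserialize(String s) {
        try {
            return parseHll(s);
        } catch (IllegalArgumentException e) {
            return parseFm(s);
        }
    }

    public static void main(String[] args) {
        System.out.println(deserialize("HLL1,2,3").kind);
        System.out.println(deserialize("{0, 4}").kind);
    }
}
```

The weakness of this pattern is that format detection depends on one parser failing cleanly, which is why an explicit marker (a separate metastore field or a tag in the payload) was suggested instead.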


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180111
-----------------------------------------------------------




Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180111
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Line 1723 (original), 1723 (patched)
<https://reviews.apache.org/r/60753/#comment255137>

    Does this mean any positive value will use FM sketch?
    If so, how will the bitvectors stored in the metastore indicate whether the serialized representation is HLL or FM sketch?


- Prasanth_J




Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.

> On July 14, 2017, 12:55 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
> > Lines 30-34 (patched)
> > <https://reviews.apache.org/r/60753/diff/2/?file=1775584#file1775584line30>
> >
> >     As discussed, this determination should be made using the first byte of the serialized string.

That is true, but I would prefer to do it in a follow-up JIRA, since it will completely change the serde of HLL as well as FM sketch. Also, based on my analysis, I may need to bit-pack the FM sketch so that it fits within the 64K varchar limit in the worst case.
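
For reference, a minimal sketch of what first-byte determination could look like. The tag values ('H' and 'F'), the class, and the method names are assumptions made for illustration; the actual serde layout was left to the follow-up work and is not defined in this thread.

```java
import java.util.Arrays;

/** Hypothetical tagged layout: one leading tag byte, then the sketch body. */
final class TaggedSketchSerde {

    enum Kind { HLL, FM }

    // Assumed tag values; any two distinct bytes would do.
    static final byte TAG_HLL = 'H';
    static final byte TAG_FM = 'F';

    // Prepend the tag byte for the given kind to the serialized body.
    static byte[] tag(Kind kind, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = (kind == Kind.HLL) ? TAG_HLL : TAG_FM;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    // Decide the sketch type from the first byte alone, without parsing the body.
    static Kind kindOf(byte[] serialized) {
        if (serialized == null || serialized.length == 0) {
            throw new IllegalArgumentException("empty payload");
        }
        switch (serialized[0]) {
            case TAG_HLL:
                return Kind.HLL;
            case TAG_FM:
                return Kind.FM;
            default:
                throw new IllegalArgumentException("unknown sketch tag: " + serialized[0]);
        }
    }

    // Strip the tag byte and return the body for the type-specific deserializer.
    static byte[] body(byte[] serialized) {
        return Arrays.copyOfRange(serialized, 1, serialized.length);
    }

    public static void main(String[] args) {
        byte[] payload = tag(Kind.FM, new byte[] {1, 2, 3});
        System.out.println(kindOf(payload) + " body length " + body(payload).length);
    }
}
```

Compared with the try-catch fallback, the tag makes the choice of deserializer explicit and costs one extra byte per stored sketch.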


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180494
-----------------------------------------------------------


On July 14, 2017, 12:08 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 14, 2017, 12:08 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 023f084511 
>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
>   pom.xml 32f5fd1493 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
>   ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
>   ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
>   ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
>   ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
>   ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
>   ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 8bbae3914d 
>   ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
>   ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
>   ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
>   ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
>   ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out f7d73c9ddf 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
>   ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
>   ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
>   ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
>   ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
>   ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
>   ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
>   ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
>   ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
>   ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
>   ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
>   ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
>   ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
>   ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
>   ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
>   ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
>   ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
>   ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
>   ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
>   ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 06f8408327 
>   ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
>   ql/src/test/results/clientpositive/join32.q.out a191284aca 
>   ql/src/test/results/clientpositive/join33.q.out a191284aca 
>   ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
>   ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
>   ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
>   ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
>   ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
>   ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
>   ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
>   ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out fab5c9c029 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out ee645410b9 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
>   ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 7fd7e20e15 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out c4b18b7118 
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
>   ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
>   ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
>   ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
>   ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
>   ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
>   ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
>   ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
>   ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
>   ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
>   ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
>   ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
>   ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
>   ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
>   ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
>   ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
>   ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
>   ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
>   ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
>   ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
>   ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
>   ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 626a0640da 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 273cf06e31 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
>   ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
>   ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
>   ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out a38e339012 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out f7c694bf84 
>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
>   ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
>   ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 6118c73c2b 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
>   ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
>   ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
>   ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
>   ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
>   ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
>   ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out f744eb6513 
>   ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 28a8340a9c 
>   ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 73ab9fca82 
>   ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
>   ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
>   ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out 3ad09e815c 
>   ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
>   ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
>   ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
>   ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
>   ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
>   ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
>   ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
>   ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
>   ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by j....@gmail.com.

> On July 14, 2017, 12:55 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
> > Lines 28-30 (patched)
> > <https://reviews.apache.org/r/60753/diff/2/?file=1775588#file1775588line28>
> >
> >     Is it really worth having a dependency on a third-party jar for this class? Will using TreeMap really be slow?
> >     The Javolution library, on which we already have a dependency, also provides these fast data structures. If really needed, we should use those instead.

The original dependency was added to have a fairly flat treemap + list. Fast-path inserts go to the list, which then gets sorted and merged into the treemap. Fastutil has primitive hash maps/lists with minimal object allocations.
Since we are using HLL in the context of ANALYZE, I think this will not have a huge impact in terms of performance. Also, the threshold to switch from SPARSE to DENSE is pretty low (on the order of 1000s), so object allocations in the inner loop will not be significant.

I agree with Ashutosh: we can remove this runtime dependency, as it will not have a big impact. Replacing it with Java's TreeMap should work pretty well. If TreeMap becomes a bottleneck, we can replace it later in a follow-up JIRA.

Also, I would recommend bringing over these JUnit tests as well, to make sure the TreeMap replacement goes well:
https://github.com/prasanthj/hyperloglog/tree/master/src/test/com/github/prasanthj/hll
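
To make the TreeMap suggestion concrete, here is a minimal sketch of what a TreeMap-backed sparse register could look like. It is not the code from the patch: the class name SparseRegisterSketch, its methods, and the bit layout (low p bits as register index, trailing-zero count of the remaining bits as rank) are assumptions chosen for illustration, and the sparse-to-dense threshold is left to the caller.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names and bit layout are assumptions,
// not the HLLSparseRegister class under review.
class SparseRegisterSketch {
    private final int p; // number of hash bits used as the register index
    private final TreeMap<Integer, Byte> sparseMap = new TreeMap<>();

    SparseRegisterSketch(int p) {
        if (p < 4 || p > 16) {
            throw new IllegalArgumentException("p must be in [4, 16]");
        }
        this.p = p;
    }

    // Low p bits select the register; the rank is 1 + the number of
    // trailing zeros in the remaining bits (capped at 64 - p + 1).
    boolean add(long hash) {
        int idx = (int) (hash & ((1L << p) - 1));
        long rest = hash >>> p;
        int rank = Math.min(Long.numberOfTrailingZeros(rest), 64 - p) + 1;
        return set(idx, (byte) rank);
    }

    // Keeps the maximum value seen for a register; returns true if it changed.
    boolean set(int idx, byte value) {
        Byte current = sparseMap.get(idx);
        if (current == null || value > current) {
            sparseMap.put(idx, value);
            return true;
        }
        return false;
    }

    // Merging two sparse registers is a per-index maximum.
    void merge(SparseRegisterSketch other) {
        for (Map.Entry<Integer, Byte> e : other.sparseMap.entrySet()) {
            set(e.getKey(), e.getValue());
        }
    }

    int size() {
        return sparseMap.size();
    }

    // The caller supplies the threshold at which to switch to dense encoding.
    boolean shouldConvertToDense(int threshold) {
        return sparseMap.size() > threshold;
    }

    // Expands the sparse entries into a dense array of 2^p registers.
    byte[] toDense() {
        byte[] dense = new byte[1 << p];
        for (Map.Entry<Integer, Byte> e : sparseMap.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        return dense;
    }
}
```

The boxing of Integer and Byte is the allocation cost mentioned above. With only a few thousand entries before the switch to DENSE it should stay small, though that would need measuring rather than assuming.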


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180494
-----------------------------------------------------------


On July 14, 2017, 12:08 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 14, 2017, 12:08 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 023f084511 
>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
>   pom.xml 32f5fd1493 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
>   ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
>   ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
>   ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
>   ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
>   ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
>   ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 8bbae3914d 
>   ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
>   ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
>   ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
>   ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
>   ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out f7d73c9ddf 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
>   ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
>   ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
>   ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
>   ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
>   ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
>   ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
>   ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
>   ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
>   ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
>   ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
>   ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
>   ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
>   ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
>   ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
>   ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
>   ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
>   ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
>   ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
>   ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 06f8408327 
>   ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
>   ql/src/test/results/clientpositive/join32.q.out a191284aca 
>   ql/src/test/results/clientpositive/join33.q.out a191284aca 
>   ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
>   ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
>   ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
>   ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
>   ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
>   ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
>   ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
>   ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out fab5c9c029 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out ee645410b9 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
>   ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 7fd7e20e15 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out c4b18b7118 
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
>   ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
>   ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
>   ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
>   ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
>   ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
>   ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
>   ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
>   ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
>   ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
>   ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
>   ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
>   ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
>   ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
>   ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
>   ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
>   ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
>   ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
>   ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
>   ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
>   ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
>   ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 626a0640da 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 273cf06e31 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
>   ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
>   ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
>   ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out a38e339012 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out f7c694bf84 
>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
>   ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
>   ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 6118c73c2b 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
>   ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
>   ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
>   ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
>   ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
>   ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
>   ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out f744eb6513 
>   ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 28a8340a9c 
>   ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 73ab9fca82 
>   ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
>   ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
>   ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out 3ad09e815c 
>   ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
>   ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
>   ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
>   ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
>   ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
>   ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
>   ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
>   ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
>   ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Ashutosh Chauhan <ha...@apache.org>.

> On July 14, 2017, 12:55 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
> > Lines 30-34 (patched)
> > <https://reviews.apache.org/r/60753/diff/2/?file=1775584#file1775584line30>
> >
> >     As discussed, this determination should be made using the first byte of the serialized string.
> 
> pengcheng xiong wrote:
>     That is true. But I would prefer to do it in a follow-up JIRA, as this will completely change the serde of HLL as well as FM. Also, from my analysis, I may need to bit-pack the FM sketch in order to fit within the 64k limit for varchar in the worst case.

Ok. This can be a follow-up.
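
For the follow-up, a first-byte dispatch might look roughly like the sketch below. Everything in it is hypothetical: the class name EstimatorKindSketch, the Kind enum, and the tag values 'F' and 'H' are placeholders for illustration, not the serde format used by the patch or by any later JIRA.

```java
import java.util.Arrays;

// Hypothetical illustration only; the tag bytes and names are assumptions.
class EstimatorKindSketch {
    enum Kind { FM, HLL }

    static final byte FM_TAG = 'F';
    static final byte HLL_TAG = 'H';

    // Decides which estimator to deserialize by looking only at the first byte.
    static Kind detect(byte[] serialized) {
        if (serialized == null || serialized.length == 0) {
            throw new IllegalArgumentException("empty sketch");
        }
        switch (serialized[0]) {
            case FM_TAG:
                return Kind.FM;
            case HLL_TAG:
                return Kind.HLL;
            default:
                throw new IllegalArgumentException("unknown sketch tag: " + serialized[0]);
        }
    }

    // Prepends the tag byte to an already serialized payload.
    static byte[] tag(Kind kind, byte[] payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = (kind == Kind.FM) ? FM_TAG : HLL_TAG;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    // Strips the tag byte, returning the payload for the chosen deserializer.
    static byte[] payload(byte[] serialized) {
        return Arrays.copyOfRange(serialized, 1, serialized.length);
    }
}
```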


> On July 14, 2017, 12:55 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
> > Lines 28-30 (patched)
> > <https://reviews.apache.org/r/60753/diff/2/?file=1775588#file1775588line28>
> >
> >     Is it really worth having a dependency on a third-party jar for this class? Will using TreeMap really be slow?
> >     The Javolution library, on which we already have a dependency, also provides these fast data structures. If really needed, we should use those instead.
> 
> Prasanth_J wrote:
>     The original dependency was added to have a fairly flat treemap + list. Fast-path inserts go to the list, which then gets sorted and merged into the treemap. Fastutil has primitive hash maps/lists with minimal object allocations.
>     Since we are using HLL in the context of ANALYZE, I think this will not have a huge impact in terms of performance. Also, the threshold to switch from SPARSE to DENSE is pretty low (on the order of 1000s), so object allocations in the inner loop will not be significant.
>     
>     I agree with Ashutosh: we can remove this runtime dependency, as it will not have a big impact. Replacing it with Java's TreeMap should work pretty well. If TreeMap becomes a bottleneck, we can replace it later in a follow-up JIRA.
>     
>     Also, I would recommend bringing over these JUnit tests as well, to make sure the TreeMap replacement goes well:
>     https://github.com/prasanthj/hyperloglog/tree/master/src/test/com/github/prasanthj/hll

@Pengcheng,
Let's go with TreeMap for now and simplify this by not bringing in a new dependency.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180494
-----------------------------------------------------------


On July 14, 2017, 12:08 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 14, 2017, 12:08 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 023f084511 
>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
>   pom.xml 32f5fd1493 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
>   ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
>   ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
>   ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
>   ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
>   ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
>   ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 8bbae3914d 
>   ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
>   ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
>   ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
>   ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
>   ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out f7d73c9ddf 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
>   ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
>   ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
>   ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
>   ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
>   ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
>   ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
>   ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
>   ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
>   ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
>   ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
>   ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
>   ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
>   ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
>   ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
>   ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
>   ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
>   ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
>   ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
>   ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 06f8408327 
>   ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
>   ql/src/test/results/clientpositive/join32.q.out a191284aca 
>   ql/src/test/results/clientpositive/join33.q.out a191284aca 
>   ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
>   ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
>   ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
>   ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
>   ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
>   ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
>   ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
>   ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out fab5c9c029 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out ee645410b9 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
>   ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 7fd7e20e15 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out c4b18b7118 
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
>   ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
>   ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
>   ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
>   ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
>   ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
>   ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
>   ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
>   ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
>   ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
>   ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
>   ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
>   ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
>   ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
>   ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
>   ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
>   ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
>   ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
>   ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
>   ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
>   ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
>   ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 626a0640da 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 273cf06e31 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
>   ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
>   ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
>   ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out a38e339012 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out f7c694bf84 
>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
>   ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
>   ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 6118c73c2b 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
>   ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
>   ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
>   ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
>   ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
>   ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
>   ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out f744eb6513 
>   ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 28a8340a9c 
>   ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 73ab9fca82 
>   ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
>   ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
>   ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out 3ad09e815c 
>   ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
>   ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
>   ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
>   ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
>   ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
>   ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
>   ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
>   ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
>   ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180494
-----------------------------------------------------------




common/pom.xml
Lines 234-238 (patched)
<https://reviews.apache.org/r/60753/#comment255712>

    Since this is also a runtime dependency, this set of changes will also be needed in ql/pom.xml, where we create the exec jar that is shipped to all containers.



common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
Lines 30-34 (patched)
<https://reviews.apache.org/r/60753/#comment255700>

    As discussed, this determination should be made using the first byte of the serialized string.
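
    A minimal sketch of what dispatching on the first byte could look like. Every name here (SketchKind, SketchKindDetector, FM_TAG, HLL_TAG) and both tag values are assumptions made for illustration; none of them is taken from the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout: nothing here is taken from the patch under review.
enum SketchKind { FM, HLL }

final class SketchKindDetector {
    // Assumed tag bytes, chosen only for this illustration.
    static final byte FM_TAG = (byte) 'F';
    static final byte HLL_TAG = (byte) 'H';

    private SketchKindDetector() {}

    /** Picks the estimator kind by looking only at the first serialized byte. */
    static SketchKind detect(byte[] serialized) {
        if (serialized == null || serialized.length == 0) {
            throw new IllegalArgumentException("empty serialized sketch");
        }
        switch (serialized[0]) {
            case FM_TAG:
                return SketchKind.FM;
            case HLL_TAG:
                return SketchKind.HLL;
            default:
                throw new IllegalArgumentException("unknown sketch tag: " + serialized[0]);
        }
    }

    /** Convenience overload for a sketch that is stored as a string. */
    static SketchKind detect(String serialized) {
        if (serialized == null) {
            throw new IllegalArgumentException("null serialized sketch");
        }
        // ISO-8859-1 maps each char below 256 to exactly one byte.
        return detect(serialized.getBytes(StandardCharsets.ISO_8859_1));
    }
}

class SketchKindDemo {
    public static void main(String[] args) {
        System.out.println(SketchKindDetector.detect("Hpayload"));
        System.out.println(SketchKindDetector.detect("Fpayload"));
    }
}
```

    With a tag byte in the serialized form, the factory would not need a configuration value to decide which estimator to deserialize.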



common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
Lines 28-30 (patched)
<https://reviews.apache.org/r/60753/#comment255713>

    Is it really worth having a dependency on a 3rd-party jar for this class? Will using TreeMap really be that slow?
    The Javolution library, on which we already have a dependency, also provides these fast data structures. If they are really needed, we should use those instead.
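
    For reference, a rough sketch of a sparse register backed only by java.util.TreeMap. The class name and method signatures are hypothetical and do not mirror the patch. It assumes the register keeps the maximum value seen per index, and it says nothing about relative performance, which would need a benchmark.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: a sparse register that stores only the indexes that were set.
final class TreeMapSparseRegister {
    private final TreeMap<Integer, Byte> registers = new TreeMap<>();

    /** Keeps the larger of the stored and the new value; returns true if updated. */
    boolean set(int index, byte value) {
        Byte current = registers.get(index);
        if (current == null || value > current) {
            registers.put(index, value);
            return true;
        }
        return false;
    }

    /** Merges another register by taking the per-index maximum. */
    void merge(TreeMapSparseRegister other) {
        for (Map.Entry<Integer, Byte> e : other.registers.entrySet()) {
            set(e.getKey(), e.getValue());
        }
    }

    /** Returns the stored value, or 0 for an index that was never set. */
    byte get(int index) {
        Byte v = registers.get(index);
        return v == null ? (byte) 0 : v.byteValue();
    }

    /** Number of indexes that currently hold a value. */
    int size() {
        return registers.size();
    }
}

class TreeMapSparseRegisterDemo {
    public static void main(String[] args) {
        TreeMapSparseRegister a = new TreeMapSparseRegister();
        a.set(7, (byte) 3);
        TreeMapSparseRegister b = new TreeMapSparseRegister();
        b.set(7, (byte) 5);
        b.set(9, (byte) 1);
        a.merge(b);
        System.out.println(a.get(7) + " " + a.get(9) + " " + a.size());
    }
}
```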



common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java
Lines 49 (patched)
<https://reviews.apache.org/r/60753/#comment255714>

    whitespace (ws)



common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java
Lines 569 (patched)
<https://reviews.apache.org/r/60753/#comment255715>

    This can be removed.



common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java
Lines 594 (patched)
<https://reviews.apache.org/r/60753/#comment255716>

    whitespace (ws)



common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java
Lines 2 (patched)
<https://reviews.apache.org/r/60753/#comment255717>

    This can be removed.



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Line 1721 (original), 1721 (patched)
<https://reviews.apache.org/r/60753/#comment255718>

    Better name: hive.stats.ndv.algo



pom.xml
Lines 210 (patched)
<https://reviews.apache.org/r/60753/#comment255711>

    The latest version of this lib is 8.1. We might as well use that one.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
Line 34 (original), 32 (patched)
<https://reviews.apache.org/r/60753/#comment255709>

    whitespace (ws)



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
Lines 46 (patched)
<https://reviews.apache.org/r/60753/#comment255710>

    whitespace (ws)


- Ashutosh Chauhan




Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180508
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
Lines 501 (patched)
<https://reviews.apache.org/r/60753/#comment255724>

    Is this behavior correct?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
Lines 941-942 (original), 932-933 (patched)
<https://reviews.apache.org/r/60753/#comment255725>

    Seems like this exception can never be thrown.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
Line 1019 (original), 1000 (patched)
<https://reviews.apache.org/r/60753/#comment255726>

    In what case will myagg.numDV be null?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
Lines 1414 (patched)
<https://reviews.apache.org/r/60753/#comment255727>

    Does this method work for HLL as well?


- Ashutosh Chauhan




Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/
-----------------------------------------------------------

(Updated July 14, 2017, 9:02 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Repository: hive-git


Description
-------

HIVE-16966


Diffs (updated)
-----

  common/pom.xml 023f084511 
  common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
  common/src/test/org/apache/hadoop/hive/common/ndv/hll/TestHLLNoBias.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/ndv/hll/TestHLLSerialization.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/ndv/hll/TestHyperLogLog.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/ndv/hll/TestHyperLogLogDense.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/ndv/hll/TestHyperLogLogSparse.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/ndv/hll/TestSparseEncodeHash.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
  ql/pom.xml 5732965e47 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
  ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
  ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
  ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
  ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
  ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
  ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
  ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
  ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 8bbae3914d 
  ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
  ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
  ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
  ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
  ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
  ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
  ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
  ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
  ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
  ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out f7d73c9ddf 
  ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
  ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
  ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
  ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
  ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
  ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
  ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
  ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
  ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
  ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
  ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
  ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
  ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
  ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
  ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
  ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
  ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
  ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
  ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
  ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
  ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
  ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
  ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
  ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
  ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
  ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
  ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
  ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 06f8408327 
  ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
  ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
  ql/src/test/results/clientpositive/join32.q.out a191284aca 
  ql/src/test/results/clientpositive/join33.q.out a191284aca 
  ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
  ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
  ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
  ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
  ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
  ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
  ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
  ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
  ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
  ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
  ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out fab5c9c029 
  ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out ee645410b9 
  ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
  ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
  ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
  ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 7fd7e20e15 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out c4b18b7118 
  ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
  ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
  ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
  ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
  ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
  ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
  ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
  ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
  ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
  ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
  ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
  ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
  ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
  ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
  ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
  ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
  ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
  ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
  ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
  ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
  ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
  ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
  ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
  ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
  ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
  ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
  ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
  ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
  ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
  ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
  ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
  ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
  ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
  ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
  ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
  ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
  ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
  ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 626a0640da 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 273cf06e31 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
  ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
  ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
  ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
  ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
  ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
  ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
  ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
  ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out a38e339012 
  ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out f7c694bf84 
  ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
  ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
  ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
  ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
  ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 6118c73c2b 
  ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
  ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
  ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
  ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
  ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
  ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
  ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
  ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
  ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
  ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out f744eb6513 
  ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 28a8340a9c 
  ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 73ab9fca82 
  ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
  ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
  ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
  ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
  ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out 3ad09e815c 
  ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
  ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
  ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
  ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
  ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
  ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
  ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
  ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
  ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
  ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
  ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
  ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 


Diff: https://reviews.apache.org/r/60753/diff/4/

Changes: https://reviews.apache.org/r/60753/diff/3-4/


Testing
-------


Thanks,

pengcheng xiong


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/
-----------------------------------------------------------

(Updated July 14, 2017, 12:08 a.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Repository: hive-git


Description
-------

HIVE-16966


Diffs (updated)
-----

  common/pom.xml 023f084511 
  common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
  metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
  pom.xml 32f5fd1493 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
  ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
  ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
  ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
  ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
  ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
  ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
  ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
  ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 8bbae3914d 
  ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
  ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
  ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
  ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
  ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
  ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
  ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
  ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
  ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
  ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out f7d73c9ddf 
  ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
  ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
  ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
  ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
  ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
  ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
  ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
  ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
  ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
  ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
  ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
  ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
  ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
  ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
  ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
  ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
  ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
  ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
  ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
  ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
  ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
  ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
  ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
  ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
  ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
  ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
  ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
  ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 06f8408327 
  ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
  ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
  ql/src/test/results/clientpositive/join32.q.out a191284aca 
  ql/src/test/results/clientpositive/join33.q.out a191284aca 
  ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
  ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
  ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
  ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
  ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
  ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
  ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
  ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
  ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
  ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
  ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out fab5c9c029 
  ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out ee645410b9 
  ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
  ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
  ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
  ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 7fd7e20e15 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out c4b18b7118 
  ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
  ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
  ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
  ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
  ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
  ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
  ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
  ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
  ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
  ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
  ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
  ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
  ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
  ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
  ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
  ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
  ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
  ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
  ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
  ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
  ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
  ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
  ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
  ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
  ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
  ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
  ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
  ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
  ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
  ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
  ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
  ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
  ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
  ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
  ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
  ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
  ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
  ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 626a0640da 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 273cf06e31 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
  ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
  ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
  ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
  ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
  ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
  ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
  ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
  ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out a38e339012 
  ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out f7c694bf84 
  ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
  ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
  ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
  ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
  ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 6118c73c2b 
  ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
  ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
  ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
  ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
  ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
  ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
  ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
  ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
  ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
  ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out f744eb6513 
  ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 28a8340a9c 
  ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 73ab9fca82 
  ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
  ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
  ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
  ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
  ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out 3ad09e815c 
  ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
  ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
  ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
  ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
  ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
  ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
  ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
  ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
  ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
  ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
  ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
  ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 


Diff: https://reviews.apache.org/r/60753/diff/3/

Changes: https://reviews.apache.org/r/60753/diff/2-3/


Testing
-------


Thanks,

pengcheng xiong


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by Andrew Sherman <as...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180428
-----------------------------------------------------------



The "Copyright 2017 Prasanth Jayachandran" comment in HyperLogLogUtils.java should probably be removed.

- Andrew Sherman


On July 13, 2017, 12:12 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 13, 2017, 12:12 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 023f084511 
>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5700fb9325 
>   metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsExtrapolation.java 99ce96ca0d 
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
>   pom.xml 1b8963274e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
>   ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
>   ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
>   ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out b4d46d28e9 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out c3ad1920b5 
>   ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
>   ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
>   ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
>   ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
>   ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
>   ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out ba96de2fe1 
>   ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
>   ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
>   ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
>   ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
>   ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
>   ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
>   ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
>   ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
>   ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
>   ql/src/test/results/clientpositive/join32.q.out a191284aca 
>   ql/src/test/results/clientpositive/join33.q.out a191284aca 
>   ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
>   ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
>   ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
>   ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
>   ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
>   ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
>   ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
>   ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out b970dd6716 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 6067c77f85 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
>   ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 88c4a17bad 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out eb638d57a0 
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out bb42e45c2d 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
>   ql/src/test/results/clientpositive/llap/explainuser_4.q.out 99db828d6a 
>   ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
>   ql/src/test/results/clientpositive/llap/filter_union.q.out 17f10dff47 
>   ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
>   ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
>   ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
>   ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
>   ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
>   ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
>   ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
>   ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
>   ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
>   ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
>   ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
>   ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
>   ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
>   ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
>   ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
>   ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
>   ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out e401f31e52 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out a876c620e3 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out 018ef1db54 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 3a0d1464c5 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 703d19de05 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
>   ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 6dd3fbf6ca 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out f434a1e00b 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
>   ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out b0eff1e1f4 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 418c23c16d 
>   ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
>   ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out 3b47383803 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 6eba119ca0 
>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
>   ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 82111bb27f 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out f64e7393d9 
>   ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out c24a2d0e6e 
>   ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
>   ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
>   ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out e56800ad40 
>   ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out ed28530eec 
>   ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out a750d9fd01 
>   ql/src/test/results/clientpositive/llap/windowing_gby.q.out 945f8e0caf 
>   ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
>   ql/src/test/results/clientpositive/perf/query23.q.out ebd2271108 
>   ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
>   ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
>   ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
>   ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
>   ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 14535f63da 
>   ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
>   ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
>   ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
>   ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats

Posted by pengcheng xiong <px...@hortonworks.com>.

(Updated July 13, 2017, 12:12 a.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Repository: hive-git


Description
-------

HIVE-16966


Diffs (updated)
-----

  common/pom.xml 023f084511 
  common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5700fb9325 
  metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java 92f9a845e3 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java 31955b4363 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java daf85692eb 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java 36b2c9c56b 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java a88ef84e5c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java 8ac6561aec 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java 2aa4046a46 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java 33c7e3e52c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java fe890e4e27 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java 3179b23438 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java c13add9d9c 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java fbdba24b0a 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java ac65590505 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java 41587477d3 
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsExtrapolation.java 99ce96ca0d 
  metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java 87b1ac870d 
  pom.xml 1b8963274e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java a05906edfa 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java e76fc74dbc 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java 1c197a028a 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java 601901c163 
  ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
  ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
  ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
  ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
  ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
  ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
  ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
  ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out b4d46d28e9 
  ql/src/test/results/clientpositive/autoColumnStats_4.q.out c3ad1920b5 
  ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
  ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
  ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
  ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
  ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
  ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
  ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
  ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
  ql/src/test/results/clientpositive/cbo_rp_join0.q.out ba96de2fe1 
  ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
  ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
  ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
  ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
  ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
  ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
  ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
  ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
  ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
  ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
  ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
  ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
  ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
  ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
  ql/src/test/results/clientpositive/join32.q.out a191284aca 
  ql/src/test/results/clientpositive/join33.q.out a191284aca 
  ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
  ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
  ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
  ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
  ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
  ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
  ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
  ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
  ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
  ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
  ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out b970dd6716 
  ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 6067c77f85 
  ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
  ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
  ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
  ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 88c4a17bad 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out eb638d57a0 
  ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out bb42e45c2d 
  ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
  ql/src/test/results/clientpositive/llap/explainuser_4.q.out 99db828d6a 
  ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0 
  ql/src/test/results/clientpositive/llap/filter_union.q.out 17f10dff47 
  ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
  ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
  ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
  ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
  ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463 
  ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
  ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
  ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
  ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
  ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
  ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
  ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
  ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
  ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
  ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb 
  ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
  ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
  ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
  ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
  ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
  ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
  ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
  ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9 
  ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
  ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
  ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
  ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a 
  ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
  ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
  ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out e401f31e52 
  ql/src/test/results/clientpositive/llap/subquery_multi.q.out a876c620e3 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 018ef1db54 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 3a0d1464c5 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 703d19de05 
  ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
  ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
  ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 6dd3fbf6ca 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out f434a1e00b 
  ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
  ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
  ql/src/test/results/clientpositive/llap/tez_join_tests.q.out b0eff1e1f4 
  ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 418c23c16d 
  ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
  ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
  ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
  ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
  ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out 3b47383803 
  ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 6eba119ca0 
  ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
  ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
  ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
  ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 82111bb27f 
  ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
  ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out f64e7393d9 
  ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out c24a2d0e6e 
  ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
  ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
  ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8 
  ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out 9a1c44c3e6 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out a03466f859 
  ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out e56800ad40 
  ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out ed28530eec 
  ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out a750d9fd01 
  ql/src/test/results/clientpositive/llap/windowing_gby.q.out 945f8e0caf 
  ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
  ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
  ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
  ql/src/test/results/clientpositive/perf/query23.q.out ebd2271108 
  ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
  ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908 
  ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac 
  ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1 
  ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
  ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
  ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
  ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 14535f63da 
  ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
  ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
  ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
  ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
  ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 


Diff: https://reviews.apache.org/r/60753/diff/2/

Changes: https://reviews.apache.org/r/60753/diff/1-2/


Testing
-------


Thanks,

pengcheng xiong