Posted to dev@hive.apache.org by Navis Ryu <na...@nexr.com> on 2014/05/24 09:44:04 UTC

Review Request 21886: Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 , if all the column values larger than 0.0 (or if all column values smaller than 0.0)

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21886/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-4561
    https://issues.apache.org/jira/browse/HIVE-4561


Repository: hive-git


Description
-------

If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0;
or, if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.

hive (default)> create table src_test (price double);
hive (default)> load data local inpath './test.txt' into table src_test;
hive (default)> select * from src_test;
OK
1.0
2.0
3.0
Time taken: 0.313 seconds, Fetched: 3 row(s)
hive (default)> analyze table src_test compute statistics for columns price;

mysql> select * from TAB_COL_STATS \G;
                 CS_ID: 16
               DB_NAME: default
            TABLE_NAME: src_test
           COLUMN_NAME: price
           COLUMN_TYPE: double
                TBL_ID: 2586
        LONG_LOW_VALUE: 0
       LONG_HIGH_VALUE: 0
      DOUBLE_LOW_VALUE: 0.0000   # Wrong result! Expected 1.0000
     DOUBLE_HIGH_VALUE: 3.0000
 BIG_DECIMAL_LOW_VALUE: NULL
BIG_DECIMAL_HIGH_VALUE: NULL
             NUM_NULLS: 0
         NUM_DISTINCTS: 1
           AVG_COL_LEN: 0.0000
           MAX_COL_LEN: 0
             NUM_TRUES: 0
            NUM_FALSES: 0
         LAST_ANALYZED: 1368596151
2 rows in set (0.00 sec)
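
The description above does not say where the 0.0 comes from. One plausible reading, offered here only as an assumption, is that the low and high bounds start at 0.0 instead of being seeded from the data. The sketch below is not Hive's code; the class and method names (MinMaxSketch, zeroSeededRange, firstValueSeededRange) are hypothetical and exist only to reproduce the symptom with the same three values.

```java
import java.util.Arrays;

public class MinMaxSketch {

    // Hypothetical buggy accumulator: both bounds start at 0.0, so 0.0
    // "wins" as the minimum whenever every input is positive, and as the
    // maximum whenever every input is negative.
    static double[] zeroSeededRange(double[] values) {
        double low = 0.0;
        double high = 0.0;
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }

    // Hypothetical corrected accumulator: bounds are seeded from the first
    // value, and an empty input yields null (no bounds) rather than 0.0.
    static double[] firstValueSeededRange(double[] values) {
        if (values.length == 0) {
            return null;
        }
        double low = values[0];
        double high = values[0];
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }

    public static void main(String[] args) {
        double[] prices = {1.0, 2.0, 3.0}; // same rows as src_test above
        System.out.println(Arrays.toString(zeroSeededRange(prices)));       // [0.0, 3.0]
        System.out.println(Arrays.toString(firstValueSeededRange(prices))); // [1.0, 3.0]
    }
}
```

With the rows from src_test, the zero-seeded version reports a low value of 0.0, matching the wrong DOUBLE_LOW_VALUE shown above, while the first-value-seeded version reports 1.0. Returning "no bounds" for an empty input is a design choice of this sketch, not something the thread specifies.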


Diffs
-----

  metastore/if/hive_metastore.thrift eef1b80 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 43869c2 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 9e440bb 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DecimalColumnStatsData.java 5661252 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DoubleColumnStatsData.java d3f3f68 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.java 2cf4380 
  metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py c4b583b 
  metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb 79b7a1a 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java dc0e266 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java f61cdf0 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java 85f6427 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 3dc02f0 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java ee4d56c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 3b063eb 
  ql/src/test/queries/clientpositive/metadata_only_queries.q b549a56 
  ql/src/test/results/clientpositive/compute_stats_empty_table.q.out 50d6c8d 
  ql/src/test/results/clientpositive/compute_stats_long.q.out 2f5cbdd 
  ql/src/test/results/clientpositive/metadata_only_queries.q.out 531ea41 

Diff: https://reviews.apache.org/r/21886/diff/


Testing
-------


Thanks,

Navis Ryu


Re: Review Request 21886: Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 , if all the column values larger than 0.0 (or if all column values smaller than 0.0)

Posted by Zhuoluo Yang <zh...@taobao.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21886/#review44095
-----------------------------------------------------------

Ship it!


Thanks, looks good to me!

- Zhuoluo Yang


On May 28, 2014, 5:45 a.m., Navis Ryu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21886/
> -----------------------------------------------------------
> 
> (Updated May 28, 2014, 5:45 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-4561
>     https://issues.apache.org/jira/browse/HIVE-4561
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0;
> or, if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
> 
> hive (default)> create table src_test (price double);
> hive (default)> load data local inpath './test.txt' into table src_test;
> hive (default)> select * from src_test;
> OK
> 1.0
> 2.0
> 3.0
> Time taken: 0.313 seconds, Fetched: 3 row(s)
> hive (default)> analyze table src_test compute statistics for columns price;
> 
> mysql> select * from TAB_COL_STATS \G;
>                  CS_ID: 16
>                DB_NAME: default
>             TABLE_NAME: src_test
>            COLUMN_NAME: price
>            COLUMN_TYPE: double
>                 TBL_ID: 2586
>         LONG_LOW_VALUE: 0
>        LONG_HIGH_VALUE: 0
>       DOUBLE_LOW_VALUE: 0.0000   # Wrong result! Expected 1.0000
>      DOUBLE_HIGH_VALUE: 3.0000
>  BIG_DECIMAL_LOW_VALUE: NULL
> BIG_DECIMAL_HIGH_VALUE: NULL
>              NUM_NULLS: 0
>          NUM_DISTINCTS: 1
>            AVG_COL_LEN: 0.0000
>            MAX_COL_LEN: 0
>              NUM_TRUES: 0
>             NUM_FALSES: 0
>          LAST_ANALYZED: 1368596151
> 2 rows in set (0.00 sec)
> 
> 
> Diffs
> -----
> 
>   metastore/if/hive_metastore.thrift eef1b80 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 43869c2 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 9e440bb 
>   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DecimalColumnStatsData.java 5661252 
>   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DoubleColumnStatsData.java d3f3f68 
>   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.java 2cf4380 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py c4b583b 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb 79b7a1a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java dc0e266 
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java f61cdf0 
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java 85f6427 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 3dc02f0 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java ee4d56c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 3b063eb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java 24159b8 
>   ql/src/test/queries/clientpositive/metadata_only_queries.q b549a56 
>   ql/src/test/results/clientpositive/compute_stats_empty_table.q.out 50d6c8d 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out 2f5cbdd 
>   ql/src/test/results/clientpositive/metadata_only_queries.q.out 531ea41 
>   ql/src/test/results/clientpositive/metadata_only_queries_with_filters.q.out c8e2c0c 
> 
> Diff: https://reviews.apache.org/r/21886/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Navis Ryu
> 
>


Re: Review Request 21886: Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 , if all the column values larger than 0.0 (or if all column values smaller than 0.0)

Posted by Navis Ryu <na...@nexr.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21886/
-----------------------------------------------------------

(Updated May 28, 2014, 5:45 a.m.)


Review request for hive.


Changes
-------

Fixed test failures and refactored.


Bugs: HIVE-4561
    https://issues.apache.org/jira/browse/HIVE-4561


Repository: hive-git


Description
-------

If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0;
or, if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.

hive (default)> create table src_test (price double);
hive (default)> load data local inpath './test.txt' into table src_test;
hive (default)> select * from src_test;
OK
1.0
2.0
3.0
Time taken: 0.313 seconds, Fetched: 3 row(s)
hive (default)> analyze table src_test compute statistics for columns price;

mysql> select * from TAB_COL_STATS \G;
                 CS_ID: 16
               DB_NAME: default
            TABLE_NAME: src_test
           COLUMN_NAME: price
           COLUMN_TYPE: double
                TBL_ID: 2586
        LONG_LOW_VALUE: 0
       LONG_HIGH_VALUE: 0
      DOUBLE_LOW_VALUE: 0.0000   # Wrong result! Expected 1.0000
     DOUBLE_HIGH_VALUE: 3.0000
 BIG_DECIMAL_LOW_VALUE: NULL
BIG_DECIMAL_HIGH_VALUE: NULL
             NUM_NULLS: 0
         NUM_DISTINCTS: 1
           AVG_COL_LEN: 0.0000
           MAX_COL_LEN: 0
             NUM_TRUES: 0
            NUM_FALSES: 0
         LAST_ANALYZED: 1368596151
2 rows in set (0.00 sec)


Diffs (updated)
-----

  metastore/if/hive_metastore.thrift eef1b80 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 43869c2 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 9e440bb 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DecimalColumnStatsData.java 5661252 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DoubleColumnStatsData.java d3f3f68 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.java 2cf4380 
  metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py c4b583b 
  metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb 79b7a1a 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java dc0e266 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java f61cdf0 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java 85f6427 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 3dc02f0 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java ee4d56c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 3b063eb 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java 24159b8 
  ql/src/test/queries/clientpositive/metadata_only_queries.q b549a56 
  ql/src/test/results/clientpositive/compute_stats_empty_table.q.out 50d6c8d 
  ql/src/test/results/clientpositive/compute_stats_long.q.out 2f5cbdd 
  ql/src/test/results/clientpositive/metadata_only_queries.q.out 531ea41 
  ql/src/test/results/clientpositive/metadata_only_queries_with_filters.q.out c8e2c0c 

Diff: https://reviews.apache.org/r/21886/diff/


Testing
-------


Thanks,

Navis Ryu