You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Jason Dere <jd...@hortonworks.com> on 2015/04/06 23:01:50 UTC

Review Request 32901: HIVE-10226 Column stats for Date columns not supported

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan and Prasanth_J.


Bugs: HIVE-10226
    https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
-------

Re-use the long stats for Date column stats, using the days since epoch value as the long value.


Diffs
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32901/diff/


Testing
-------


Thanks,

Jason Dere


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Jason Dere <jd...@hortonworks.com>.

> On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java, line 1347
> > <https://reviews.apache.org/r/32901/diff/1/?file=918312#file918312line1347>
> >
> >     Nit: Else not needed.

The new patch will look a bit different, so this will not be needed.


> On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote:
> > ql/src/test/results/clientpositive/compute_stats_date.q.out, line 110
> > <https://reviews.apache.org/r/32901/diff/1/?file=918314#file918314line110>
> >
> >     Getting rid of the tabs here would be nice.

This is query output generated by Hive, the tabs are expected here as column output delimiters.


> On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java, line 1341
> > <https://reviews.apache.org/r/32901/diff/1/?file=918312#file918312line1341>
> >
> >     Would it be a little safer here to assert that "parameters" has atleast 2 values in it so that we do not fail with an ArrayIndexOutOfBoundsException?

GenericUDAFComputeStats.getEvaluator(), which instantiates the various StatsEvaluators, is already doing the initial checking of the array size. The other StatsEvaluators for the other types are relying on this as well.


- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/#review79119
-----------------------------------------------------------


On April 6, 2015, 9:01 p.m., Jason Dere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32901/
> -----------------------------------------------------------
> 
> (Updated April 6, 2015, 9:01 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Bugs: HIVE-10226
>     https://issues.apache.org/jira/browse/HIVE-10226
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Re-use the long stats for Date column stats, using the days since epoch value as the long value.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
>   ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32901/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jason Dere
> 
>


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Swarnim Kulkarni <ku...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/#review79119
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
<https://reviews.apache.org/r/32901/#comment128230>

    Nit: Add javadoc here.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
<https://reviews.apache.org/r/32901/#comment128231>

    Would it be a little safer here to assert that "parameters" has atleast 2 values in it so that we do not fail with an ArrayIndexOutOfBoundsException?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
<https://reviews.apache.org/r/32901/#comment128229>

    Nit: Else not needed.



ql/src/test/results/clientpositive/compute_stats_date.q.out
<https://reviews.apache.org/r/32901/#comment128232>

    Getting rid of the tabs here would be nice.


- Swarnim Kulkarni


On April 6, 2015, 9:01 p.m., Jason Dere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32901/
> -----------------------------------------------------------
> 
> (Updated April 6, 2015, 9:01 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Bugs: HIVE-10226
>     https://issues.apache.org/jira/browse/HIVE-10226
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Re-use the long stats for Date column stats, using the days since epoch value as the long value.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
>   ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32901/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jason Dere
> 
>


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
-----------------------------------------------------------

(Updated April 8, 2015, 5:56 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Changes
-------

previous patch did not have new changes, this one has changes from Swarnim's comments


Bugs: HIVE-10226
    https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
-------

Re-use the long stats for Date column stats, using the days since epoch value as the long value.


Diffs (updated)
-----

  metastore/if/hive_metastore.thrift 57bce0c 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32901/diff/


Testing
-------


Thanks,

Jason Dere


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
-----------------------------------------------------------

(Updated April 8, 2015, 5:51 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Changes
-------

minor changes per comments from Swarnim Kulkarni


Bugs: HIVE-10226
    https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
-------

Re-use the long stats for Date column stats, using the days since epoch value as the long value.


Diffs (updated)
-----

  metastore/if/hive_metastore.thrift 57bce0c 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32901/diff/


Testing
-------


Thanks,

Jason Dere


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Swarnim Kulkarni <ku...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/#review79309
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
<https://reviews.apache.org/r/32901/#comment128564>

    Is this needed to be done? I thought thrift objects default this to false.



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
<https://reviews.apache.org/r/32901/#comment128565>

    I would think including a WARN or atleast a DEBUG here will be helpful for debugging.


- Swarnim Kulkarni


On April 7, 2015, 10:05 p.m., Jason Dere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32901/
> -----------------------------------------------------------
> 
> (Updated April 7, 2015, 10:05 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Bugs: HIVE-10226
>     https://issues.apache.org/jira/browse/HIVE-10226
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Re-use the long stats for Date column stats, using the days since epoch value as the long value.
> 
> 
> Diffs
> -----
> 
>   metastore/if/hive_metastore.thrift 57bce0c 
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
>   ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32901/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jason Dere
> 
>


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
-----------------------------------------------------------

(Updated April 7, 2015, 10:05 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Changes
-------

Check for null values in Date/Decimal versions of MetaDataFormatUtils.convertToString()


Bugs: HIVE-10226
    https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
-------

Re-use the long stats for Date column stats, using the days since epoch value as the long value.


Diffs (updated)
-----

  metastore/if/hive_metastore.thrift 57bce0c 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32901/diff/


Testing
-------


Thanks,

Jason Dere


Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
-----------------------------------------------------------

(Updated April 7, 2015, 9:48 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Changes
-------

Created new DateColumnStatsData in hive_metastore.thrift, which will be used for the date stats. Also updated describe formatter/UpdateStatsTask to handle Date column stats.
I've removed the generated Thrift code from diff to make this more readable.


Bugs: HIVE-10226
    https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
-------

Re-use the long stats for Date column stats, using the days since epoch value as the long value.


Diffs (updated)
-----

  metastore/if/hive_metastore.thrift 57bce0c 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3 
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b 
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32901/diff/


Testing
-------


Thanks,

Jason Dere