You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Jitendra Pandey <ji...@hortonworks.com> on 2014/03/27 08:02:51 UTC

Review Request 19718: Vectorized Between and IN expressions don't work with decimal, date types.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19718/
-----------------------------------------------------------

Review request for hive and Eric Hanson.


Bugs: HIVE-6752
    https://issues.apache.org/jira/browse/HIVE-6752


Repository: hive-git


Description
-------

Vectorized Between and IN expressions don't work with decimal, date types.


Diffs
-----

  ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 44b0c59 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 96e74a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IDecimalInExpr.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java c2240c0 
  ql/src/test/queries/clientpositive/vector_between_in.q PRE-CREATION 
  ql/src/test/results/clientpositive/vector_between_in.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19718/diff/


Testing
-------


Thanks,

Jitendra Pandey


Re: Review Request 19718: Vectorized Between and IN expressions don't work with decimal, date types.

Posted by Jitendra Pandey <ji...@hortonworks.com>.

> On March 28, 2014, 10:42 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java, line 62
> > <https://reviews.apache.org/r/19718/diff/3/?file=539988#file539988line62>
> >
> >     please add a comment to explain why we use the sum of all the counts here to determine the array size.

This patch modifies only one line in this file, the other code has been around for quiet some time. I agree that we should add some comments to explain these offsets, I would like to address them in a separate jira.


> On March 28, 2014, 10:42 p.m., Eric Hanson wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java, line 968
> > <https://reviews.apache.org/r/19718/diff/3/?file=539995#file539995line968>
> >
> >     Timestamp is supposed to be represented as a long (# of nanos since epoch). So whey is this using a FilterStringColumnBetween?

The test compares timestamp with string constants, which requires these arguments to be cast to a common type. I am using the legacy hive FunctionRegistry to get the common type which happens to be string in this case. This should be optimized to instead cast strings to timestamps, but that would mean a different logic to determine common types for comparisons, which we better take up in a separate jira. This is not a regression even though test was passing earlier, because earlier for between clause, everything was getting cast to boolean (because first argument of Between expressions is a boolean for Not) which was incorrect anyway. 


> On March 28, 2014, 10:42 p.m., Eric Hanson wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java, line 226
> > <https://reviews.apache.org/r/19718/diff/3/?file=539988#file539988line226>
> >
> >     Consider for readability/encapsulation having a function to compute offset, e.g. 
> >     
> >     isNull[decimalOffset(index)] = false;
> >     
> >     Please add a comment to explain offset logic.
> >     
> >     Does addition of decimal affect any other offsets? I guess not.

  All datatypes use this pattern of offset calculation. The readability argument is valid, but it would be better to address it for all datatypes in a separate jira.
 Addition of decimal was done in an earlier patch and it doesn't affect other offsets, this patch only sets the isNull correctly for non-null values.


> On March 28, 2014, 10:42 p.m., Eric Hanson wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java, line 973
> > <https://reviews.apache.org/r/19718/diff/3/?file=539995#file539995line973>
> >
> >     Again, why string and not long "not between" operator?

See previous one.


- Jitendra


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19718/#review38958
-----------------------------------------------------------


On March 28, 2014, 9:56 p.m., Jitendra Pandey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19718/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 9:56 p.m.)
> 
> 
> Review request for hive and Eric Hanson.
> 
> 
> Bugs: HIVE-6752
>     https://issues.apache.org/jira/browse/HIVE-6752
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Vectorized Between and IN expressions don't work with decimal, date types.
> 
> 
> Diffs
> -----
> 
>   ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 44b0c59 
>   ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java 2229079 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 96e74a9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IDecimalInExpr.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java c2240c0 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 5ebab70 
>   ql/src/test/queries/clientpositive/vector_between_in.q PRE-CREATION 
>   ql/src/test/results/clientpositive/vector_between_in.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19718/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jitendra Pandey
> 
>


Re: Review Request 19718: Vectorized Between and IN expressions don't work with decimal, date types.

Posted by Eric Hanson <eh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19718/#review38958
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java
<https://reviews.apache.org/r/19718/#comment71328>

    please add a comment to explain why we use the sum of all the counts here to determine the array size.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java
<https://reviews.apache.org/r/19718/#comment71329>

    Consider for readability/encapsulation having a function to compute offset, e.g. 
    
    isNull[decimalOffset(index)] = false;
    
    Please add a comment to explain offset logic.
    
    Does addition of decimal affect any other offsets? I guess not.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java
<https://reviews.apache.org/r/19718/#comment71330>

    Timestamp is supposed to be represented as a long (# of nanos since epoch). So whey is this using a FilterStringColumnBetween?



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java
<https://reviews.apache.org/r/19718/#comment71331>

    Again, why string and not long "not between" operator?


- Eric Hanson


On March 28, 2014, 9:56 p.m., Jitendra Pandey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19718/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 9:56 p.m.)
> 
> 
> Review request for hive and Eric Hanson.
> 
> 
> Bugs: HIVE-6752
>     https://issues.apache.org/jira/browse/HIVE-6752
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Vectorized Between and IN expressions don't work with decimal, date types.
> 
> 
> Diffs
> -----
> 
>   ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 44b0c59 
>   ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java 2229079 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 96e74a9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IDecimalInExpr.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java c2240c0 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 5ebab70 
>   ql/src/test/queries/clientpositive/vector_between_in.q PRE-CREATION 
>   ql/src/test/results/clientpositive/vector_between_in.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19718/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jitendra Pandey
> 
>


Re: Review Request 19718: Vectorized Between and IN expressions don't work with decimal, date types.

Posted by Jitendra Pandey <ji...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19718/
-----------------------------------------------------------

(Updated March 28, 2014, 9:56 p.m.)


Review request for hive and Eric Hanson.


Bugs: HIVE-6752
    https://issues.apache.org/jira/browse/HIVE-6752


Repository: hive-git


Description
-------

Vectorized Between and IN expressions don't work with decimal, date types.


Diffs (updated)
-----

  ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 44b0c59 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapper.java 2229079 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 96e74a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IDecimalInExpr.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java c2240c0 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 5ebab70 
  ql/src/test/queries/clientpositive/vector_between_in.q PRE-CREATION 
  ql/src/test/results/clientpositive/vector_between_in.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19718/diff/


Testing
-------


Thanks,

Jitendra Pandey


Re: Review Request 19718: Vectorized Between and IN expressions don't work with decimal, date types.

Posted by Jitendra Pandey <ji...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19718/
-----------------------------------------------------------

(Updated March 27, 2014, 11:22 p.m.)


Review request for hive and Eric Hanson.


Changes
-------

Addressed comments. Also updated test with 'Order by' clauses, because hadoop-2 and hadoop-1 produce different order of results if order by is not specified.


Bugs: HIVE-6752
    https://issues.apache.org/jira/browse/HIVE-6752


Repository: hive-git


Description
-------

Vectorized Between and IN expressions don't work with decimal, date types.


Diffs (updated)
-----

  ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 44b0c59 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 96e74a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IDecimalInExpr.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java c2240c0 
  ql/src/test/queries/clientpositive/vector_between_in.q PRE-CREATION 
  ql/src/test/results/clientpositive/vector_between_in.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19718/diff/


Testing
-------


Thanks,

Jitendra Pandey


Re: Review Request 19718: Vectorized Between and IN expressions don't work with decimal, date types.

Posted by Eric Hanson <eh...@microsoft.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19718/#review38752
-----------------------------------------------------------


Looks good overall. Only minor comments.


ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt
<https://reviews.apache.org/r/19718/#comment71027>

    please remove all trailing whitespace in this file



ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt
<https://reviews.apache.org/r/19718/#comment71034>

    add blank after //



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
<https://reviews.apache.org/r/19718/#comment71038>

    Couldn't determine common type ...
    
    sounds better



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java
<https://reviews.apache.org/r/19718/#comment71053>

    Change comment. This is not a filter, it is a Boolean-valued expression.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java
<https://reviews.apache.org/r/19718/#comment71052>

    Remove the comment about "This is optimized for lookup of the data type of the column." 
    
    because that doesn't apply here since you're using the standard HashSet.
    
    But it is still pretty good :-)



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java
<https://reviews.apache.org/r/19718/#comment71057>

    formatting: j=0 ==> j = 0
    



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java
<https://reviews.apache.org/r/19718/#comment71059>

    add blanks line before comment and space after //



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java
<https://reviews.apache.org/r/19718/#comment71062>

    remove "This is optimized...."



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java
<https://reviews.apache.org/r/19718/#comment71061>

    see formatting comments for DecimalColumnInList


- Eric Hanson


On March 27, 2014, 7:02 a.m., Jitendra Pandey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19718/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 7:02 a.m.)
> 
> 
> Review request for hive and Eric Hanson.
> 
> 
> Bugs: HIVE-6752
>     https://issues.apache.org/jira/browse/HIVE-6752
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Vectorized Between and IN expressions don't work with decimal, date types.
> 
> 
> Diffs
> -----
> 
>   ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 44b0c59 
>   ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 96e74a9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DecimalColumnInList.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterDecimalColumnInList.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IDecimalInExpr.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java c2240c0 
>   ql/src/test/queries/clientpositive/vector_between_in.q PRE-CREATION 
>   ql/src/test/results/clientpositive/vector_between_in.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19718/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jitendra Pandey
> 
>