Posted to dev@hive.apache.org by Jason Dere <jd...@hortonworks.com> on 2017/01/25 00:03:42 UTC

Review Request 55898: HIVE-15698 Vectorization support for min/max/bloomfilter runtime filtering

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55898/
-----------------------------------------------------------

Review request for hive, Deepak Jaiswal and Matt McCline.


Bugs: HIVE-15698
    https://issues.apache.org/jira/browse/HIVE-15698


Repository: hive-git


Description
-------

Adds vectorized support for ExprNodeDynamicValue, BETWEEN() with a DynamicValue, the bloom_filter() aggregation function, and in_bloom_filter().
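
For readers unfamiliar with min/max/bloomfilter runtime filtering, here is a minimal, self-contained sketch of the idea. It is not the code from this patch: the class RuntimeFilterSketch, the TinyBloom filter, and the filterProbe method are hypothetical names invented for illustration, and the hashing is a toy scheme rather than the one in Hive's BloomFilter. The build side collects the min, the max, and a Bloom filter of its join keys; the probe side then keeps only the rows that pass both a BETWEEN-style range check and a Bloom filter membership check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class RuntimeFilterSketch {

    /** Toy Bloom filter over long keys (illustrative only, not Hive's BloomFilter). */
    static final class TinyBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        TinyBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Derive the i-th bit position from two halves of one mixed 64-bit hash.
        private int index(long key, int i) {
            long h = key * 0x9E3779B97F4A7C15L;
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(long key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        /** May return a false positive, but never a false negative. */
        boolean mightContain(long key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Keeps the probe keys that pass the min/max range check and the Bloom filter check. */
    static List<Long> filterProbe(long[] buildKeys, long[] probeKeys) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        TinyBloom bloom = new TinyBloom(1024, 3);
        for (long k : buildKeys) {
            min = Math.min(min, k);
            max = Math.max(max, k);
            bloom.add(k);
        }
        List<Long> kept = new ArrayList<>();
        for (long k : probeKeys) {
            // The cheap range check runs first; the Bloom filter check runs only if it passes.
            if (k >= min && k <= max && bloom.mightContain(k)) {
                kept.add(k);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] build = {10, 20, 30};
        long[] probe = {5, 10, 20, 30, 40};
        System.out.println(filterProbe(build, probe));
    }
}
```

Keys outside the [min, max] range are always dropped, and keys present on the build side are always kept. A key inside the range but absent from the build side may still pass as a false positive, which is acceptable because the join itself removes it later.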


Diffs
-----

  ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java e9fe8fa 
  ql/src/gen/vectorization/ExpressionTemplates/FilterColumnBetweenDynamicValue.txt PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt d68edfa 
  ql/src/gen/vectorization/ExpressionTemplates/FilterStringColumnBetween.txt e8049da 
  ql/src/gen/vectorization/ExpressionTemplates/FilterTimestampColumnBetween.txt 4298d79 
  ql/src/gen/vectorization/ExpressionTemplates/FilterTruncStringColumnBetween.txt 94a174d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicValueRegistryTez.java 7bbedf6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExpressionDescriptor.java 217af3f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorFilterOperator.java 261246b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java f7fec8f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java c887757 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpression.java 8fca8a1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilter.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java e3d9d7f 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInBloomFilter.java 1b7de6c 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java 59cb31e 
  ql/src/test/queries/clientpositive/vectorized_dynamic_semijoin_reduction.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out 3d087b3 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out PRE-CREATION 
  storage-api/src/java/org/apache/hive/common/util/BloomFilter.java d44bba8 
  storage-api/src/test/org/apache/hive/common/util/TestBloomFilter.java 63c7050 

Diff: https://reviews.apache.org/r/55898/diff/


Testing
-------

qtests


Thanks,

Jason Dere


Re: Review Request 55898: HIVE-15698 Vectorization support for min/max/bloomfilter runtime filtering

Posted by Jason Dere <jd...@hortonworks.com>.

> On Jan. 26, 2017, 10:32 a.m., Matt McCline wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java, line 76
> > <https://reviews.apache.org/r/55898/diff/1/?file=1613972#file1613972line76>
> >
> >     One pattern we have added is also setting isNull to false on the value path.

OK, will add this to all of the evaluateLong/Double/etc. methods.
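
A minimal sketch of the pattern under discussion, assuming a stand-in column class rather than Hive's own: LongCol and fillConstant are hypothetical names, and only the field names (vector, isNull, noNulls, isRepeating) are meant to mirror Hive's ColumnVector. The point is that column vectors are reused across batches, so the value path should clear isNull[0] explicitly instead of relying on noNulls alone.

```java
final class DynamicValueNullSketch {

    /** Stand-in for a long column vector; field names mirror Hive's ColumnVector. */
    static final class LongCol {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;
        boolean isRepeating = false;

        LongCol(int size) {
            this.vector = new long[size];
            this.isNull = new boolean[size];
        }
    }

    /** Fills the output column with one repeated constant, or with a repeated NULL. */
    static void fillConstant(LongCol out, Long value) {
        out.isRepeating = true;
        if (value == null) {
            // NULL path: mark entry 0 as null.
            out.noNulls = false;
            out.isNull[0] = true;
        } else {
            // Value path: write the value and clear any stale null flag
            // that a previous batch may have left behind.
            out.noNulls = true;
            out.vector[0] = value;
            out.isNull[0] = false;
        }
    }
}
```

Without the explicit reset, a column that was NULL in one batch could still carry isNull[0] == true in the next, and any consumer that reads isNull without first checking noNulls would see a spurious NULL.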


> On Jan. 26, 2017, 10:32 a.m., Matt McCline wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java, line 146
> > <https://reviews.apache.org/r/55898/diff/1/?file=1613972#file1613972line146>
> >
> >     So, boolean is not applicable for dynamic values?

Missed that - will add boolean.


> On Jan. 26, 2017, 10:32 a.m., Matt McCline wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java, line 270
> > <https://reviews.apache.org/r/55898/diff/1/?file=1613974#file1613974line270>
> >
> >     With FastDecimal you can now get the String directly from the HiveDecimalWritable, i.e. the DecimalColumnVector.vector. So you do not have to call getHiveDecimal, and you get better performance.
> >
> >     An additional performance option is available, too. You can call a variation of toString that takes a scratch byte[], which makes the conversion to String even faster.

Thanks for the tip. Will try using it.


- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55898/#review163114
-----------------------------------------------------------




Re: Review Request 55898: HIVE-15698 Vectorization support for min/max/bloomfilter runtime filtering

Posted by Matt McCline <mm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55898/#review163114
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java (line 2706)
<https://reviews.apache.org/r/55898/#comment234567>

    I'm glad we are porting HIVE-13713.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java (line 76)
<https://reviews.apache.org/r/55898/#comment234563>

    One pattern we have added is also setting isNull to false on the value path.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java (line 146)
<https://reviews.apache.org/r/55898/#comment234566>

    So, boolean is not applicable for dynamic values?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java (line 270)
<https://reviews.apache.org/r/55898/#comment234568>

    With FastDecimal you can now get the String directly from the HiveDecimalWritable, i.e. the DecimalColumnVector.vector. So you do not have to call getHiveDecimal, and you get better performance.

    An additional performance option is available, too. You can call a variation of toString that takes a scratch byte[], which makes the conversion to String even faster.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilter.java (line 85)
<https://reviews.apache.org/r/55898/#comment234570>

    Nit: whitespace.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilter.java (line 147)
<https://reviews.apache.org/r/55898/#comment234569>

    So implicitly !noNulls implies all NULL.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilter.java (line 367)
<https://reviews.apache.org/r/55898/#comment234573>

    As Sergey would say: is it?



ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out (line 53)
<https://reviews.apache.org/r/55898/#comment234576>

    I'm glad it shows up so nicely in EXPLAIN.


- Matt McCline




Re: Review Request 55898: HIVE-15698 Vectorization support for min/max/bloomfilter runtime filtering

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55898/
-----------------------------------------------------------

(Updated Jan. 26, 2017, 11:47 p.m.)


Review request for hive, Deepak Jaiswal and Matt McCline.


Changes
-------

Incorporating review feedback.


Bugs: HIVE-15698
    https://issues.apache.org/jira/browse/HIVE-15698


Repository: hive-git


Description
-------

Adds vectorized support for ExprNodeDynamicValue, BETWEEN() with a DynamicValue, the bloom_filter() aggregation function, and in_bloom_filter().


Diffs (updated)
-----

  ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java e9fe8fa 
  itests/src/test/resources/testconfiguration.properties e966959 
  ql/src/gen/vectorization/ExpressionTemplates/FilterColumnBetweenDynamicValue.txt PRE-CREATION 
  ql/src/gen/vectorization/ExpressionTemplates/FilterDecimalColumnBetween.txt d68edfa 
  ql/src/gen/vectorization/ExpressionTemplates/FilterStringColumnBetween.txt e8049da 
  ql/src/gen/vectorization/ExpressionTemplates/FilterTimestampColumnBetween.txt 4298d79 
  ql/src/gen/vectorization/ExpressionTemplates/FilterTruncStringColumnBetween.txt 94a174d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicValueRegistryTez.java 7bbedf6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExpressionDescriptor.java 217af3f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorFilterOperator.java 261246b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java f7fec8f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java c887757 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpression.java 8fca8a1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorInBloomFilterColDynamicValue.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilter.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java e3d9d7f 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBloomFilter.java fb9a140 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInBloomFilter.java 1b7de6c 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java 59cb31e 
  ql/src/test/queries/clientpositive/vectorized_dynamic_semijoin_reduction.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/mergejoin.q.out 4ec2a71 
  ql/src/test/results/clientpositive/llap/orc_llap.q.out 90055a5 
  ql/src/test/results/clientpositive/llap/vector_binary_join_groupby.q.out 9fbce7d 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out 3d087b3 
  ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vector_binary_join_groupby.q.out 850278e 
  storage-api/src/java/org/apache/hive/common/util/BloomFilter.java d44bba8 
  storage-api/src/test/org/apache/hive/common/util/TestBloomFilter.java 63c7050 

Diff: https://reviews.apache.org/r/55898/diff/


Testing
-------

qtests


Thanks,

Jason Dere