Posted to commits@hive.apache.org by jc...@apache.org on 2018/03/22 16:57:34 UTC
[11/34] hive git commit: HIVE-18979: Enable
AggregateReduceFunctionsRule from Calcite (Jesus Camacho Rodriguez,
reviewed by Ashutosh Chauhan)
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_2.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_2.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_2.q.out
index 8b3c5f2..423d2e3 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_2.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_2.q.out
@@ -75,25 +75,26 @@ STAGE PLANS:
predicate: (((cdouble < UDFToDouble(ctinyint)) and ((UDFToDouble(ctimestamp2) <> -10669.0D) or (cint < 359))) or ((ctimestamp1 < ctimestamp2) and (cstring2 like 'b%') and (cfloat <= -5638.15))) (type: boolean)
Statistics: Num rows: 4778 Data size: 57336 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: ctinyint (type: tinyint), csmallint (type: smallint), cbigint (type: bigint), cfloat (type: float), cdouble (type: double)
- outputColumnNames: ctinyint, csmallint, cbigint, cfloat, cdouble
+ expressions: csmallint (type: smallint), cfloat (type: float), cbigint (type: bigint), ctinyint (type: tinyint), cdouble (type: double), UDFToDouble(cbigint) (type: double), (UDFToDouble(cbigint) * UDFToDouble(cbigint)) (type: double)
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Select Vectorization:
className: VectorSelectOperator
native: true
- projectedOutputColumnNums: [0, 1, 3, 4, 5]
+ projectedOutputColumnNums: [1, 4, 3, 0, 5, 13, 16]
+ selectExpressions: CastLongToDouble(col 3:bigint) -> 13:double, DoubleColMultiplyDoubleColumn(col 14:double, col 15:double)(children: CastLongToDouble(col 3:bigint) -> 14:double, CastLongToDouble(col 3:bigint) -> 15:double) -> 16:double
Statistics: Num rows: 4778 Data size: 57336 Basic stats: COMPLETE Column stats: NONE
Group By Operator
- aggregations: avg(csmallint), sum(cfloat), var_pop(cbigint), count(), min(ctinyint), avg(cdouble)
+ aggregations: sum(_col0), count(_col0), sum(_col1), sum(_col6), sum(_col5), count(_col2), count(), min(_col3), sum(_col4), count(_col4)
Group By Vectorization:
- aggregators: VectorUDAFAvgLong(col 1:smallint) -> struct<count:bigint,sum:double,input:smallint>, VectorUDAFSumDouble(col 4:float) -> double, VectorUDAFVarLong(col 3:bigint) -> struct<count:bigint,sum:double,variance:double> aggregation: var_pop, VectorUDAFCountStar(*) -> bigint, VectorUDAFMinLong(col 0:tinyint) -> tinyint, VectorUDAFAvgDouble(col 5:double) -> struct<count:bigint,sum:double,input:double>
+ aggregators: VectorUDAFSumLong(col 1:smallint) -> bigint, VectorUDAFCount(col 1:smallint) -> bigint, VectorUDAFSumDouble(col 4:float) -> double, VectorUDAFSumDouble(col 16:double) -> double, VectorUDAFSumDouble(col 13:double) -> double, VectorUDAFCount(col 3:bigint) -> bigint, VectorUDAFCountStar(*) -> bigint, VectorUDAFMinLong(col 0:tinyint) -> tinyint, VectorUDAFSumDouble(col 5:double) -> double, VectorUDAFCount(col 5:double) -> bigint
className: VectorGroupByOperator
groupByMode: HASH
native: false
vectorProcessingMode: HASH
- projectedOutputColumnNums: [0, 1, 2, 3, 4, 5]
+ projectedOutputColumnNums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
mode: hash
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
- Statistics: Num rows: 1 Data size: 256 Basic stats: COMPLETE Column stats: NONE
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
+ Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Reduce Sink Vectorization:
@@ -101,9 +102,9 @@ STAGE PLANS:
keyColumnNums: []
native: true
nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
- valueColumnNums: [0, 1, 2, 3, 4, 5]
- Statistics: Num rows: 1 Data size: 256 Basic stats: COMPLETE Column stats: NONE
- value expressions: _col0 (type: struct<count:bigint,sum:double,input:smallint>), _col1 (type: double), _col2 (type: struct<count:bigint,sum:double,variance:double>), _col3 (type: bigint), _col4 (type: tinyint), _col5 (type: struct<count:bigint,sum:double,input:double>)
+ valueColumnNums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+ Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: NONE
+ value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: double), _col3 (type: double), _col4 (type: double), _col5 (type: bigint), _col6 (type: bigint), _col7 (type: tinyint), _col8 (type: double), _col9 (type: bigint)
Execution mode: vectorized
Map Vectorization:
enabled: true
@@ -119,26 +120,50 @@ STAGE PLANS:
includeColumns: [0, 1, 2, 3, 4, 5, 7, 8, 9]
dataColumns: ctinyint:tinyint, csmallint:smallint, cint:int, cbigint:bigint, cfloat:float, cdouble:double, cstring1:string, cstring2:string, ctimestamp1:timestamp, ctimestamp2:timestamp, cboolean1:boolean, cboolean2:boolean
partitionColumnCount: 0
- scratchColumnTypeNames: [double]
+ scratchColumnTypeNames: [double, double, double, double]
Reducer 2
+ Execution mode: vectorized
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true
- notVectorizedReason: GROUPBY operator: Vector aggregation : "var_pop" for input type: "STRUCT" and output type: "DOUBLE" and mode: FINAL not supported for evaluator GenericUDAFVarianceEvaluator
- vectorized: false
+ reduceColumnNullOrder:
+ reduceColumnSortOrder:
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 10
+ dataColumns: VALUE._col0:bigint, VALUE._col1:bigint, VALUE._col2:double, VALUE._col3:double, VALUE._col4:double, VALUE._col5:bigint, VALUE._col6:bigint, VALUE._col7:tinyint, VALUE._col8:double, VALUE._col9:bigint
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
- aggregations: avg(VALUE._col0), sum(VALUE._col1), var_pop(VALUE._col2), count(VALUE._col3), min(VALUE._col4), avg(VALUE._col5)
+ aggregations: sum(VALUE._col0), count(VALUE._col1), sum(VALUE._col2), sum(VALUE._col3), sum(VALUE._col4), count(VALUE._col5), count(VALUE._col6), min(VALUE._col7), sum(VALUE._col8), count(VALUE._col9)
+ Group By Vectorization:
+ aggregators: VectorUDAFSumLong(col 0:bigint) -> bigint, VectorUDAFCountMerge(col 1:bigint) -> bigint, VectorUDAFSumDouble(col 2:double) -> double, VectorUDAFSumDouble(col 3:double) -> double, VectorUDAFSumDouble(col 4:double) -> double, VectorUDAFCountMerge(col 5:bigint) -> bigint, VectorUDAFCountMerge(col 6:bigint) -> bigint, VectorUDAFMinLong(col 7:tinyint) -> tinyint, VectorUDAFSumDouble(col 8:double) -> double, VectorUDAFCountMerge(col 9:bigint) -> bigint
+ className: VectorGroupByOperator
+ groupByMode: MERGEPARTIAL
+ native: false
+ vectorProcessingMode: GLOBAL
+ projectedOutputColumnNums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
mode: mergepartial
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
- Statistics: Num rows: 1 Data size: 256 Basic stats: COMPLETE Column stats: NONE
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
+ Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: _col0 (type: double), (_col0 % -563.0D) (type: double), (_col0 + 762.0D) (type: double), _col1 (type: double), _col2 (type: double), (- _col2) (type: double), (_col1 - _col0) (type: double), _col3 (type: bigint), (- (_col1 - _col0)) (type: double), (_col2 - 762.0D) (type: double), _col4 (type: tinyint), ((- _col2) + UDFToDouble(_col4)) (type: double), _col5 (type: double), (((- _col2) + UDFToDouble(_col4)) - _col1) (type: double)
+ expressions: (_col0 / _col1) (type: double), ((_col0 / _col1) % -563.0D) (type: double), ((_col0 / _col1) + 762.0D) (type: double), _col2 (type: double), ((_col3 - ((_col4 * _col4) / _col5)) / _col5) (type: double), (- ((_col3 - ((_col4 * _col4) / _col5)) / _col5)) (type: double), (_col2 - (_col0 / _col1)) (type: double), _col6 (type: bigint), (- (_col2 - (_col0 / _col1))) (type: double), (((_col3 - ((_col4 * _col4) / _col5)) / _col5) - 762.0D) (type: double), _col7 (type: tinyint), ((- ((_col3 - ((_col4 * _col4) / _col5)) / _col5)) + UDFToDouble(_col7)) (type: double), (_col8 / _col9) (type: double), (((- ((_col3 - ((_col4 * _col4) / _col5)) / _col5)) + UDFToDouble(_col7)) - _col2) (type: double)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
- Statistics: Num rows: 1 Data size: 256 Basic stats: COMPLETE Column stats: NONE
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [10, 12, 13, 2, 14, 11, 16, 6, 15, 17, 7, 20, 18, 19]
+ selectExpressions: LongColDivideLongColumn(col 0:bigint, col 1:bigint) -> 10:double, DoubleColModuloDoubleScalar(col 11:double, val -563.0)(children: LongColDivideLongColumn(col 0:bigint, col 1:bigint) -> 11:double) -> 12:double, DoubleColAddDoubleScalar(col 11:double, val 762.0)(children: LongColDivideLongColumn(col 0:bigint, col 1:bigint) -> 11:double) -> 13:double, DoubleColDivideLongColumn(col 11:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 14:double)(children: DoubleColDivideLongColumn(col 11:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 11:double) -> 14:double) -> 11:double) -> 14:double, DoubleColUnaryMinus(col 15:double)(children: DoubleColDivideLongColumn(col 11:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 15:double)(children: DoubleColDivideLongColumn(col 11:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 11:double) -> 15:double) -> 11:double) -> 15:double) -> 11:double, DoubleColSubtractDoubleColumn(col 2:double, col 15:double)(children: LongColDivideLongColumn(col 0:bigint, col 1:bigint) -> 15:double) -> 16:double, DoubleColUnaryMinus(col 17:double)(children: DoubleColSubtractDoubleColumn(col 2:double, col 15:double)(children: LongColDivideLongColumn(col 0:bigint, col 1:bigint) -> 15:double) -> 17:double) -> 15:double, DoubleColSubtractDoubleScalar(col 18:double, val 762.0)(children: DoubleColDivideLongColumn(col 17:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 18:double)(children: DoubleColDivideLongColumn(col 17:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 17:double) -> 18:double) -> 17:double) -> 18:double) -> 17:double, DoubleColAddDoubleColumn(col 18:double, col 19:double)(children: DoubleColUnaryMinus(col 19:double)(children: DoubleColDivideLongColumn(col 18:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 19:double)(children: DoubleColDivideLongColumn(col 18:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 18:double) -> 19:double) -> 18:double) -> 19:double) -> 18:double, CastLongToDouble(col 7:tinyint) -> 19:double) -> 20:double, DoubleColDivideLongColumn(col 8:double, col 9:bigint) -> 18:double, DoubleColSubtractDoubleColumn(col 22:double, col 2:double)(children: DoubleColAddDoubleColumn(col 19:double, col 21:double)(children: DoubleColUnaryMinus(col 21:double)(children: DoubleColDivideLongColumn(col 19:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 21:double)(children: DoubleColDivideLongColumn(col 19:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 19:double) -> 21:double) -> 19:double) -> 21:double) -> 19:double, CastLongToDouble(col 7:tinyint) -> 21:double) -> 22:double) -> 19:double
+ Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
- Statistics: Num rows: 1 Data size: 256 Basic stats: COMPLETE Column stats: NONE
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
@@ -198,4 +223,4 @@ WHERE (((ctimestamp1 < ctimestamp2)
POSTHOOK: type: QUERY
POSTHOOK: Input: default@alltypesparquet
#### A masked pattern was here ####
--5646.467075892857 -16.467075892856883 -4884.467075892857 -2839.634998679161 1.49936299222378778E18 -1.49936299222378778E18 2806.832077213696 3584 -2806.832077213696 1.49936299222378701E18 -64 -1.49936299222378778E18 -5650.1297631138395 -1.49936299222378496E18
+-5646.467075892857 -16.467075892856883 -4884.467075892857 -2839.634998679161 1.49936299222378906E18 -1.49936299222378906E18 2806.832077213696 3584 -2806.832077213696 1.49936299222378829E18 -64 -1.49936299222378906E18 -5650.1297631138395 -1.49936299222378624E18
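Reading aid for the plan change above (not part of the commit): with AggregateReduceFunctionsRule enabled, avg and var_pop no longer appear as aggregates in this plan. The map side emits plain sum and count partials, and the reducer's Select Operator recombines them: avg(csmallint) becomes (_col0 / _col1), and var_pop(cbigint) becomes ((_col3 - ((_col4 * _col4) / _col5)) / _col5), where _col3 = sum(x * x), _col4 = sum(x) and _col5 = count(x). The Python sketch below mirrors that arithmetic on made-up data. The function names and sample values are hypothetical and only illustrate the formulas; this is not Hive's or Calcite's code.

```python
import math
import statistics


def partial(values):
    """Map side (hypothetical helper): emit (sum of squares, sum, count)."""
    xs = [float(v) for v in values]
    return (sum(x * x for x in xs), sum(xs), len(xs))


def merge_partials(partials):
    """Reduce side: partial sums and counts merge by simple addition."""
    sum_sq = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    count = sum(p[2] for p in partials)
    return (sum_sq, total, count)


def avg_from_partials(total, count):
    """avg(x) rewritten as sum(x) / count(x)."""
    return total / count


def var_pop_from_partials(sum_sq, total, count):
    """var_pop(x) rewritten as (sum(x*x) - sum(x)*sum(x)/count(x)) / count(x)."""
    return (sum_sq - (total * total) / count) / count


# Two hypothetical map tasks, each seeing half of the rows.
merged = merge_partials([partial([1, 2]), partial([3, 4])])
mean = avg_from_partials(merged[1], merged[2])
variance = var_pop_from_partials(*merged)

# The recombined values agree with a direct computation on all rows.
assert mean == 2.5
assert math.isclose(variance, statistics.pvariance([1, 2, 3, 4]))
```

In the expected query output above, only the low-order digits of the var_pop-derived columns change (1.49936299222378778E18 before, 1.49936299222378906E18 after). That is consistent with floating-point rounding differing between the two ways of computing the variance, although the commit does not state the cause.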
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_3.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_3.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_3.q.out
index dd3532b..955f85c 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_3.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_3.q.out
@@ -80,25 +80,26 @@ STAGE PLANS:
predicate: (((UDFToDouble(cbigint) > cdouble) and (CAST( csmallint AS decimal(8,3)) >= 79.553) and (ctimestamp1 > ctimestamp2)) or ((UDFToFloat(cint) <= cfloat) and (CAST( cbigint AS decimal(22,3)) <> 79.553) and (UDFToDouble(ctimestamp2) = -29071.0D))) (type: boolean)
Statistics: Num rows: 2503 Data size: 30036 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: ctinyint (type: tinyint), csmallint (type: smallint), cint (type: int), cfloat (type: float)
- outputColumnNames: ctinyint, csmallint, cint, cfloat
+ expressions: csmallint (type: smallint), ctinyint (type: tinyint), cfloat (type: float), cint (type: int), UDFToDouble(csmallint) (type: double), (UDFToDouble(csmallint) * UDFToDouble(csmallint)) (type: double), UDFToDouble(ctinyint) (type: double), (UDFToDouble(ctinyint) * UDFToDouble(ctinyint)) (type: double), UDFToDouble(cfloat) (type: double), (UDFToDouble(cfloat) * UDFToDouble(cfloat)) (type: double), UDFToDouble(cint) (type: double), (UDFToDouble(cint) * UDFToDouble(cint)) (type: double)
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
Select Vectorization:
className: VectorSelectOperator
native: true
- projectedOutputColumnNums: [0, 1, 2, 4]
+ projectedOutputColumnNums: [1, 0, 4, 2, 13, 18, 16, 20, 4, 17, 19, 23]
+ selectExpressions: CastLongToDouble(col 1:smallint) -> 13:double, DoubleColMultiplyDoubleColumn(col 16:double, col 17:double)(children: CastLongToDouble(col 1:smallint) -> 16:double, CastLongToDouble(col 1:smallint) -> 17:double) -> 18:double, CastLongToDouble(col 0:tinyint) -> 16:double, DoubleColMultiplyDoubleColumn(col 17:double, col 19:double)(children: CastLongToDouble(col 0:tinyint) -> 17:double, CastLongToDouble(col 0:tinyint) -> 19:double) -> 20:double, DoubleColMultiplyDoubleColumn(col 4:double, col 4:double)(children: col 4:float, col 4:float) -> 17:double, CastLongToDouble(col 2:int) -> 19:double, DoubleColMultiplyDoubleColumn(col 21:double, col 22:double)(children: CastLongToDouble(col 2:int) -> 21:double, CastLongToDouble(col 2:int) -> 22:double) -> 23:double
Statistics: Num rows: 2503 Data size: 30036 Basic stats: COMPLETE Column stats: NONE
Group By Operator
- aggregations: stddev_samp(csmallint), stddev_pop(ctinyint), stddev_samp(cfloat), sum(cfloat), avg(cint), stddev_pop(cint)
+ aggregations: sum(_col5), sum(_col4), count(_col0), sum(_col7), sum(_col6), count(_col1), sum(_col9), sum(_col8), count(_col2), sum(_col2), sum(_col3), count(_col3), sum(_col11), sum(_col10)
Group By Vectorization:
- aggregators: VectorUDAFVarLong(col 1:smallint) -> struct<count:bigint,sum:double,variance:double> aggregation: stddev_samp, VectorUDAFVarLong(col 0:tinyint) -> struct<count:bigint,sum:double,variance:double> aggregation: stddev_pop, VectorUDAFVarDouble(col 4:float) -> struct<count:bigint,sum:double,variance:double> aggregation: stddev_samp, VectorUDAFSumDouble(col 4:float) -> double, VectorUDAFAvgLong(col 2:int) -> struct<count:bigint,sum:double,input:int>, VectorUDAFVarLong(col 2:int) -> struct<count:bigint,sum:double,variance:double> aggregation: stddev_pop
+ aggregators: VectorUDAFSumDouble(col 18:double) -> double, VectorUDAFSumDouble(col 13:double) -> double, VectorUDAFCount(col 1:smallint) -> bigint, VectorUDAFSumDouble(col 20:double) -> double, VectorUDAFSumDouble(col 16:double) -> double, VectorUDAFCount(col 0:tinyint) -> bigint, VectorUDAFSumDouble(col 17:double) -> double, VectorUDAFSumDouble(col 4:double) -> double, VectorUDAFCount(col 4:float) -> bigint, VectorUDAFSumDouble(col 4:float) -> double, VectorUDAFSumLong(col 2:int) -> bigint, VectorUDAFCount(col 2:int) -> bigint, VectorUDAFSumDouble(col 23:double) -> double, VectorUDAFSumDouble(col 19:double) -> double
className: VectorGroupByOperator
groupByMode: HASH
native: false
vectorProcessingMode: HASH
- projectedOutputColumnNums: [0, 1, 2, 3, 4, 5]
+ projectedOutputColumnNums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
mode: hash
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
- Statistics: Num rows: 1 Data size: 404 Basic stats: COMPLETE Column stats: NONE
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
+ Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Reduce Sink Vectorization:
@@ -106,9 +107,9 @@ STAGE PLANS:
keyColumnNums: []
native: true
nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
- valueColumnNums: [0, 1, 2, 3, 4, 5]
- Statistics: Num rows: 1 Data size: 404 Basic stats: COMPLETE Column stats: NONE
- value expressions: _col0 (type: struct<count:bigint,sum:double,variance:double>), _col1 (type: struct<count:bigint,sum:double,variance:double>), _col2 (type: struct<count:bigint,sum:double,variance:double>), _col3 (type: double), _col4 (type: struct<count:bigint,sum:double,input:int>), _col5 (type: struct<count:bigint,sum:double,variance:double>)
+ valueColumnNums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
+ Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
+ value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: bigint), _col3 (type: double), _col4 (type: double), _col5 (type: bigint), _col6 (type: double), _col7 (type: double), _col8 (type: bigint), _col9 (type: double), _col10 (type: bigint), _col11 (type: bigint), _col12 (type: double), _col13 (type: double)
Execution mode: vectorized
Map Vectorization:
enabled: true
@@ -124,26 +125,50 @@ STAGE PLANS:
includeColumns: [0, 1, 2, 3, 4, 5, 8, 9]
dataColumns: ctinyint:tinyint, csmallint:smallint, cint:int, cbigint:bigint, cfloat:float, cdouble:double, cstring1:string, cstring2:string, ctimestamp1:timestamp, ctimestamp2:timestamp, cboolean1:boolean, cboolean2:boolean
partitionColumnCount: 0
- scratchColumnTypeNames: [double, decimal(22,3), decimal(8,3)]
+ scratchColumnTypeNames: [double, decimal(22,3), decimal(8,3), double, double, double, double, double, double, double, double]
Reducer 2
+ Execution mode: vectorized
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true
- notVectorizedReason: GROUPBY operator: Vector aggregation : "stddev_samp" for input type: "STRUCT" and output type: "DOUBLE" and mode: FINAL not supported for evaluator GenericUDAFStdSampleEvaluator
- vectorized: false
+ reduceColumnNullOrder:
+ reduceColumnSortOrder:
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 14
+ dataColumns: VALUE._col0:double, VALUE._col1:double, VALUE._col2:bigint, VALUE._col3:double, VALUE._col4:double, VALUE._col5:bigint, VALUE._col6:double, VALUE._col7:double, VALUE._col8:bigint, VALUE._col9:double, VALUE._col10:bigint, VALUE._col11:bigint, VALUE._col12:double, VALUE._col13:double
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
- aggregations: stddev_samp(VALUE._col0), stddev_pop(VALUE._col1), stddev_samp(VALUE._col2), sum(VALUE._col3), avg(VALUE._col4), stddev_pop(VALUE._col5)
+ aggregations: sum(VALUE._col0), sum(VALUE._col1), count(VALUE._col2), sum(VALUE._col3), sum(VALUE._col4), count(VALUE._col5), sum(VALUE._col6), sum(VALUE._col7), count(VALUE._col8), sum(VALUE._col9), sum(VALUE._col10), count(VALUE._col11), sum(VALUE._col12), sum(VALUE._col13)
+ Group By Vectorization:
+ aggregators: VectorUDAFSumDouble(col 0:double) -> double, VectorUDAFSumDouble(col 1:double) -> double, VectorUDAFCountMerge(col 2:bigint) -> bigint, VectorUDAFSumDouble(col 3:double) -> double, VectorUDAFSumDouble(col 4:double) -> double, VectorUDAFCountMerge(col 5:bigint) -> bigint, VectorUDAFSumDouble(col 6:double) -> double, VectorUDAFSumDouble(col 7:double) -> double, VectorUDAFCountMerge(col 8:bigint) -> bigint, VectorUDAFSumDouble(col 9:double) -> double, VectorUDAFSumLong(col 10:bigint) -> bigint, VectorUDAFCountMerge(col 11:bigint) -> bigint, VectorUDAFSumDouble(col 12:double) -> double, VectorUDAFSumDouble(col 13:double) -> double
+ className: VectorGroupByOperator
+ groupByMode: MERGEPARTIAL
+ native: false
+ vectorProcessingMode: GLOBAL
+ projectedOutputColumnNums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
mode: mergepartial
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
- Statistics: Num rows: 1 Data size: 404 Basic stats: COMPLETE Column stats: NONE
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
+ Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: _col0 (type: double), (_col0 - 10.175D) (type: double), _col1 (type: double), (_col0 * (_col0 - 10.175D)) (type: double), (- _col1) (type: double), (_col0 % 79.553D) (type: double), (- (_col0 * (_col0 - 10.175D))) (type: double), _col2 (type: double), (- _col0) (type: double), _col3 (type: double), ((- (_col0 * (_col0 - 10.175D))) / (_col0 - 10.175D)) (type: double), (- (_col0 - 10.175D)) (type: double), _col4 (type: double), (-3728.0D - _col0) (type: double), _col5 (type: double), (_col4 / _col2) (type: double)
+ expressions: power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) (type: double), (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) - 10.175D) (type: double), power(((_col3 - ((_col4 * _col4) / _col5)) / _col5), 0.5) (type: double), (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) * (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) - 10.175D)) (type: double), (- power(((_col3 - ((_col4 * _col4) / _col5)) / _col5), 0.5)) (type: double), (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) % 79.553D) (type: double), (- (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) * (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) - 10.175D))) (type: double), power(((_col6 - ((_col7 * _col7) / _col8)) / CASE WHEN ((_col8 = 1L)) THEN (null) ELSE ((_col8 - 1)) END), 0.5) (type: double), (- power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5)) (type: double), _col9 (type: double), ((- (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) * (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) - 10.175D))) / (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) - 10.175D)) (type: double), (- (power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5) - 10.175D)) (type: double), (_col10 / _col11) (type: double), (-3728.0D - power(((_col0 - ((_col1 * _col1) / _col2)) / CASE WHEN ((_col2 = 1L)) THEN (null) ELSE ((_col2 - 1)) END), 0.5)) (type: double), power(((_col12 - ((_col13 * _col13) / _col11)) / _col11), 0.5) (type: double), ((_col10 / _col11) / power(((_col6 - ((_col7 * _col7) / _col8)) / CASE WHEN ((_col8 = 1L)) THEN (null) ELSE ((_col8 - 1)) END), 0.5)) (type: double)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
- Statistics: Num rows: 1 Data size: 404 Basic stats: COMPLETE Column stats: NONE
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [14, 19, 15, 23, 26, 29, 22, 32, 40, 9, 43, 35, 46, 54, 53, 59]
+ selectExpressions: FuncPowerDoubleToDouble(col 15:double)(children: DoubleColDivideLongColumn(col 14:double, col 18:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 15:double)(children: DoubleColDivideLongColumn(col 14:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 14:double) -> 15:double) -> 14:double, IfExprNullCondExpr(col 16:boolean, null, col 17:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 16:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 17:bigint) -> 18:bigint) -> 15:double) -> 14:double, DoubleColSubtractDoubleScalar(col 15:double, val 10.175)(children: FuncPowerDoubleToDouble(col 19:double)(children: DoubleColDivideLongColumn(col 15:double, col 21:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 19:double)(children: DoubleColDivideLongColumn(col 15:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 15:double) -> 19:double) -> 15:double, IfExprNullCondExpr(col 18:boolean, null, col 20:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 18:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 20:bigint) -> 21:bigint) -> 19:double) -> 15:double) -> 19:double, FuncPowerDoubleToDouble(col 22:double)(children: DoubleColDivideLongColumn(col 15:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 22:double)(children: DoubleColDivideLongColumn(col 15:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 15:double) -> 22:double) -> 15:double) -> 22:double) -> 15:double, DoubleColMultiplyDoubleColumn(col 22:double, col 26:double)(children: FuncPowerDoubleToDouble(col 23:double)(children: DoubleColDivideLongColumn(col 22:double, col 25:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 23:double)(children: DoubleColDivideLongColumn(col 22:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 22:double) -> 23:double) -> 22:double, IfExprNullCondExpr(col 21:boolean, null, col 24:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 21:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 24:bigint) -> 25:bigint) -> 23:double) -> 22:double, DoubleColSubtractDoubleScalar(col 23:double, val 10.175)(children: FuncPowerDoubleToDouble(col 26:double)(children: DoubleColDivideLongColumn(col 23:double, col 28:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 26:double)(children: DoubleColDivideLongColumn(col 23:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 23:double) -> 26:double) -> 23:double, IfExprNullCondExpr(col 25:boolean, null, col 27:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 25:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 27:bigint) -> 28:bigint) -> 26:double) -> 23:double) -> 26:double) -> 23:double, DoubleColUnaryMinus(col 22:double)(children: FuncPowerDoubleToDouble(col 26:double)(children: DoubleColDivideLongColumn(col 22:double, col 5:bigint)(children: DoubleColSubtractDoubleColumn(col 3:double, col 26:double)(children: DoubleColDivideLongColumn(col 22:double, col 5:bigint)(children: DoubleColMultiplyDoubleColumn(col 4:double, col 4:double) -> 22:double) -> 26:double) -> 22:double) -> 26:double) -> 22:double) -> 26:double, DoubleColModuloDoubleScalar(col 22:double, val 79.553)(children: FuncPowerDoubleToDouble(col 29:double)(children: DoubleColDivideLongColumn(col 22:double, col 31:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 29:double)(children: DoubleColDivideLongColumn(col 22:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 22:double) -> 29:double) -> 22:double, IfExprNullCondExpr(col 28:boolean, null, col 30:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 28:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 30:bigint) -> 31:bigint) -> 29:double) -> 22:double) -> 29:double, DoubleColUnaryMinus(col 32:double)(children: DoubleColMultiplyDoubleColumn(col 22:double, col 35:double)(children: FuncPowerDoubleToDouble(col 32:double)(children: DoubleColDivideLongColumn(col 22:double, col 34:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 32:double)(children: DoubleColDivideLongColumn(col 22:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 22:double) -> 32:double) -> 22:double, IfExprNullCondExpr(col 31:boolean, null, col 33:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 31:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 33:bigint) -> 34:bigint) -> 32:double) -> 22:double, DoubleColSubtractDoubleScalar(col 32:double, val 10.175)(children: FuncPowerDoubleToDouble(col 35:double)(children: DoubleColDivideLongColumn(col 32:double, col 37:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 35:double)(children: DoubleColDivideLongColumn(col 32:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 32:double) -> 35:double) -> 32:double, IfExprNullCondExpr(col 34:boolean, null, col 36:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 34:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 36:bigint) -> 37:bigint) -> 35:double) -> 32:double) -> 35:double) -> 32:double) -> 22:double, FuncPowerDoubleToDouble(col 35:double)(children: DoubleColDivideLongColumn(col 32:double, col 39:bigint)(children: DoubleColSubtractDoubleColumn(col 6:double, col 35:double)(children: DoubleColDivideLongColumn(col 32:double, col 8:bigint)(children: DoubleColMultiplyDoubleColumn(col 7:double, col 7:double) -> 32:double) -> 35:double) -> 32:double, IfExprNullCondExpr(col 37:boolean, null, col 38:bigint)(children: LongColEqualLongScalar(col 8:bigint, val 1) -> 37:boolean, LongColSubtractLongScalar(col 8:bigint, val 1) -> 38:bigint) -> 39:bigint) -> 35:double) -> 32:double, DoubleColUnaryMinus(col 35:double)(children: FuncPowerDoubleToDouble(col 40:double)(children: DoubleColDivideLongColumn(col 35:double, col 42:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 40:double)(children: DoubleColDivideLongColumn(col 35:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 35:double) -> 40:double) -> 35:double, IfExprNullCondExpr(col 39:boolean, null, col 41:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 39:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 41:bigint) -> 42:bigint) -> 40:double) -> 35:double) -> 40:double, DoubleColDivideDoubleColumn(col 35:double, col 46:double)(children: DoubleColUnaryMinus(col 43:double)(children: DoubleColMultiplyDoubleColumn(col 35:double, col 46:double)(children: FuncPowerDoubleToDouble(col 43:double)(children: DoubleColDivideLongColumn(col 35:double, col 45:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 43:double)(children: DoubleColDivideLongColumn(col 35:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 35:double) -> 43:double) -> 35:double, IfExprNullCondExpr(col 42:boolean, null, col 44:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 42:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 44:bigint) -> 45:bigint) -> 43:double) -> 35:double, DoubleColSubtractDoubleScalar(col 43:double, val 10.175)(children: FuncPowerDoubleToDouble(col 46:double)(children: DoubleColDivideLongColumn(col 43:double, col 48:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 46:double)(children: DoubleColDivideLongColumn(col 43:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 43:double) -> 46:double) -> 43:double, IfExprNullCondExpr(col 45:boolean, null, col 47:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 45:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 47:bigint) -> 48:bigint) -> 46:double) -> 43:double) -> 46:double) -> 43:double) -> 35:double, DoubleColSubtractDoubleScalar(col 43:double, val 10.175)(children: FuncPowerDoubleToDouble(col 46:double)(children: DoubleColDivideLongColumn(col 43:double, col 50:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 46:double)(children: DoubleColDivideLongColumn(col 43:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 43:double) -> 46:double) -> 43:double, IfExprNullCondExpr(col 48:boolean, null, col 49:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 48:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 49:bigint) -> 50:bigint) -> 46:double) -> 43:double) -> 46:double) -> 43:double, DoubleColUnaryMinus(col 46:double)(children: DoubleColSubtractDoubleScalar(col 35:double, val 10.175)(children: FuncPowerDoubleToDouble(col 46:double)(children: DoubleColDivideLongColumn(col 35:double,
col 52:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 46:double)(children: DoubleColDivideLongColumn(col 35:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 35:double) -> 46:double) -> 35:double, IfExprNullCondExpr(col 50:boolean, null, col 51:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 50:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 51:bigint) -> 52:bigint) -> 46:double) -> 35:double) -> 46:double) -> 35:double, LongColDivideLongColumn(col 10:bigint, col 11:bigint) -> 46:double, DoubleScalarSubtractDoubleColumn(val -3728.0, col 53:double)(children: FuncPowerDoubleToDouble(col 54:double)(children: DoubleColDivideLongColumn(col 53:double, col 56:bigint)(children: DoubleColSubtractDoubleColumn(col 0:double, col 54:double)(children: DoubleColDivideLongColumn(col 53:double, col 2:bigint)(children: DoubleColMultiplyDoubleColumn(col 1:double, col 1:double) -> 53:double) -> 54:double) -> 53:do
uble, IfExprNullCondExpr(col 52:boolean, null, col 55:bigint)(children: LongColEqualLongScalar(col 2:bigint, val 1) -> 52:boolean, LongColSubtractLongScalar(col 2:bigint, val 1) -> 55:bigint) -> 56:bigint) -> 54:double) -> 53:double) -> 54:double, FuncPowerDoubleToDouble(col 57:double)(children: DoubleColDivideLongColumn(col 53:double, col 11:bigint)(children: DoubleColSubtractDoubleColumn(col 12:double, col 57:double)(children: DoubleColDivideLongColumn(col 53:double, col 11:bigint)(children: DoubleColMultiplyDoubleColumn(col 13:double, col 13:double) -> 53:double) -> 57:double) -> 53:double) -> 57:double) -> 53:double, DoubleColDivideDoubleColumn(col 57:double, col 58:double)(children: LongColDivideLongColumn(col 10:bigint, col 11:bigint) -> 57:double, FuncPowerDoubleToDouble(col 59:double)(children: DoubleColDivideLongColumn(col 58:double, col 61:bigint)(children: DoubleColSubtractDoubleColumn(col 6:double, col 59:double)(children: DoubleColDivideLongColumn(col 58:double, col 8:b
igint)(children: DoubleColMultiplyDoubleColumn(col 7:double, col 7:double) -> 58:double) -> 59:double) -> 58:double, IfExprNullCondExpr(col 56:boolean, null, col 60:bigint)(children: LongColEqualLongScalar(col 8:bigint, val 1) -> 56:boolean, LongColSubtractLongScalar(col 8:bigint, val 1) -> 60:bigint) -> 61:bigint) -> 59:double) -> 58:double) -> 59:double
+ Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
- Statistics: Num rows: 1 Data size: 404 Basic stats: COMPLETE Column stats: NONE
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
@@ -208,4 +233,4 @@ WHERE (((cint <= cfloat)
POSTHOOK: type: QUERY
POSTHOOK: Input: default@alltypesparquet
#### A masked pattern was here ####
-0.0 -10.175 34.287285216637066 -0.0 -34.287285216637066 0.0 0.0 34.34690095515641 -0.0 197.89499950408936 -0.0 10.175 NULL -3728.0 NULL NULL
+0.0 -10.175 34.287285216637066 -0.0 -34.287285216637066 0.0 0.0 34.3469009551564 -0.0 197.89499950408936 -0.0 10.175 NULL -3728.0 NULL NULL
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_4.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_4.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_4.q.out
index 4a7b0e0..c3b5392 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_4.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_4.q.out
@@ -75,17 +75,18 @@ STAGE PLANS:
predicate: (((UDFToInteger(ctinyint) <= -89010) and (cdouble > 79.553D)) or ((cbigint <> -563L) and ((UDFToLong(ctinyint) <> cbigint) or (cdouble <= -3728.0D))) or (UDFToInteger(csmallint) >= cint)) (type: boolean)
Statistics: Num rows: 12288 Data size: 147456 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: ctinyint (type: tinyint), cint (type: int), cdouble (type: double)
- outputColumnNames: ctinyint, cint, cdouble
+ expressions: cint (type: int), cdouble (type: double), ctinyint (type: tinyint), (cdouble * cdouble) (type: double)
+ outputColumnNames: _col0, _col1, _col2, _col3
Select Vectorization:
className: VectorSelectOperator
native: true
- projectedOutputColumnNums: [0, 2, 5]
+ projectedOutputColumnNums: [2, 5, 0, 13]
+ selectExpressions: DoubleColMultiplyDoubleColumn(col 5:double, col 5:double) -> 13:double
Statistics: Num rows: 12288 Data size: 147456 Basic stats: COMPLETE Column stats: NONE
Group By Operator
- aggregations: sum(cint), stddev_pop(cdouble), avg(cdouble), var_pop(cdouble), min(ctinyint)
+ aggregations: sum(_col0), sum(_col3), sum(_col1), count(_col1), min(_col2)
Group By Vectorization:
- aggregators: VectorUDAFSumLong(col 2:int) -> bigint, VectorUDAFVarDouble(col 5:double) -> struct<count:bigint,sum:double,variance:double> aggregation: stddev_pop, VectorUDAFAvgDouble(col 5:double) -> struct<count:bigint,sum:double,input:double>, VectorUDAFVarDouble(col 5:double) -> struct<count:bigint,sum:double,variance:double> aggregation: var_pop, VectorUDAFMinLong(col 0:tinyint) -> tinyint
+ aggregators: VectorUDAFSumLong(col 2:int) -> bigint, VectorUDAFSumDouble(col 13:double) -> double, VectorUDAFSumDouble(col 5:double) -> double, VectorUDAFCount(col 5:double) -> bigint, VectorUDAFMinLong(col 0:tinyint) -> tinyint
className: VectorGroupByOperator
groupByMode: HASH
native: false
@@ -93,7 +94,7 @@ STAGE PLANS:
projectedOutputColumnNums: [0, 1, 2, 3, 4]
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3, _col4
- Statistics: Num rows: 1 Data size: 252 Basic stats: COMPLETE Column stats: NONE
+ Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Reduce Sink Vectorization:
@@ -102,8 +103,8 @@ STAGE PLANS:
native: true
nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
valueColumnNums: [0, 1, 2, 3, 4]
- Statistics: Num rows: 1 Data size: 252 Basic stats: COMPLETE Column stats: NONE
- value expressions: _col0 (type: bigint), _col1 (type: struct<count:bigint,sum:double,variance:double>), _col2 (type: struct<count:bigint,sum:double,input:double>), _col3 (type: struct<count:bigint,sum:double,variance:double>), _col4 (type: tinyint)
+ Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
+ value expressions: _col0 (type: bigint), _col1 (type: double), _col2 (type: double), _col3 (type: bigint), _col4 (type: tinyint)
Execution mode: vectorized
Map Vectorization:
enabled: true
@@ -119,26 +120,50 @@ STAGE PLANS:
includeColumns: [0, 1, 2, 3, 5]
dataColumns: ctinyint:tinyint, csmallint:smallint, cint:int, cbigint:bigint, cfloat:float, cdouble:double, cstring1:string, cstring2:string, ctimestamp1:timestamp, ctimestamp2:timestamp, cboolean1:boolean, cboolean2:boolean
partitionColumnCount: 0
- scratchColumnTypeNames: []
+ scratchColumnTypeNames: [double]
Reducer 2
+ Execution mode: vectorized
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true
- notVectorizedReason: GROUPBY operator: Vector aggregation : "stddev_pop" for input type: "STRUCT" and output type: "DOUBLE" and mode: FINAL not supported for evaluator GenericUDAFStdEvaluator
- vectorized: false
+ reduceColumnNullOrder:
+ reduceColumnSortOrder:
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 5
+ dataColumns: VALUE._col0:bigint, VALUE._col1:double, VALUE._col2:double, VALUE._col3:bigint, VALUE._col4:tinyint
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
- aggregations: sum(VALUE._col0), stddev_pop(VALUE._col1), avg(VALUE._col2), var_pop(VALUE._col3), min(VALUE._col4)
+ aggregations: sum(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2), count(VALUE._col3), min(VALUE._col4)
+ Group By Vectorization:
+ aggregators: VectorUDAFSumLong(col 0:bigint) -> bigint, VectorUDAFSumDouble(col 1:double) -> double, VectorUDAFSumDouble(col 2:double) -> double, VectorUDAFCountMerge(col 3:bigint) -> bigint, VectorUDAFMinLong(col 4:tinyint) -> tinyint
+ className: VectorGroupByOperator
+ groupByMode: MERGEPARTIAL
+ native: false
+ vectorProcessingMode: GLOBAL
+ projectedOutputColumnNums: [0, 1, 2, 3, 4]
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3, _col4
- Statistics: Num rows: 1 Data size: 252 Basic stats: COMPLETE Column stats: NONE
+ Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: _col0 (type: bigint), (_col0 * -563L) (type: bigint), (-3728L + _col0) (type: bigint), _col1 (type: double), (- _col1) (type: double), _col2 (type: double), ((_col0 * -563L) % _col0) (type: bigint), (UDFToDouble(((_col0 * -563L) % _col0)) / _col2) (type: double), _col3 (type: double), (- (UDFToDouble(((_col0 * -563L) % _col0)) / _col2)) (type: double), ((-3728L + _col0) - (_col0 * -563L)) (type: bigint), _col4 (type: tinyint), _col4 (type: tinyint), (UDFToDouble(_col4) * (- (UDFToDouble(((_col0 * -563L) % _col0)) / _col2))) (type: double)
+ expressions: _col0 (type: bigint), (_col0 * -563L) (type: bigint), (-3728L + _col0) (type: bigint), power(((_col1 - ((_col2 * _col2) / _col3)) / _col3), 0.5) (type: double), (- power(((_col1 - ((_col2 * _col2) / _col3)) / _col3), 0.5)) (type: double), (_col2 / _col3) (type: double), ((_col0 * -563L) % _col0) (type: bigint), (UDFToDouble(((_col0 * -563L) % _col0)) / (_col2 / _col3)) (type: double), ((_col1 - ((_col2 * _col2) / _col3)) / _col3) (type: double), (- (UDFToDouble(((_col0 * -563L) % _col0)) / (_col2 / _col3))) (type: double), ((-3728L + _col0) - (_col0 * -563L)) (type: bigint), _col4 (type: tinyint), _col4 (type: tinyint), (UDFToDouble(_col4) * (- (UDFToDouble(((_col0 * -563L) % _col0)) / (_col2 / _col3)))) (type: double)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
- Statistics: Num rows: 1 Data size: 252 Basic stats: COMPLETE Column stats: NONE
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0, 5, 6, 7, 9, 8, 11, 15, 14, 13, 18, 4, 4, 19]
+ selectExpressions: LongColMultiplyLongScalar(col 0:bigint, val -563) -> 5:bigint, LongScalarAddLongColumn(val -3728, col 0:bigint) -> 6:bigint, FuncPowerDoubleToDouble(col 8:double)(children: DoubleColDivideLongColumn(col 7:double, col 3:bigint)(children: DoubleColSubtractDoubleColumn(col 1:double, col 8:double)(children: DoubleColDivideLongColumn(col 7:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 2:double, col 2:double) -> 7:double) -> 8:double) -> 7:double) -> 8:double) -> 7:double, DoubleColUnaryMinus(col 8:double)(children: FuncPowerDoubleToDouble(col 9:double)(children: DoubleColDivideLongColumn(col 8:double, col 3:bigint)(children: DoubleColSubtractDoubleColumn(col 1:double, col 9:double)(children: DoubleColDivideLongColumn(col 8:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 2:double, col 2:double) -> 8:double) -> 9:double) -> 8:double) -> 9:double) -> 8:double) -> 9:double, DoubleColDivideLongColumn(col 2:double, col 3:bigint) -> 8:double, LongColModuloLongColumn(col 10:bigint, col 0:bigint)(children: LongColMultiplyLongScalar(col 0:bigint, val -563) -> 10:bigint) -> 11:bigint, DoubleColDivideDoubleColumn(col 13:double, col 14:double)(children: CastLongToDouble(col 12:bigint)(children: LongColModuloLongColumn(col 10:bigint, col 0:bigint)(children: LongColMultiplyLongScalar(col 0:bigint, val -563) -> 10:bigint) -> 12:bigint) -> 13:double, DoubleColDivideLongColumn(col 2:double, col 3:bigint) -> 14:double) -> 15:double, DoubleColDivideLongColumn(col 13:double, col 3:bigint)(children: DoubleColSubtractDoubleColumn(col 1:double, col 14:double)(children: DoubleColDivideLongColumn(col 13:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 2:double, col 2:double) -> 13:double) -> 14:double) -> 13:double) -> 14:double, DoubleColUnaryMinus(col 17:double)(children: DoubleColDivideDoubleColumn(col 13:double, col 16:double)(children: CastLongToDouble(col 12:bigint)(children: LongColModuloLongColumn(col 10:bigint, col 0:bigint)(children: LongColMultiplyLongScalar(col 0:bigint, val -563) -> 10:bigint) -> 12:bigint) -> 13:double, DoubleColDivideLongColumn(col 2:double, col 3:bigint) -> 16:double) -> 17:double) -> 13:double, LongColSubtractLongColumn(col 10:bigint, col 12:bigint)(children: LongScalarAddLongColumn(val -3728, col 0:bigint) -> 10:bigint, LongColMultiplyLongScalar(col 0:bigint, val -563) -> 12:bigint) -> 18:bigint, DoubleColMultiplyDoubleColumn(col 16:double, col 17:double)(children: CastLongToDouble(col 4:tinyint) -> 16:double, DoubleColUnaryMinus(col 20:double)(children: DoubleColDivideDoubleColumn(col 17:double, col 19:double)(children: CastLongToDouble(col 12:bigint)(children: LongColModuloLongColumn(col 10:bigint, col 0:bigint)(children: LongColMultiplyLongScalar(col 0:bigint, val -563) -> 10:bigint) -> 12:bigint) -> 17:double, DoubleColDivideLongColumn(col 2:double, col 3:bigint) -> 19:double) -> 20:double) -> 17:double) -> 19:double
+ Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
- Statistics: Num rows: 1 Data size: 252 Basic stats: COMPLETE Column stats: NONE
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
@@ -198,4 +223,4 @@ WHERE (((csmallint >= cint)
POSTHOOK: type: QUERY
POSTHOOK: Input: default@alltypesparquet
#### A masked pattern was here ####
--493101012745 277615870175435 -493101016473 136727.7868296355 -136727.7868296355 2298.5515807767374 0 0.0 1.8694487691330246E10 -0.0 -278108971191908 -64 -64 0.0
+-493101012745 277615870175435 -493101016473 136727.78682963562 -136727.78682963562 2298.5515807767374 0 0.0 1.8694487691330276E10 -0.0 -278108971191908 -64 -64 0.0
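The rewritten plan above replaces stddev_pop, var_pop and avg with the partial aggregates sum(x * x), sum(x) and count(x), and derives the final values in the Select Operator. A minimal Python sketch of that arithmetic follows. The function names and sample data are hypothetical and not part of Hive, and returning None for empty input is an assumption standing in for SQL NULL. The formula performs its floating-point operations in a different order than the previous struct-based variance aggregation, which is presumably why the last digits of the expected output shift (136727.7868296355 vs 136727.78682963562).

```python
import math
from typing import Iterable, Optional, Tuple

# (sum of x*x, sum of x, count of x): the three partials the new plan ships
# from the map side instead of struct<count,sum,variance>.
Partial = Tuple[float, float, int]


def partial_aggregate(values: Iterable[Optional[float]]) -> Partial:
    """Map side: accumulate sum(x*x), sum(x), count(x), skipping NULLs (None)."""
    sum_sq, total, count = 0.0, 0.0, 0
    for x in values:
        if x is None:
            continue
        sum_sq += x * x
        total += x
        count += 1
    return sum_sq, total, count


def merge(partials: Iterable[Partial]) -> Partial:
    """Reduce side: partial sums and counts are merged by plain addition."""
    sum_sq, total, count = 0.0, 0.0, 0
    for s, t, c in partials:
        sum_sq += s
        total += t
        count += c
    return sum_sq, total, count


def finalize(partial: Partial):
    """Return (avg, var_pop, stddev_pop) using the expressions from the plan."""
    sum_sq, total, count = partial
    if count == 0:
        # Assumption: stands in for SQL NULL on empty input.
        return None, None, None
    avg = total / count                                   # (_col2 / _col3)
    var_pop = (sum_sq - (total * total) / count) / count  # ((_col1 - ((_col2 * _col2) / _col3)) / _col3)
    # power(..., 0.5); a negative rounding residue is not handled in this sketch.
    stddev_pop = math.pow(var_pop, 0.5)
    return avg, var_pop, stddev_pop


# Two hypothetical input splits, one containing a NULL.
merged = merge([partial_aggregate([1.0, 2.0, None]), partial_aggregate([3.0, 4.0])])
avg, var_pop, stddev_pop = finalize(merged)
```

Only scalar doubles and bigints cross the shuffle in this form, which matches the change in the value expressions from struct types to double and bigint.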
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_9.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_9.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_9.q.out
index a35c9c5..303702c 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_9.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_9.q.out
@@ -69,39 +69,40 @@ STAGE PLANS:
predicate: (((cdouble >= -1.389D) or (cstring1 < 'a')) and (cstring2 like '%b%')) (type: boolean)
Statistics: Num rows: 4096 Data size: 49152 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: cdouble (type: double), cstring1 (type: string), ctimestamp1 (type: timestamp)
- outputColumnNames: cdouble, cstring1, ctimestamp1
+ expressions: cstring1 (type: string), cdouble (type: double), ctimestamp1 (type: timestamp), (cdouble * cdouble) (type: double)
+ outputColumnNames: _col0, _col1, _col2, _col3
Select Vectorization:
className: VectorSelectOperator
native: true
- projectedOutputColumnNums: [5, 6, 8]
+ projectedOutputColumnNums: [6, 5, 8, 13]
+ selectExpressions: DoubleColMultiplyDoubleColumn(col 5:double, col 5:double) -> 13:double
Statistics: Num rows: 4096 Data size: 49152 Basic stats: COMPLETE Column stats: NONE
Group By Operator
- aggregations: count(cdouble), stddev_samp(cdouble), min(cdouble)
+ aggregations: count(_col1), sum(_col3), sum(_col1), min(_col1)
Group By Vectorization:
- aggregators: VectorUDAFCount(col 5:double) -> bigint, VectorUDAFVarDouble(col 5:double) -> struct<count:bigint,sum:double,variance:double> aggregation: stddev_samp, VectorUDAFMinDouble(col 5:double) -> double
+ aggregators: VectorUDAFCount(col 5:double) -> bigint, VectorUDAFSumDouble(col 13:double) -> double, VectorUDAFSumDouble(col 5:double) -> double, VectorUDAFMinDouble(col 5:double) -> double
className: VectorGroupByOperator
groupByMode: HASH
- keyExpressions: col 5:double, col 6:string, col 8:timestamp
+ keyExpressions: col 6:string, col 5:double, col 8:timestamp
native: false
vectorProcessingMode: HASH
- projectedOutputColumnNums: [0, 1, 2]
- keys: cdouble (type: double), cstring1 (type: string), ctimestamp1 (type: timestamp)
+ projectedOutputColumnNums: [0, 1, 2, 3]
+ keys: _col0 (type: string), _col1 (type: double), _col2 (type: timestamp)
mode: hash
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Statistics: Num rows: 4096 Data size: 49152 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
- key expressions: _col0 (type: double), _col1 (type: string), _col2 (type: timestamp)
+ key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: timestamp)
sort order: +++
- Map-reduce partition columns: _col0 (type: double), _col1 (type: string), _col2 (type: timestamp)
+ Map-reduce partition columns: _col0 (type: string), _col1 (type: double), _col2 (type: timestamp)
Reduce Sink Vectorization:
className: VectorReduceSinkMultiKeyOperator
keyColumnNums: [0, 1, 2]
native: true
nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
- valueColumnNums: [3, 4, 5]
+ valueColumnNums: [3, 4, 5, 6]
Statistics: Num rows: 4096 Data size: 49152 Basic stats: COMPLETE Column stats: NONE
- value expressions: _col3 (type: bigint), _col4 (type: struct<count:bigint,sum:double,variance:double>), _col5 (type: double)
+ value expressions: _col3 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: double)
Execution mode: vectorized
Map Vectorization:
enabled: true
@@ -117,26 +118,51 @@ STAGE PLANS:
includeColumns: [5, 6, 7, 8]
dataColumns: ctinyint:tinyint, csmallint:smallint, cint:int, cbigint:bigint, cfloat:float, cdouble:double, cstring1:string, cstring2:string, ctimestamp1:timestamp, ctimestamp2:timestamp, cboolean1:boolean, cboolean2:boolean
partitionColumnCount: 0
- scratchColumnTypeNames: []
+ scratchColumnTypeNames: [double]
Reducer 2
+ Execution mode: vectorized
Reduce Vectorization:
enabled: true
enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true
- notVectorizedReason: GROUPBY operator: Vector aggregation : "stddev_samp" for input type: "STRUCT" and output type: "DOUBLE" and mode: FINAL not supported for evaluator GenericUDAFStdSampleEvaluator
- vectorized: false
+ reduceColumnNullOrder: aaa
+ reduceColumnSortOrder: +++
+ allNative: false
+ usesVectorUDFAdaptor: false
+ vectorized: true
+ rowBatchContext:
+ dataColumnCount: 7
+ dataColumns: KEY._col0:string, KEY._col1:double, KEY._col2:timestamp, VALUE._col0:bigint, VALUE._col1:double, VALUE._col2:double, VALUE._col3:double
+ partitionColumnCount: 0
+ scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
- aggregations: count(VALUE._col0), stddev_samp(VALUE._col1), min(VALUE._col2)
- keys: KEY._col0 (type: double), KEY._col1 (type: string), KEY._col2 (type: timestamp)
+ aggregations: count(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2), min(VALUE._col3)
+ Group By Vectorization:
+ aggregators: VectorUDAFCountMerge(col 3:bigint) -> bigint, VectorUDAFSumDouble(col 4:double) -> double, VectorUDAFSumDouble(col 5:double) -> double, VectorUDAFMinDouble(col 6:double) -> double
+ className: VectorGroupByOperator
+ groupByMode: MERGEPARTIAL
+ keyExpressions: col 0:string, col 1:double, col 2:timestamp
+ native: false
+ vectorProcessingMode: MERGE_PARTIAL
+ projectedOutputColumnNums: [0, 1, 2, 3]
+ keys: KEY._col0 (type: string), KEY._col1 (type: double), KEY._col2 (type: timestamp)
mode: mergepartial
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Statistics: Num rows: 2048 Data size: 24576 Basic stats: COMPLETE Column stats: NONE
Select Operator
- expressions: _col1 (type: string), _col0 (type: double), _col2 (type: timestamp), (_col0 - 9763215.5639D) (type: double), (- (_col0 - 9763215.5639D)) (type: double), _col3 (type: bigint), _col4 (type: double), (- _col4) (type: double), (_col4 * UDFToDouble(_col3)) (type: double), _col5 (type: double), (9763215.5639D / _col0) (type: double), (CAST( _col3 AS decimal(19,0)) / -1.389) (type: decimal(28,6)), _col4 (type: double)
+ expressions: _col0 (type: string), _col1 (type: double), _col2 (type: timestamp), (_col1 - 9763215.5639D) (type: double), (- (_col1 - 9763215.5639D)) (type: double), _col3 (type: bigint), power(((_col4 - ((_col5 * _col5) / _col3)) / CASE WHEN ((_col3 = 1L)) THEN (null) ELSE ((_col3 - 1)) END), 0.5) (type: double), (- power(((_col4 - ((_col5 * _col5) / _col3)) / CASE WHEN ((_col3 = 1L)) THEN (null) ELSE ((_col3 - 1)) END), 0.5)) (type: double), (power(((_col4 - ((_col5 * _col5) / _col3)) / CASE WHEN ((_col3 = 1L)) THEN (null) ELSE ((_col3 - 1)) END), 0.5) * UDFToDouble(_col3)) (type: double), _col6 (type: double), (9763215.5639D / _col1) (type: double), (CAST( _col3 AS decimal(19,0)) / -1.389) (type: decimal(28,6)), power(((_col4 - ((_col5 * _col5) / _col3)) / CASE WHEN ((_col3 = 1L)) THEN (null) ELSE ((_col3 - 1)) END), 0.5) (type: double)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
+ Select Vectorization:
+ className: VectorSelectOperator
+ native: true
+ projectedOutputColumnNums: [0, 1, 2, 7, 9, 3, 8, 14, 20, 6, 10, 22, 17]
+ selectExpressions: DoubleColSubtractDoubleScalar(col 1:double, val 9763215.5639) -> 7:double, DoubleColUnaryMinus(col 8:double)(children: DoubleColSubtractDoubleScalar(col 1:double, val 9763215.5639) -> 8:double) -> 9:double, FuncPowerDoubleToDouble(col 10:double)(children: DoubleColDivideLongColumn(col 8:double, col 13:bigint)(children: DoubleColSubtractDoubleColumn(col 4:double, col 10:double)(children: DoubleColDivideLongColumn(col 8:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 5:double, col 5:double) -> 8:double) -> 10:double) -> 8:double, IfExprNullCondExpr(col 11:boolean, null, col 12:bigint)(children: LongColEqualLongScalar(col 3:bigint, val 1) -> 11:boolean, LongColSubtractLongScalar(col 3:bigint, val 1) -> 12:bigint) -> 13:bigint) -> 10:double) -> 8:double, DoubleColUnaryMinus(col 10:double)(children: FuncPowerDoubleToDouble(col 14:double)(children: DoubleColDivideLongColumn(col 10:double, col 16:bigint)(children: DoubleColSubtractDoubleColumn(col 4:double, col 14:double)(children: DoubleColDivideLongColumn(col 10:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 5:double, col 5:double) -> 10:double) -> 14:double) -> 10:double, IfExprNullCondExpr(col 13:boolean, null, col 15:bigint)(children: LongColEqualLongScalar(col 3:bigint, val 1) -> 13:boolean, LongColSubtractLongScalar(col 3:bigint, val 1) -> 15:bigint) -> 16:bigint) -> 14:double) -> 10:double) -> 14:double, DoubleColMultiplyDoubleColumn(col 10:double, col 17:double)(children: FuncPowerDoubleToDouble(col 17:double)(children: DoubleColDivideLongColumn(col 10:double, col 19:bigint)(children: DoubleColSubtractDoubleColumn(col 4:double, col 17:double)(children: DoubleColDivideLongColumn(col 10:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 5:double, col 5:double) -> 10:double) -> 17:double) -> 10:double, IfExprNullCondExpr(col 16:boolean, null, col 18:bigint)(children: LongColEqualLongScalar(col 3:bigint, val 1) -> 16:boolean, LongColSubtractLongScalar(col 3:bigint, val 1) -> 18:bigint) -> 19:bigint) -> 17:double) -> 10:double, CastLongToDouble(col 3:bigint) -> 17:double) -> 20:double, DoubleScalarDivideDoubleColumn(val 9763215.5639, col 1:double) -> 10:double, DecimalColDivideDecimalScalar(col 21:decimal(19,0), val -1.389)(children: CastLongToDecimal(col 3:bigint) -> 21:decimal(19,0)) -> 22:decimal(28,6), FuncPowerDoubleToDouble(col 23:double)(children: DoubleColDivideLongColumn(col 17:double, col 25:bigint)(children: DoubleColSubtractDoubleColumn(col 4:double, col 23:double)(children: DoubleColDivideLongColumn(col 17:double, col 3:bigint)(children: DoubleColMultiplyDoubleColumn(col 5:double, col 5:double) -> 17:double) -> 23:double) -> 17:double, IfExprNullCondExpr(col 19:boolean, null, col 24:bigint)(children: LongColEqualLongScalar(col 3:bigint, val 1) -> 19:boolean, LongColSubtractLongScalar(col 3:bigint, val 1) -> 24:bigint) -> 25:bigint) -> 23:double) -> 17:double
Statistics: Num rows: 2048 Data size: 24576 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
Statistics: Num rows: 2048 Data size: 24576 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
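For stddev_samp, the rewritten Select Operator above divides by CASE WHEN count = 1 THEN null ELSE count - 1 END, so a group with a single row yields NULL instead of dividing by zero. The following Python sketch illustrates that expression under stated assumptions: the function name is hypothetical, None stands in for SQL NULL, and the empty-group case is handled explicitly here although the plan itself does not show it.

```python
import math
from typing import Optional


def stddev_samp_from_sums(sum_sq: float, total: float, count: int) -> Optional[float]:
    """Sample standard deviation from sum(x*x), sum(x), count(x).

    Follows the plan expression:
      power(((_col4 - ((_col5 * _col5) / _col3))
             / CASE WHEN (_col3 = 1L) THEN null ELSE (_col3 - 1) END), 0.5)
    """
    if count == 0:
        # Assumption: an empty group yields NULL.
        return None
    # CASE WHEN count = 1 THEN null ELSE count - 1 END
    denominator = None if count == 1 else count - 1
    if denominator is None:
        # Arithmetic with a NULL operand yields NULL in SQL.
        return None
    var_samp = (sum_sq - (total * total) / count) / denominator
    return math.pow(var_samp, 0.5)


# Hypothetical group: values 2, 4, 4, 4, 5, 5, 7, 9.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = stddev_samp_from_sums(sum(v * v for v in values), sum(values), len(values))

# A single-row group has no sample deviation.
single = stddev_samp_from_sums(9.0, 3.0, 1)
```

The IfExprNullCondExpr entries in the vectorized selectExpressions correspond to the CASE branch sketched here.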
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_limit.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_limit.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_limit.q.out
index ce188a0..6fd173a 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_limit.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_limit.q.out
@@ -257,18 +257,18 @@ STAGE PLANS:
selectExpressions: DoubleColAddDoubleScalar(col 5:double, val 1.0) -> 13:double
Statistics: Num rows: 12288 Data size: 147456 Basic stats: COMPLETE Column stats: NONE
Group By Operator
- aggregations: avg(_col1)
+ aggregations: sum(_col1), count(_col1)
Group By Vectorization:
- aggregators: VectorUDAFAvgDouble(col 13:double) -> struct<count:bigint,sum:double,input:double>
+ aggregators: VectorUDAFSumDouble(col 13:double) -> double, VectorUDAFCount(col 13:double) -> bigint
className: VectorGroupByOperator
groupByMode: HASH
keyExpressions: col 0:tinyint
native: false
vectorProcessingMode: HASH
- projectedOutputColumnNums: [0]
+ projectedOutputColumnNums: [0, 1]
keys: _col0 (type: tinyint)
mode: hash
- outputColumnNames: _col0, _col1
+ outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 12288 Data size: 147456 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: tinyint)
@@ -280,10 +280,10 @@ STAGE PLANS:
native: true
nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine spark IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
partitionColumnNums: [0]
- valueColumnNums: [1]
+ valueColumnNums: [1, 2]
Statistics: Num rows: 12288 Data size: 147456 Basic stats: COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.3
- value expressions: _col1 (type: struct<count:bigint,sum:double,input:double>)
+ value expressions: _col1 (type: double), _col2 (type: bigint)
Execution mode: vectorized
Map Vectorization:
enabled: true
@@ -311,41 +311,50 @@ STAGE PLANS:
usesVectorUDFAdaptor: false
vectorized: true
rowBatchContext:
- dataColumnCount: 2
- dataColumns: KEY._col0:tinyint, VALUE._col0:struct<count:bigint,sum:double,input:double>
+ dataColumnCount: 3
+ dataColumns: KEY._col0:tinyint, VALUE._col0:double, VALUE._col1:bigint
partitionColumnCount: 0
scratchColumnTypeNames: []
Reduce Operator Tree:
Group By Operator
- aggregations: avg(VALUE._col0)
+ aggregations: sum(VALUE._col0), count(VALUE._col1)
Group By Vectorization:
- aggregators: VectorUDAFAvgFinal(col 1:struct<count:bigint,sum:double,input:double>) -> double
+ aggregators: VectorUDAFSumDouble(col 1:double) -> double, VectorUDAFCountMerge(col 2:bigint) -> bigint
className: VectorGroupByOperator
groupByMode: MERGEPARTIAL
keyExpressions: col 0:tinyint
native: false
vectorProcessingMode: MERGE_PARTIAL
- projectedOutputColumnNums: [0]
+ projectedOutputColumnNums: [0, 1]
keys: KEY._col0 (type: tinyint)
mode: mergepartial
- outputColumnNames: _col0, _col1
+ outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 6144 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
- Limit
- Number of rows: 20
- Limit Vectorization:
- className: VectorLimitOperator
+ Select Operator
+ expressions: _col0 (type: tinyint), (_col1 / _col2) (type: double)
+ outputColumnNames: _col0, _col1
+ Select Vectorization:
+ className: VectorSelectOperator
native: true
- Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: NONE
- File Output Operator
- compressed: false
- File Sink Vectorization:
- className: VectorFileSinkOperator
- native: false
+ projectedOutputColumnNums: [0, 3]
+ selectExpressions: DoubleColDivideLongColumn(col 1:double, col 2:bigint) -> 3:double
+ Statistics: Num rows: 6144 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
+ Limit
+ Number of rows: 20
+ Limit Vectorization:
+ className: VectorLimitOperator
+ native: true
Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: NONE
- table:
- input format: org.apache.hadoop.mapred.SequenceFileInputFormat
- output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+ File Output Operator
+ compressed: false
+ File Sink Vectorization:
+ className: VectorFileSinkOperator
+ native: false
+ Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: NONE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_not.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_not.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_not.q.out
index e581007..e8fa9dd 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_not.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_not.q.out
@@ -55,4 +55,4 @@ WHERE (((cstring2 LIKE '%b%')
POSTHOOK: type: QUERY
POSTHOOK: Input: default@alltypesparquet
#### A masked pattern was here ####
--3.875652215945533E8 3.875652215945533E8 -3.875716535945533E8 1.436387455459401E9 3.875716535945533E8 0.0 2.06347151720204902E18 3.875716535945533E8 3.875652215945533E8 3.875716535945533E8 1.0 10934 -37224.52399241924 1.0517370547117279E9 -2.06347151720204902E18 1.5020929380914048E17 -64 64
+-3.875652215945533E8 3.875652215945533E8 -3.875716535945533E8 1.4363874554593508E9 3.875716535945533E8 0.0 2.06347151720190515E18 3.875716535945533E8 3.875652215945533E8 3.875716535945533E8 1.0 10934 -37224.52399241924 1.051665108770714E9 -2.06347151720190515E18 1.5020929380914048E17 -64 64
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/parquet_vectorization_pushdown.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/parquet_vectorization_pushdown.q.out b/ql/src/test/results/clientpositive/spark/parquet_vectorization_pushdown.q.out
index a95898f..212a83e 100644
--- a/ql/src/test/results/clientpositive/spark/parquet_vectorization_pushdown.q.out
+++ b/ql/src/test/results/clientpositive/spark/parquet_vectorization_pushdown.q.out
@@ -32,14 +32,14 @@ STAGE PLANS:
outputColumnNames: cbigint
Statistics: Num rows: 4096 Data size: 49152 Basic stats: COMPLETE Column stats: NONE
Group By Operator
- aggregations: avg(cbigint)
+ aggregations: sum(cbigint), count(cbigint)
mode: hash
- outputColumnNames: _col0
- Statistics: Num rows: 1 Data size: 80 Basic stats: COMPLETE Column stats: NONE
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
- Statistics: Num rows: 1 Data size: 80 Basic stats: COMPLETE Column stats: NONE
- value expressions: _col0 (type: struct<count:bigint,sum:double,input:bigint>)
+ Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
+ value expressions: _col0 (type: bigint), _col1 (type: bigint)
Execution mode: vectorized
Map Vectorization:
enabled: true
@@ -60,17 +60,21 @@ STAGE PLANS:
vectorized: true
Reduce Operator Tree:
Group By Operator
- aggregations: avg(VALUE._col0)
+ aggregations: sum(VALUE._col0), count(VALUE._col1)
mode: mergepartial
- outputColumnNames: _col0
- Statistics: Num rows: 1 Data size: 80 Basic stats: COMPLETE Column stats: NONE
- File Output Operator
- compressed: false
- Statistics: Num rows: 1 Data size: 80 Basic stats: COMPLETE Column stats: NONE
- table:
- input format: org.apache.hadoop.mapred.SequenceFileInputFormat
- output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
+ Select Operator
+ expressions: (_col0 / _col1) (type: double)
+ outputColumnNames: _col0
+ Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
http://git-wip-us.apache.org/repos/asf/hive/blob/5cb8867b/ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out b/ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out
index 1e5d456..8e4828c 100644
--- a/ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out
+++ b/ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out
@@ -2127,42 +2127,42 @@ Stage-0
limit:-1
Stage-1
Reducer 3
- File Output Operator [FS_21]
- Join Operator [JOIN_19] (rows=6 width=227)
+ File Output Operator [FS_22]
+ Join Operator [JOIN_20] (rows=6 width=227)
Output:["_col0","_col1","_col2"],condition map:[{"":"{\"type\":\"Left Semi\",\"left\":0,\"right\":1}"}],keys:{"0":"_col1","1":"_col0"}
<-Reducer 2 [PARTITION-LEVEL SORT]
- PARTITION-LEVEL SORT [RS_17]
+ PARTITION-LEVEL SORT [RS_18]
PartitionCols:_col1
Select Operator [SEL_6] (rows=13 width=227)
Output:["_col0","_col1","_col2"]
- Group By Operator [GBY_5] (rows=13 width=227)
- Output:["_col0","_col1","_col2"],aggregations:["avg(VALUE._col0)"],keys:KEY._col0, KEY._col1
+ Group By Operator [GBY_5] (rows=13 width=235)
+ Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"],keys:KEY._col0, KEY._col1
<-Map 1 [GROUP]
GROUP [RS_4]
PartitionCols:_col0, _col1
- Group By Operator [GBY_3] (rows=13 width=295)
- Output:["_col0","_col1","_col2"],aggregations:["avg(p_size)"],keys:p_name, p_mfgr
- Filter Operator [FIL_22] (rows=26 width=223)
+ Group By Operator [GBY_3] (rows=13 width=235)
+ Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(p_size)","count(p_size)"],keys:p_name, p_mfgr
+ Filter Operator [FIL_23] (rows=26 width=223)
predicate:p_name is not null
TableScan [TS_0] (rows=26 width=223)
default@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_name","p_mfgr","p_size"]
<-Reducer 5 [PARTITION-LEVEL SORT]
- PARTITION-LEVEL SORT [RS_18]
+ PARTITION-LEVEL SORT [RS_19]
PartitionCols:_col0
- Group By Operator [GBY_16] (rows=13 width=184)
+ Group By Operator [GBY_17] (rows=13 width=184)
Output:["_col0"],keys:_col0
- Select Operator [SEL_11] (rows=26 width=184)
+ Select Operator [SEL_12] (rows=26 width=184)
Output:["_col0"]
- Filter Operator [FIL_23] (rows=26 width=491)
+ Filter Operator [FIL_24] (rows=26 width=491)
predicate:first_value_window_0 is not null
- PTF Operator [PTF_10] (rows=26 width=491)
+ PTF Operator [PTF_11] (rows=26 width=491)
Function definitions:[{},{"name:":"windowingtablefunction","order by:":"_col5 ASC NULLS FIRST","partition by:":"_col2"}]
- Select Operator [SEL_9] (rows=26 width=491)
+ Select Operator [SEL_10] (rows=26 width=491)
Output:["_col1","_col2","_col5"]
<-Map 4 [PARTITION-LEVEL SORT]
- PARTITION-LEVEL SORT [RS_8]
+ PARTITION-LEVEL SORT [RS_9]
PartitionCols:p_mfgr
- TableScan [TS_7] (rows=26 width=223)
+ TableScan [TS_8] (rows=26 width=223)
default@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_mfgr","p_name","p_size"]
PREHOOK: query: explain select *
@@ -2385,23 +2385,25 @@ Stage-0
PARTITION-LEVEL SORT [RS_22]
Group By Operator [GBY_12] (rows=1 width=16)
Output:["_col0","_col1"],aggregations:["count()","count(_col0)"]
- Group By Operator [GBY_7] (rows=1 width=8)
- Output:["_col0"],aggregations:["avg(VALUE._col0)"]
- <-Map 5 [GROUP]
- GROUP [RS_6]
- Group By Operator [GBY_5] (rows=1 width=76)
- Output:["_col0"],aggregations:["avg(p_size)"]
- Filter Operator [FIL_32] (rows=8 width=4)
- predicate:(p_size < 10)
- TableScan [TS_2] (rows=26 width=4)
- default@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_size"]
+ Select Operator [SEL_8] (rows=1 width=16)
+ Output:["_col0"]
+ Group By Operator [GBY_7] (rows=1 width=16)
+ Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
+ <-Map 5 [GROUP]
+ GROUP [RS_6]
+ Group By Operator [GBY_5] (rows=1 width=16)
+ Output:["_col0","_col1"],aggregations:["sum(p_size)","count(p_size)"]
+ Filter Operator [FIL_32] (rows=8 width=4)
+ predicate:(p_size < 10)
+ TableScan [TS_2] (rows=26 width=4)
+ default@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_size"]
<-Reducer 8 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_25]
PartitionCols:_col0
Select Operator [SEL_20] (rows=1 width=12)
Output:["_col0","_col1"]
- Group By Operator [GBY_19] (rows=1 width=8)
- Output:["_col0"],aggregations:["avg(VALUE._col0)"]
+ Group By Operator [GBY_19] (rows=1 width=16)
+ Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","count(VALUE._col1)"]
<- Please refer to the previous Map 5 [GROUP]
PREHOOK: query: explain select b.p_mfgr, min(p_retailprice)