Posted to issues@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2022/06/30 12:25:00 UTC
[jira] [Resolved] (HIVE-26365) Remove column statistics collection task from merge statement plan
[ https://issues.apache.org/jira/browse/HIVE-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa resolved HIVE-26365.
-----------------------------------
Resolution: Fixed
Pushed to master. Thanks [~pvary], [~amansinha100], [~asolimando] for review.
> Remove column statistics collection task from merge statement plan
> -------------------------------------------------------------------
>
> Key: HIVE-26365
> URL: https://issues.apache.org/jira/browse/HIVE-26365
> Project: Hive
> Issue Type: Sub-task
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Merge statements may contain delete and update branches, and an update is technically a delete followed by an insert. Column statistics such as min and max cannot be calculated from the changed records when rows are deleted.
> Hive already marks the target table's column stats invalid after an Update/Delete/Merge, yet for merge it still generates extra GBY operators and reducers in the insert branches to calculate column stats, and the Stats Work stages collect column stats as well.
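The point above can be illustrated with a small sketch (plain Python, not Hive internals; the function names are invented for illustration): running min/max aggregates can be maintained incrementally under inserts, but a delete that removes the row holding the current extreme forces a rescan of the remaining rows, which is why column stats cannot be kept valid across the delete/update branches of a merge.

```python
# Illustrative sketch (not Hive code): why min/max column stats
# survive inserts but not deletes.

def insert_update(stats, value):
    """Inserts are cheap: the running min/max can be updated in place."""
    return {"min": min(stats["min"], value), "max": max(stats["max"], value)}

def delete_update(stats, value, remaining_rows):
    """Deletes are not: if the deleted value WAS the current min or max,
    the new extreme is unknown without rescanning the remaining rows."""
    if value in (stats["min"], stats["max"]):
        # Incremental maintenance is impossible here; fall back to a full scan.
        return {"min": min(remaining_rows), "max": max(remaining_rows)}
    return stats

stats = {"min": 1, "max": 9}
stats = insert_update(stats, 12)        # max becomes 12 without any scan
rows = [1, 3, 9]                        # rows left after deleting the value 12
stats = delete_update(stats, 12, rows)  # 12 was the max -> rescan required
```

Since any delete (or update, which deletes first) invalidates the stats anyway, computing column stats in the merge plan's insert branches is wasted work.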
> {code}
> POSTHOOK: query: explain
> merge into acidTbl_n0 as t using nonAcidOrcTbl_n0 s ON t.a = s.a
> WHEN MATCHED AND s.a > 8 THEN DELETE
> WHEN MATCHED THEN UPDATE SET b = 7
> WHEN NOT MATCHED THEN INSERT VALUES(s.a, s.b)
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@acidtbl_n0
> POSTHOOK: Input: default@nonacidorctbl_n0
> POSTHOOK: Output: default@acidtbl_n0
> POSTHOOK: Output: default@acidtbl_n0
> POSTHOOK: Output: default@merge_tmp_table
> STAGE DEPENDENCIES:
> Stage-5 is a root stage
> Stage-6 depends on stages: Stage-5
> Stage-0 depends on stages: Stage-6
> Stage-7 depends on stages: Stage-0
> Stage-1 depends on stages: Stage-6
> Stage-8 depends on stages: Stage-1
> Stage-2 depends on stages: Stage-6
> Stage-9 depends on stages: Stage-2
> Stage-3 depends on stages: Stage-6
> Stage-10 depends on stages: Stage-3
> Stage-4 depends on stages: Stage-6
> Stage-11 depends on stages: Stage-4
> STAGE PLANS:
> Stage: Stage-5
> Tez
> #### A masked pattern was here ####
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 10 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (CUSTOM_SIMPLE_EDGE)
> Reducer 9 <- Reducer 2 (SIMPLE_EDGE)
> #### A masked pattern was here ####
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: s
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: a (type: int), b (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: z
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: int)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map 10
> Map Operator Tree:
> TableScan
> alias: t
> filterExpr: a is not null (type: boolean)
> Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: a is not null (type: boolean)
> Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: a (type: int), ROW__ID (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2 Data size: 160 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: z
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 2 Data size: 160 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> Execution mode: vectorized, llap
> LLAP IO: may be used (ACID table)
> Reducer 2
> Execution mode: llap
> Reduce Operator Tree:
> Merge Join Operator
> condition map:
> Left Outer Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 6 Data size: 288 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col3 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>), _col1 (type: int), _col2 (type: int), _col0 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 6 Data size: 288 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((_col2 = _col3) and (_col3 > 8)) (type: boolean)
> Statistics: Num rows: 1 Data size: 88 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> null sort order: z
> sort order: +
> Map-reduce partition columns: UDFToInteger(_col0) (type: int)
> Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((_col2 = _col3) and (_col3 <= 8)) (type: boolean)
> Statistics: Num rows: 2 Data size: 176 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> null sort order: z
> sort order: +
> Map-reduce partition columns: UDFToInteger(_col0) (type: int)
> Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((_col2 = _col3) and (_col3 <= 8)) (type: boolean)
> Statistics: Num rows: 2 Data size: 176 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col2 (type: int), 7 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: int)
> Filter Operator
> predicate: _col2 is null (type: boolean)
> Statistics: Num rows: 4 Data size: 192 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col3 (type: int), _col1 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: int)
> Filter Operator
> predicate: (_col2 = _col3) (type: boolean)
> Statistics: Num rows: 3 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> outputColumnNames: _col0
> Statistics: Num rows: 3 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: count()
> keys: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> minReductionHashAggr: 0.4
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2 Data size: 168 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> null sort order: z
> sort order: +
> Map-reduce partition columns: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> Statistics: Num rows: 2 Data size: 168 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: bigint)
> Reducer 3
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: DELETE
> Reducer 4
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: DELETE
> Reducer 5
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: INSERT
> Select Operator
> expressions: _col0 (type: int), _col1 (type: int)
> outputColumnNames: a, b
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), min(b), max(b), count(b), compute_bit_vector_hll(b)
> minReductionHashAggr: 0.5
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> null sort order:
> sort order:
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary), _col5 (type: int), _col6 (type: int), _col7 (type: bigint), _col8 (type: binary)
> Reducer 6
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: 'LONG' (type: string), UDFToLong(_col0) (type: bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary), 'LONG' (type: string), UDFToLong(_col5) (type: bigint), UDFToLong(_col6) (type: bigint), (_col2 - _col7) (type: bigint), COALESCE(ndv_compute_bit_vector(_col8),0) (type: bigint), _col8 (type: binary)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
> Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 7
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: INSERT
> Select Operator
> expressions: _col0 (type: int), _col1 (type: int)
> outputColumnNames: a, b
> Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), min(b), max(b), count(b), compute_bit_vector_hll(b)
> minReductionHashAggr: 0.75
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> null sort order:
> sort order:
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary), _col5 (type: int), _col6 (type: int), _col7 (type: bigint), _col8 (type: binary)
> Reducer 8
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: 'LONG' (type: string), UDFToLong(_col0) (type: bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary), 'LONG' (type: string), UDFToLong(_col5) (type: bigint), UDFToLong(_col6) (type: bigint), (_col2 - _col7) (type: bigint), COALESCE(ndv_compute_bit_vector(_col8),0) (type: bigint), _col8 (type: binary)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
> Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 9
> Execution mode: llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2 Data size: 168 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (_col1 > 1L) (type: boolean)
> Statistics: Num rows: 1 Data size: 84 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cardinality_violation(_col0) (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.merge_tmp_table
> Stage: Stage-6
> Dependency Collection
> Stage: Stage-0
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: DELETE
> Stage: Stage-7
> Stats Work
> Basic Stats Work:
> Stage: Stage-1
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: DELETE
> Stage: Stage-8
> Stats Work
> Basic Stats Work:
> Stage: Stage-2
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: INSERT
> Stage: Stage-9
> Stats Work
> Basic Stats Work:
> Stage: Stage-3
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: INSERT
> Stage: Stage-10
> Stats Work
> Basic Stats Work:
> Column Stats Desc:
> Columns: a, b
> Column Types: int, int
> Table: default.acidtbl_n0
> Stage: Stage-4
> Move Operator
> tables:
> replace: false
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.merge_tmp_table
> Stage: Stage-11
> Stats Work
> Basic Stats Work:
> {code}
> One of the insert reducers and the follow-up reducer that collects the column stats:
> {code}
> Reducer 5
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> name: default.acidtbl_n0
> Write Type: INSERT
> Select Operator
> expressions: _col0 (type: int), _col1 (type: int)
> outputColumnNames: a, b
> Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), min(b), max(b), count(b), compute_bit_vector_hll(b)
> minReductionHashAggr: 0.5
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> null sort order:
> sort order:
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary), _col5 (type: int), _col6 (type: int), _col7 (type: bigint), _col8 (type: binary)
> Reducer 6
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: 'LONG' (type: string), UDFToLong(_col0) (type: bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary), 'LONG' (type: string), UDFToLong(_col5) (type: bigint), UDFToLong(_col6) (type: bigint), (_col2 - _col7) (type: bigint), COALESCE(ndv_compute_bit_vector(_col8),0) (type: bigint), _col8 (type: binary)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
> Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)