Posted to dev@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2016/11/08 01:05:58 UTC
[jira] [Created] (HIVE-15146) Too many Stats-Aggr Operator in multi-insert
Eugene Koifman created HIVE-15146:
-------------------------------------
Summary: Too many Stats-Aggr Operator in multi-insert
Key: HIVE-15146
URL: https://issues.apache.org/jira/browse/HIVE-15146
Project: Hive
Issue Type: Bug
Components: Query Planning
Reporter: Eugene Koifman
Assignee: Pengcheng Xiong
Consider:
{noformat}
create table if not exists srcpart (a int, b int, c int)
partitioned by (z int)
clustered by (a) into 2 buckets
stored as orc
tblproperties("transactional"="true");
create temporary table if not exists data1 (x int);
insert into data1 values (1),(2),(3);
explain from data1
insert into srcpart partition(z) select 0,0,1,x
insert into srcpart partition(z=1) select 0,0,1;
{noformat}
Then the plan looks like:
{noformat}
2016-11-07T16:56:19,045 INFO [main] ql.TestTxnCommands2: STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
Stage-4 depends on stages: Stage-2
Stage-1 depends on stages: Stage-4
Stage-5 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
alias: data1
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: x (type: int)
outputColumnNames: _col3
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
value expressions: _col3 (type: int)
Select Operator
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Reduce Operator Tree:
Select Operator
expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-0
Move Operator
tables:
partition:
z
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-3
Stats-Aggr Operator
Stage: Stage-4
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
sort order:
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Select Operator
expressions: 0 (type: int), 0 (type: int), 1 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-1
Move Operator
tables:
partition:
z 1
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-5
Stats-Aggr Operator
{noformat}
Note that there are 2 stats aggregation tasks (Stage-3 and Stage-5), but both branches of the multi-insert update the same partition.
Once HIVE-14943 is in, there will be other ways to generate the same situation.
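
For anyone who wants to check a plan for this mechanically, here is a minimal sketch that lists the stages containing a Stats-Aggr Operator. It assumes the EXPLAIN text follows the layout shown above (a "Stage: Stage-N" line followed by the operator name). The helper name stats_aggr_stages is hypothetical and not part of Hive.

```python
import re

def stats_aggr_stages(plan_text):
    """Return the names of stages that contain a 'Stats-Aggr Operator' line.

    Assumes each stage starts with a line of the form 'Stage: Stage-N',
    as in the EXPLAIN output quoted in this issue.
    """
    stages = []
    current = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        m = re.match(r"Stage: (Stage-\d+)$", line)
        if m:
            # Remember which stage the following operator lines belong to.
            current = m.group(1)
        elif line == "Stats-Aggr Operator" and current is not None:
            stages.append(current)
    return stages

# Abbreviated copy of the STAGE PLANS section from this issue.
plan = """\
Stage: Stage-0
Move Operator
Stage: Stage-3
Stats-Aggr Operator
Stage: Stage-1
Move Operator
Stage: Stage-5
Stats-Aggr Operator
"""

print(stats_aggr_stages(plan))  # ['Stage-3', 'Stage-5']
```

More than one entry in the result for a multi-insert into a single table is the symptom described here.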