You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "James Ball (JIRA)" <ji...@apache.org> on 2017/01/11 08:35:58 UTC
[jira] [Created] (HIVE-15581) Unable to use advanced aggregation
with multiple inserts clause
James Ball created HIVE-15581:
---------------------------------
Summary: Unable to use advanced aggregation with multiple inserts clause
Key: HIVE-15581
URL: https://issues.apache.org/jira/browse/HIVE-15581
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: James Ball
■Use Cases
- Use multiple insert clauses within a single query to insert multiple static (user-defined) partitions into a single table.
- Use advanced aggregation (cube) features within each insert clause to include subtotals of columns for each partition
■Expected Behaviour
- Subtotals are inserted for all combinations of the set of columns
■Observed Behaviour
- No subtotals are not inserted for any combination of the set of columns
■Sample Queries
{code:sql}
// Create test tables
create table if not exists
table1
(
column1 string,
column2 string,
column3 int
)
stored as orc
tblproperties
(
"orc.compress" = "SNAPPY"
);
create table if not exists
table2
(
column1 string,
column2 string,
column3 int
)
partitioned by
(
partition1 string
)
stored as orc
tblproperties
(
"orc.compress" = "SNAPPY"
);
create table if not exists
table3
(
column1 string,
column2 string,
column3 int
)
partitioned by
(
partition1 string
)
stored as orc
tblproperties
(
"orc.compress" = "SNAPPY"
);
{code}
{code:sql}
// Insert test values
insert overwrite table
table1
values
('value1', 'value1', 1),
('value2', 'value2', 1),
('value3', 'value3', 1);
{code}
{code:sql}
// Single insert clause with multiple inserts syntax
// Subtotals are inserted into target table
from
table1
insert overwrite table
table2
partition
(
partition1 = 'value1'
)
select
column1,
column2,
sum(column3) as column3
group by
column1,
column2
with cube;
{code}
{code:sql}
// Multiple insert clauses with multiple inserts syntax
// Subtotals are not inserted into target table
from
table1
insert overwrite table
table3
partition
(
partition1 = 'value1'
)
select
column1,
column2,
sum(column3) as column3
group by
column1,
column2
with cube
insert overwrite table
table3
partition
(
partition1 = 'value2'
)
select
column1,
column2,
sum(column3) as column3
group by
column1,
column2
with cube;
{code}
■Executions Plans
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: table1
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: column1 (type: string), column2 (type: string), column3 (type: int)
outputColumnNames: column1, column2, column3
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(column3)
keys: column1 (type: string), column2 (type: string), '0' (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
sort order: +++
Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string)
Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE
value expressions: _col3 (type: bigint)
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col3
Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
pruneGroupingSetId: true
Select Operator
expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col3) (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: zzz_james_ball.table2
Stage: Stage-0
Move Operator
tables:
partition:
partition1 value1
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: zzz_james_ball.table2
Stage: Stage-2
Stats-Aggr Operator
{noformat}
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
Stage-1 depends on stages: Stage-2
Stage-4 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
alias: table1
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: column1 (type: string), column2 (type: string), column3 (type: int)
outputColumnNames: column1, column2, column3
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: column1 (type: string), column2 (type: string)
sort order: ++
Map-reduce partition columns: column1 (type: string), column2 (type: string)
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
value expressions: column3 (type: int)
Reduce Operator Tree:
Forward
Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string)
mode: complete
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: zzz_james_ball.table3
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string), KEY._col1 (type: string)
mode: complete
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: zzz_james_ball.table3
Stage: Stage-0
Move Operator
tables:
partition:
partition1 value1
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: zzz_james_ball.table3
Stage: Stage-3
Stats-Aggr Operator
Stage: Stage-1
Move Operator
tables:
partition:
partition1 value2
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: zzz_james_ball.table3
Stage: Stage-4
Stats-Aggr Operator
{noformat}
■Notes
- This problem occurs with all advanced aggregation features (cube, grouping sets, rollup)
- This problem occurs whether hive.map.aggr is set to true or false
- Dynamic partitions are not used because the partition values are set manually within the where conditions of each insert clause
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)