You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "James Ball (JIRA)" <ji...@apache.org> on 2017/01/11 08:35:58 UTC
[jira] [Created] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause

James Ball created HIVE-15581:
---------------------------------

             Summary: Unable to use advanced aggregation with multiple inserts clause
                 Key: HIVE-15581
                 URL: https://issues.apache.org/jira/browse/HIVE-15581
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: James Ball


■Use Cases
- Use multiple insert clauses within a single query to insert multiple static (user-defined) partitions into a single table.
- Use advanced aggregation (cube) features within each insert clause to include subtotals of columns for each partition

■Expected Behaviour
- Subtotals are inserted for all combinations of the set of columns

■Observed Behaviour
- No subtotals are not inserted for any combination of the set of columns

■Sample Queries
{code:sql}
// Create test tables
create table if not exists
	table1
	(
		column1 string,
		column2 string,
		column3 int
		)
	stored as orc
	tblproperties
	(
		"orc.compress" = "SNAPPY"
		);

create table if not exists
	table2
	(
		column1 string,
		column2 string,
		column3 int
		)
	partitioned by
	(
		partition1 string
		)
	stored as orc
	tblproperties
	(
		"orc.compress" = "SNAPPY"
		);

create table if not exists
	table3
	(
		column1 string,
		column2 string,
		column3 int
		)
	partitioned by
	(
		partition1 string
		)
	stored as orc
	tblproperties
	(
		"orc.compress" = "SNAPPY"
		);
{code}

{code:sql}
// Insert test values
insert overwrite table
	table1
	values
		('value1', 'value1', 1),
		('value2', 'value2', 1),
		('value3', 'value3', 1);
{code}

{code:sql}
// Single insert clause with multiple inserts syntax
// Subtotals are inserted into target table
from
	table1
insert overwrite table
	table2
	partition
	(
		partition1 = 'value1'
		)
	select
		column1,
		column2,
		sum(column3) as column3
	group by
		column1,
		column2
	with cube;
{code}

{code:sql}
// Multiple insert clauses with multiple inserts syntax
// Subtotals are not inserted into target table
from
	table1
insert overwrite table
	table3
	partition
	(
		partition1 = 'value1'
		)
	select
		column1,
		column2,
		sum(column3) as column3
	group by
		column1,
		column2
	with cube
insert overwrite table
	table3
	partition
	(
		partition1 = 'value2'
		)
	select
		column1,
		column2,
		sum(column3) as column3
	group by
		column1,
		column2
	with cube;
{code}

■Executions Plans
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: table1
            Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string), column2 (type: string), column3 (type: int)
              outputColumnNames: column1, column2, column3
              Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(column3)
                keys: column1 (type: string), column2 (type: string), '0' (type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                  sort order: +++
                  Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                  Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col3
          Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
          pruneGroupingSetId: true
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col3) (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: zzz_james_ball.table2

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            partition1 value1
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table2

  Stage: Stage-2
    Stats-Aggr Operator
{noformat}
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-1 depends on stages: Stage-2
  Stage-4 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: table1
            Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string), column2 (type: string), column3 (type: int)
              outputColumnNames: column1, column2, column3
              Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: column1 (type: string), column2 (type: string)
                sort order: ++
                Map-reduce partition columns: column1 (type: string), column2 (type: string)
                Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
                value expressions: column3 (type: int)
      Reduce Operator Tree:
        Forward
          Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
          Group By Operator
            aggregations: sum(VALUE._col0)
            keys: KEY._col0 (type: string), KEY._col1 (type: string)
            mode: complete
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: zzz_james_ball.table3
          Group By Operator
            aggregations: sum(VALUE._col0)
            keys: KEY._col0 (type: string), KEY._col1 (type: string)
            mode: complete
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: zzz_james_ball.table3

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            partition1 value1
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table3

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            partition1 value2
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table3

  Stage: Stage-4
    Stats-Aggr Operator
{noformat}

■Notes
- This problem occurs with all advanced aggregation features (cube, grouping sets, rollup)
- This problem occurs whether hive.map.aggr is set to true or false
- Dynamic partitions are not used because the partition values are set manually within the where conditions of each insert clause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)