Posted to dev@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2014/07/01 18:52:25 UTC
[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049060#comment-14049060 ]
Hive QA commented on HIVE-7205:
-------------------------------
{color:red}Overall{color}: -1, at least one test failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651916/HIVE-7205.2.patch.txt
{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 5657 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/648/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/648/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-648/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12651916
> Wrong results when union all of grouping followed by group by with correlation optimization
> -------------------------------------------------------------------------------------------
>
> Key: HIVE-7205
> URL: https://issues.apache.org/jira/browse/HIVE-7205
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0, 0.13.0, 0.13.1
> Reporter: dima machlin
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt
>
>
> Use case:
> Table TBL (a string, b string) contains a single row: 'a', 'a'.
> The following query:
> {code:sql}
> select b, sum(cc) from (
> select b,count(1) as cc from TBL group by b
> union all
> select a as b,count(1) as cc from TBL group by a
> ) z
> group by b
> {code}
> returns two rows when hive.optimize.correlation=true is set:
> a 1
> a 1
> After changing to set hive.optimize.correlation=false; it returns the correct result, a single row:
> a 2
> The plan with correlation optimization:
> {code:sql}
> ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b)))) (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc)))) (TOK_GROUPBY (TOK_TABLE_OR_COL b))))
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> null-subquery1:z-subquery1:TBL
> TableScan
> alias: TBL
> Select Operator
> expressions:
> expr: b
> type: string
> outputColumnNames: b
> Group By Operator
> aggregations:
> expr: count(1)
> bucketGroup: false
> keys:
> expr: b
> type: string
> mode: hash
> outputColumnNames: _col0, _col1
> Reduce Output Operator
> key expressions:
> expr: _col0
> type: string
> sort order: +
> Map-reduce partition columns:
> expr: _col0
> type: string
> tag: 0
> value expressions:
> expr: _col1
> type: bigint
> null-subquery2:z-subquery2:TBL
> TableScan
> alias: TBL
> Select Operator
> expressions:
> expr: a
> type: string
> outputColumnNames: a
> Group By Operator
> aggregations:
> expr: count(1)
> bucketGroup: false
> keys:
> expr: a
> type: string
> mode: hash
> outputColumnNames: _col0, _col1
> Reduce Output Operator
> key expressions:
> expr: _col0
> type: string
> sort order: +
> Map-reduce partition columns:
> expr: _col0
> type: string
> tag: 1
> value expressions:
> expr: _col1
> type: bigint
> Reduce Operator Tree:
> Demux Operator
> Group By Operator
> aggregations:
> expr: count(VALUE._col0)
> bucketGroup: false
> keys:
> expr: KEY._col0
> type: string
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> Union
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> Mux Operator
> Group By Operator
> aggregations:
> expr: sum(_col1)
> bucketGroup: false
> keys:
> expr: _col0
> type: string
> mode: complete
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Group By Operator
> aggregations:
> expr: count(VALUE._col0)
> bucketGroup: false
> keys:
> expr: KEY._col0
> type: string
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> Union
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> Mux Operator
> Group By Operator
> aggregations:
> expr: sum(_col1)
> bucketGroup: false
> keys:
> expr: _col0
> type: string
> mode: complete
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {code}
> The plan without correlation optimization:
> {code:sql}
> ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b)))) (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc)))) (TOK_GROUPBY (TOK_TABLE_OR_COL b))))
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 depends on stages: Stage-1, Stage-3
> Stage-3 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> null-subquery2:z-subquery2:TBL
> TableScan
> alias: TBL
> Select Operator
> expressions:
> expr: a
> type: string
> outputColumnNames: a
> Group By Operator
> aggregations:
> expr: count(1)
> bucketGroup: false
> keys:
> expr: a
> type: string
> mode: hash
> outputColumnNames: _col0, _col1
> Reduce Output Operator
> key expressions:
> expr: _col0
> type: string
> sort order: +
> Map-reduce partition columns:
> expr: _col0
> type: string
> tag: -1
> value expressions:
> expr: _col1
> type: bigint
> Reduce Operator Tree:
> Group By Operator
> aggregations:
> expr: count(VALUE._col0)
> bucketGroup: false
> keys:
> expr: KEY._col0
> type: string
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Stage: Stage-2
> Map Reduce
> Alias -> Map Operator Tree:
> maprfs:/user/hadoop/tmp/hive/hive_2014-06-10_15-42-37_188_2403850033480056671-5/-mr-10002
> TableScan
> Union
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> Group By Operator
> aggregations:
> expr: sum(_col1)
> bucketGroup: false
> keys:
> expr: _col0
> type: string
> mode: hash
> outputColumnNames: _col0, _col1
> Reduce Output Operator
> key expressions:
> expr: _col0
> type: string
> sort order: +
> Map-reduce partition columns:
> expr: _col0
> type: string
> tag: -1
> value expressions:
> expr: _col1
> type: bigint
> maprfs:/user/hadoop/tmp/hive/hive_2014-06-10_15-42-37_188_2403850033480056671-5/-mr-10003
> TableScan
> Union
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> Group By Operator
> aggregations:
> expr: sum(_col1)
> bucketGroup: false
> keys:
> expr: _col0
> type: string
> mode: hash
> outputColumnNames: _col0, _col1
> Reduce Output Operator
> key expressions:
> expr: _col0
> type: string
> sort order: +
> Map-reduce partition columns:
> expr: _col0
> type: string
> tag: -1
> value expressions:
> expr: _col1
> type: bigint
> Reduce Operator Tree:
> Group By Operator
> aggregations:
> expr: sum(VALUE._col0)
> bucketGroup: false
> keys:
> expr: KEY._col0
> type: string
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-3
> Map Reduce
> Alias -> Map Operator Tree:
> null-subquery1:z-subquery1:TBL
> TableScan
> alias: TBL
> Select Operator
> expressions:
> expr: b
> type: string
> outputColumnNames: b
> Group By Operator
> aggregations:
> expr: count(1)
> bucketGroup: false
> keys:
> expr: b
> type: string
> mode: hash
> outputColumnNames: _col0, _col1
> Reduce Output Operator
> key expressions:
> expr: _col0
> type: string
> sort order: +
> Map-reduce partition columns:
> expr: _col0
> type: string
> tag: -1
> value expressions:
> expr: _col1
> type: bigint
> Reduce Operator Tree:
> Group By Operator
> aggregations:
> expr: count(VALUE._col0)
> bucketGroup: false
> keys:
> expr: KEY._col0
> type: string
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {code}
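As a cross-check of the expected result, the query from the report can be run on an engine that is not affected by this issue. The sketch below uses Python's built-in sqlite3 module instead of Hive, which is an assumption made only to keep it self-contained. The table name TBL, its two string columns, and the single row 'a','a' come from the report. The helper name expected_rows is hypothetical.

```python
import sqlite3

# Query copied from the report; only the whitespace differs.
QUERY = """
select b, sum(cc) from (
  select b, count(1) as cc from TBL group by b
  union all
  select a as b, count(1) as cc from TBL group by a
) z
group by b
"""


def expected_rows():
    """Run the reported query on an in-memory SQLite table (not Hive)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Same shape as the table in the report: two string columns, one row.
        conn.execute("create table TBL (a text, b text)")
        conn.execute("insert into TBL values ('a', 'a')")
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for b, total in expected_rows():
        print(b, total)
```

Each branch of the union contributes one row with key a and count 1, so the outer group by merges them into the single row a 2. That matches the result reported with hive.optimize.correlation=false.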