
Posted to dev@hive.apache.org by "yu xiang (JIRA)" <ji...@apache.org> on 2011/07/06 12:38:18 UTC

[jira] [Created] (HIVE-2262) mapjoin followed by union all, groupby does not work

mapjoin followed by union all, groupby does not work
----------------------------------------------------

                 Key: HIVE-2262
                 URL: https://issues.apache.org/jira/browse/HIVE-2262
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.7.1
            Reporter: yu xiang
            Priority: Trivial
             Fix For: 0.7.1


sql:
CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, double_data DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

explain
select int_data2, count(1)
from (
  select /*+mapjoin(a)*/ int_data2, 1 as c1, 0 as c2
  from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
  union all
  select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2
  from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
) mapjointable
group by int_data2;

exception:
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
        at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
        at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
        at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
        at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
        at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)

Analysis of the cause:
1. When mapjoin, union all and group by are used together, the optimizer's UnionProcFactory.MapJoinUnion() sets MapJoinSubq to true and sets up the UnionParseContext.
2. In GenMRUnion1, Hive calls mergeMapJoinUnion, which also sets the task plan.
3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls GenMRRedSink1.process() to initialize the plan. But the utask's plan has already been set at this point, so only the reducer still needs to be set. In addition, the utask is processing a temporary table, so there is no topOp-to-table mapping for it. This is where the NullPointerException is thrown.

Solutions:
1. SQL workaround: rewrite the query using a subquery.
2. Code fix: in mergeMapJoinUnion, after the task plan has been set, set a settaskplan flag to true to indicate that the plan for this utask has been set. Then, in GenMRRedSink3, if this flag is true, do not call GenMRRedSink1.process() to reinitialize the plan:
++++++++++++++++++++++++++++
if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan())
++++++++++++++++++++++++++++
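
The effect of such a guard can be illustrated with a small stand-alone model. This is only a sketch under assumptions: UnionCtx, TableInfo, TOP_OP_TO_TABLE and all method bodies are hypothetical stand-ins written for illustration, not Hive's actual classes, and the flag accessor is called isTaskPlanSet() here purely for readability.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the double plan initialization; none of these types are real Hive classes. */
public class UnionPlanGuardSketch {

    /** Hypothetical stand-in for the union context, carrying the proposed flag. */
    static final class UnionCtx {
        private final boolean mapOnlySubq;
        private boolean taskPlanSet;

        UnionCtx(boolean mapOnlySubq) { this.mapOnlySubq = mapOnlySubq; }
        boolean isMapOnlySubq() { return mapOnlySubq; }
        boolean isTaskPlanSet() { return taskPlanSet; }
        void setTaskPlanSet(boolean value) { taskPlanSet = value; }
    }

    /** Hypothetical stand-in for the table metadata read during plan setup. */
    static final class TableInfo {
        final String name;
        TableInfo(String name) { this.name = name; }
    }

    // Alias -> table. The temporary input of the union task has no entry here.
    static final Map<String, TableInfo> TOP_OP_TO_TABLE = new HashMap<>();

    /** Models plan initialization: it dereferences the table for the alias. */
    static String initPlan(String alias) {
        TableInfo table = TOP_OP_TO_TABLE.get(alias);
        return table.name; // throws NullPointerException when the alias has no table
    }

    /** Models the union merge step: the plan is set here, so record that fact. */
    static void mergeMapJoinUnion(UnionCtx ctx) {
        ctx.setTaskPlanSet(true);
    }

    /** Models the reduce-sink step, with or without the proposed guard. */
    static String processReduceSink(UnionCtx ctx, String alias, boolean guard) {
        boolean reinit = guard
                ? ctx.isMapOnlySubq() && !ctx.isTaskPlanSet()
                : ctx.isMapOnlySubq();
        if (reinit) {
            initPlan(alias);
            return "reinitialized";
        }
        return "reducer-only";
    }

    public static void main(String[] args) {
        UnionCtx ctx = new UnionCtx(true);
        mergeMapJoinUnion(ctx);
        try {
            processReduceSink(ctx, "union-tmp", false);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded: " + processReduceSink(ctx, "union-tmp", true));
    }
}
```

In this model the unguarded path reinitializes a plan whose input has no table entry and fails, while the guarded path skips the reinitialization and only the reducer would be set. Whether the real code paths behave the same way would need to be checked against the Hive source.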

I am not sure whether the code fix is suitable.
Is there a better solution?
Thanks.





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HIVE-2262) mapjoin followed by union all, groupby does not work

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach resolved HIVE-2262.
----------------------------------

    Resolution: Cannot Reproduce
    

[jira] [Updated] (HIVE-2262) mapjoin followed by union all, groupby does not work

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2262:
---------------------------------

    Fix Version/s:     (was: 0.7.1)


[jira] [Resolved] (HIVE-2262) mapjoin followed by union all, groupby does not work

Posted by "Ashutosh Chauhan (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-2262.
------------------------------------

    Resolution: Fixed

This is no longer reproducible on trunk. Feel free to reopen if there is some other variant which can produce this.
                

[jira] [Reopened] (HIVE-2262) mapjoin followed by union all, groupby does not work

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reopened HIVE-2262:
----------------------------------

    

[jira] [Commented] (HIVE-2262) mapjoin followed by union all, groupby does not work

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140759#comment-13140759 ] 

Ashutosh Chauhan commented on HIVE-2262:
----------------------------------------

That looks like a possible solution to me. I would like to see what others think as well. Yu, can you generate an svn-friendly patch? That will be easier to review.
                