Posted to dev@hive.apache.org by "Carl Steinbach (JIRA)" <ji...@apache.org> on 2011/07/27 06:19:09 UTC
[jira] [Updated] (HIVE-2262) mapjoin followed by union all, groupby does not work

     [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2262:
---------------------------------

    Fix Version/s:     (was: 0.7.1)

> mapjoin followed by union all, groupby does not work
> ----------------------------------------------------
>
>                 Key: HIVE-2262
>                 URL: https://issues.apache.org/jira/browse/HIVE-2262
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.1
>            Reporter: yu xiang
>            Priority: Trivial
>
> sql:
> CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, double_data DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> explain
> select int_data2, count(1)
> from (
>   select /*+mapjoin(a)*/ int_data2, 1 as c1, 0 as c2
>   from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
>   union all
>   select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2
>   from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
> ) mapjointable
> group by int_data2;
> exception:
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
>         at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
> Analysis of the cause:
> 1. When mapjoin, union all and group by are used together, the UnionProcFactory.MapJoinUnion() optimizer rule sets MapJoinSubq to true and sets up the UnionParseContext.
> 2. In GenMRUnion1, Hive calls mergeMapJoinUnion, which also sets the task plan.
> 3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls (new GenMRRedSink1()).process() to initialize the plan. However, the plan for utask has already been set; only the reducer still needs to be set. Moreover, utask reads a temporary table, so no table is mapped to its top operator. This is why the NullPointerException is raised in PartitionPruner.prune(), as the trace above shows.
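The failure path in step 3 can be illustrated with a small, self-contained Java model. This is a sketch, not Hive code: `Table`, `prune()`, `initPlan()` and the `topToTable` map are hypothetical stand-ins for the table lookup that plan initialization performs for a task's top operator before handing the table to the pruner. Under that assumption, a plan whose input is a temporary (intermediate) table has no entry in the map, so the lookup returns null and the pruner dereferences it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the NPE path; none of these classes are Hive's real API.
public class UnionPlanNpeSketch {

    // Stand-in for a metastore table; only the name matters here.
    public static final class Table {
        final String name;
        public Table(String name) { this.name = name; }
    }

    // Stand-in for the pruner: it dereferences the table it is given,
    // so a null table fails here, as it does inside PartitionPruner.prune().
    public static String prune(Table tab) {
        return "pruned " + tab.name; // throws NullPointerException when tab == null
    }

    // Stand-in for plan initialization: looks up the table behind a top operator.
    public static String initPlan(Map<String, Table> topToTable, String topOp) {
        Table tab = topToTable.get(topOp); // null for a temporary (intermediate) input
        return prune(tab);
    }

    public static void main(String[] args) {
        Map<String, Table> topToTable = new HashMap<>();
        topToTable.put("TS_a", new Table("nulltest2")); // hypothetical operator name

        // Initializing a plan over a real table scan works.
        System.out.println(initPlan(topToTable, "TS_a"));

        // Re-initializing the union task, whose input is a temporary table
        // with no entry in the map, passes null to prune() and throws.
        try {
            initPlan(topToTable, "TS_union_tmp"); // hypothetical operator name
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from prune()");
        }
    }
}
```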
> Solutions:
> 1. SQL solution: rewrite the query using a subquery.
> 2. Code solution: in mergeMapJoinUnion, after the task plan has been set, set a settaskplan flag to true to indicate that the plan for this utask has already been set. In GenMRRedSink3, if this flag is true, do not call (new GenMRRedSink1()).process() to reinitialize the plan:
> ++++++++++++++++++++++++++++
> if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan())
> ++++++++++++++++++++++++++++
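The flag-based guard proposed in solution 2 can be sketched as a standalone Java model rather than a patch against Hive. `UnionContext`, `mergeMapJoinUnion()` and `reduceSinkVisit()` here are hypothetical simplifications; `isMapOnlySubq()` and `isIssetTaskPlan()` mirror the names in the condition above, although for brevity both flags live on one object, whereas the original condition reads them from `uCtx` and `upc`. Under those assumptions, once the merge step records that the plan is set, the reduce-sink step skips re-initialization and only attaches the reducer.

```java
// Hypothetical model of the proposed fix; not Hive's actual classes or signatures.
public class TaskPlanFlagSketch {

    // Stand-in for the union context carrying both flags.
    public static final class UnionContext {
        private boolean mapOnlySubq;
        private boolean issetTaskPlan; // proposed flag: plan for this utask already set

        public boolean isMapOnlySubq() { return mapOnlySubq; }
        public void setMapOnlySubq(boolean v) { mapOnlySubq = v; }
        public boolean isIssetTaskPlan() { return issetTaskPlan; }
        public void setIssetTaskPlan(boolean v) { issetTaskPlan = v; }
    }

    // Stand-in for the merge step: sets the task plan, then records that fact.
    public static void mergeMapJoinUnion(UnionContext ctx, StringBuilder log) {
        log.append("setTaskPlan;");
        ctx.setIssetTaskPlan(true);
    }

    // Stand-in for the reduce-sink step: re-initializes the plan only for a
    // map-only subquery whose plan has not been set yet; in every case it then
    // attaches the reducer.
    public static void reduceSinkVisit(UnionContext ctx, StringBuilder log) {
        if (ctx.isMapOnlySubq() && !ctx.isIssetTaskPlan()) {
            log.append("initPlan;");
        }
        log.append("setReducer;");
    }

    public static void main(String[] args) {
        UnionContext ctx = new UnionContext();
        ctx.setMapOnlySubq(true);
        StringBuilder log = new StringBuilder();

        mergeMapJoinUnion(ctx, log);
        reduceSinkVisit(ctx, log);

        System.out.println(log); // prints "setTaskPlan;setReducer;"
    }
}
```

Without the merge step, the same visitor would log `initPlan;setReducer;`, so the flag changes behavior only for tasks whose plan was already set.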
> I don't know whether the code solution is suitable.
> Is there a better solution?
> Thanks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira