Posted to user@hive.apache.org by Jason Michael <jm...@videoegg.com> on 2010/02/19 23:23:11 UTC
Having trouble with lateral view
I'm currently running a hive build from trunk, revision number 911889. I've built a UDTF called map_explode which just emits the key and value of each entry in a map as a row in the result table. The table I'm running it against looks like:
hive> describe mytable;
product string from deserializer
...
interactions map<string,int> from deserializer
If I use map_explode in the select clause, I get the expected results:
hive> select map_explode(interactions) as (key, value) from mytable where day = '2010-02-18' and hour = 1 limit 10;
...
OK
invite_impression 1
invite_impression 1
invite_impression 1
invite_impression 1
rollout 12
invite_impression 1
invite_impression 1
invite_impression 1
rollout 4
invite_impression 1
Time taken: 22.11 seconds
However, if I try to use LATERAL VIEW to relate the exploded values back to the parent table, like so:
hive> select product, key, sum(value) from mytable LATERAL VIEW map_explode(interactions) interacts as key, value where day = '2010-02-18' and hour = 1 group by product, key;
I get the following error:
FAILED: Unknown exception: null
Looking in hive.log, I see the following stack trace:
2010-02-19 14:15:17,215 ERROR ql.Driver (SessionState.java:printError(255)) - FAILED: Unknown exception: null
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.ppd.ExprWalkerProcFactory$ColumnExprProcessor.process(ExprWalkerProcFactory.java:87)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:129)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:103)
at org.apache.hadoop.hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds(ExprWalkerProcFactory.java:273)
at org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(OpProcFactory.java:317)
at org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.process(OpProcFactory.java:258)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:129)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:103)
at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:103)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:74)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5758)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:125)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I peeked at ExprWalkerProcFactory, but couldn't readily see what was causing the problem. Any ideas?
Jason
Re: Having trouble with lateral view
Posted by Zheng Shao <zs...@gmail.com>.
Jason,
Do you want to open a JIRA and contribute your map_explode function to Hive?
That will be greatly appreciated.
Zheng
On Fri, Feb 19, 2010 at 2:49 PM, Yongqiang He
<he...@software.ict.ac.cn> wrote:
> Hi Jason,
>
> This is a known bug, see https://issues.apache.org/jira/browse/HIVE-1056
>
> You can first disable ppd with “set hive.optimize.ppd=false;”
>
> Thanks
> Yongqiang
--
Yours,
Zheng
Re: Having trouble with lateral view
Posted by Yongqiang He <he...@software.ict.ac.cn>.
Hi Jason,
This is a known bug, see https://issues.apache.org/jira/browse/HIVE-1056
You can first disable ppd with “set hive.optimize.ppd=false;”
Thanks
Yongqiang
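For reference, the workaround above can be sketched as a single Hive CLI session. This only combines the setting from this reply with the query from the original post. The final statement that turns the setting back on is an assumption added for illustration, not something suggested in the thread; restore whatever value your session had before.

```sql
-- Workaround from this thread: disable predicate pushdown (ppd) for the session.
-- The NullPointerException in the stack trace is raised inside
-- PredicatePushDown.transform, so this should keep that step from running.
set hive.optimize.ppd=false;

-- The LATERAL VIEW query from the original post, unchanged apart from line breaks.
-- map_explode is the poster's own UDTF, not a Hive built-in.
select product, key, sum(value)
from mytable LATERAL VIEW map_explode(interactions) interacts as key, value
where day = '2010-02-18' and hour = 1
group by product, key;

-- Assumption (not from the thread): re-enable ppd for later queries
-- if it was enabled in your session before.
set hive.optimize.ppd=true;
```

With ppd off, filters may be applied later in the plan than they otherwise would be, so it is probably worth scoping the setting to the affected query until HIVE-1056 is fixed.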