Posted to user@hive.apache.org by Mark Grover <mg...@oanda.com> on 2012/02/01 20:06:03 UTC
Re: Invoke a UDAF inside another UDAF
Rohan,
You could do it one of the following ways:
1) Write a UDAF that does the avg(f2 - avg_f2) computation.
2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
3) Do it with multiple passes over the data. Something like this (untested):
select
  table.f1,
  avg_table.avg_f2,
  avg(table.f2 - avg_table.avg_f2)
from (
  select
    f1,
    avg(f2) as avg_f2
  from table
  group by f1
) avg_table
join table
  on (avg_table.f1 = table.f1)
group by
  table.f1,
  avg_table.avg_f2;
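Option 2 might look something like the following streaming reducer. This is an illustrative Python sketch only, not tested against Hive: it assumes tab-separated (f1, f2) rows arriving pre-sorted by f1, the way Hive's transform/streaming interface would deliver them after a DISTRIBUTE BY f1 SORT BY f1.

```python
import sys
from itertools import groupby

def reduce_group(f1, f2_values):
    """Return (f1, avg(f2), avg(f2 - avg_f2)) for one group's f2 values."""
    avg_f2 = sum(f2_values) / len(f2_values)
    avg_dev = sum(x - avg_f2 for x in f2_values) / len(f2_values)
    return f1, avg_f2, avg_dev

def reduce_stream(lines):
    """Consume tab-separated (f1, f2) rows, pre-sorted by f1, and yield one
    output row per group."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for f1, group in groupby(rows, key=lambda r: r[0]):
        f2_values = [float(r[1]) for r in group]
        yield reduce_group(f1, f2_values)

# As an actual Hive streaming script you would print each row from
# reduce_stream(sys.stdin), tab-separated.
```

Note that avg(f2 - avg_f2) is always 0 by definition, since the deviations from a mean sum to zero; the two-pass structure only pays off for non-linear aggregates such as percentile.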
Mark
Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: mgrover@oanda.com
"Best Trading Platform" - World Finance's Forex Awards 2009.
"The One to Watch" - Treasury Today's Adam Smith Awards 2009.
----- Original Message -----
From: "rohan monga" <mo...@gmail.com>
To: user@hive.apache.org
Sent: Friday, January 20, 2012 6:00:54 PM
Subject: Re: Invoke a UDAF inside another UDAF
My bad, I hastily converted the query to a wrong example.
It should be like this:
select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;
In essence, I just want to use the value generated by one UDAF ( in this case avg(f2) ) as a single number and then apply that value to the group inside a different UDAF.
E.g., if I were to use a streaming reducer, it would be something like this:
avg1 = computeSum(values) / len(values)
return computeSum(x - avg1 for x in values) / len(values)
As I write this I realize why this might not be possible [ the group computation being done in one step and the information being lost ] :)
But why the NullPointerException?
Regards,
--
Rohan Monga
On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo < edlinuxguru@gmail.com > wrote:
IMHO, you cannot meaningfully nest the percentile calculation: percentile
has to aggregate a set and pick the Nth element, but if you nest, the inner
percentile returns only one value to the outer percentile, and that is
pretty meaningless.
(I think someone talked about this on the list in the last month or so.)
Without seeing your input data and your expected results, I cannot
understand what your query is trying to do or suggest an alternative.
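Edward's point can be seen with a tiny sketch in plain Python, using a naive nearest-rank percentile purely for illustration (Hive's percentile_approx uses a different algorithm):

```python
def percentile(values, p):
    """Naive nearest-rank percentile, for illustration only."""
    ordered = sorted(values)
    idx = min(int(p * len(ordered)), len(ordered) - 1)
    return ordered[idx]

data = [1, 2, 3, 4, 5]
inner = percentile(data, 0.5)     # collapses the whole set to one number: 3
outer = percentile([inner], 0.5)  # a "nested" percentile only ever sees a one-element set
# outer == inner: the outer call can do nothing but hand back the single value it was given
```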
On 1/20/12, rohan monga < monga.rohan@gmail.com > wrote:
> Thanks Edward, that seems to work :)
>
> However, I have another query like this:
>
> select a, avg(b) as avg_b, percentile_approx( avg_b - percentile_approx( b,
> .5), .5 ) from table1 group by a
>
> Here I will lose the group info if I include the inner query in the FROM
> clause. Is there a way to get this to work?
>
> Thanks,
> --
> Rohan Monga
>
>
> On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo
> < edlinuxguru@gmail.com >wrote:
>
>> I think if you are grouping by b, b has to be in your select list. Try
>> this:
>> FROM (
>>   select b, count(a) as theCount from table1 group by b
>> ) a select mean(theCount);
>>
>> I think that should work.
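The rewrite Edward suggests (aggregate in an inner query, then aggregate the result) can be modeled in Python; this is an illustrative sketch of the two-step shape, not Hive itself, assuming `rows` is an iterable of (a, b) pairs:

```python
from collections import Counter

def mean_of_group_counts(rows):
    """Model of: FROM (select b, count(a) as theCount from table1 group by b) a
    select mean(theCount).  `rows` is an iterable of (a, b) pairs."""
    the_count = Counter(b for _a, b in rows)         # inner query: count(a) per group b
    return sum(the_count.values()) / len(the_count)  # outer query: mean of those counts

# Two rows with b=1 and one row with b=2 -> counts {1: 2, 2: 1} -> mean 1.5
```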
>>
>> On 1/20/12, rohan monga < monga.rohan@gmail.com > wrote:
>> > Hi,
>> > I am trying to run a query like
>> > "select mean(count(a)) from table1 group by b;"
>> >
>> > I am getting the following error
>> > <snip>
>> > FAILED: Hive Internal Error: java.lang.NullPointerException(null)
>> > java.lang.NullPointerException
>> > at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
>> > at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
>> > at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
>> > at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>> > at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
>> > at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
>> > at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
>> > at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7223)
>> > at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
>> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428)
>> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
>> > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>> > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>> > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
>> > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
>> > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>> > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635)
>> > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:552)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>> > </snip>
>> >
>> > Is there a workaround? I have tried with Hive 0.7.1 and 0.8.
>> >
>> > Thanks
>> > --
>> > Rohan Monga
>> >
>>
>
Re: Invoke a UDAF inside another UDAF
Posted by rohan monga <mo...@gmail.com>.
Thanks Mark,
I ended up going the custom-reducer way. I will try out the query you
sent.
Regards,
--
Rohan Monga