Posted to user@hive.apache.org by Mark Grover <mg...@oanda.com> on 2012/02/01 20:06:03 UTC

Re: Invoke a UDAF inside another UDAF

Rohan,
You could do it one of the following ways:
1) Write a UDAF that does the avg(f2 - avg_f2) computation.
2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
3) Do it with multiple passes over the data. Something like this (untested):

select
   table.f1,
   avg_table.avg_f2,
   avg(table.f2-avg_table.avg_f2)
from 
(
select 
   f1,
   avg(f2) as avg_f2 
from
   table
group by
   f1) avg_table
join
table 
ON (avg_table.f1=table.f1)
group by 
   table.f1,
   avg_table.avg_f2;
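Mark's two-pass, join-based approach can be sanity-checked outside Hive with a small Python sketch (hypothetical sample data; the table is assumed to have columns f1 and f2):

```python
from collections import defaultdict

# Hypothetical rows of `table`: (f1, f2)
rows = [("a", 1.0), ("a", 3.0), ("a", 5.0), ("b", 2.0), ("b", 4.0)]

# Pass 1: per-group average of f2 (the avg_table subquery)
groups = defaultdict(list)
for f1, f2 in rows:
    groups[f1].append(f2)
avg_f2 = {f1: sum(v) / len(v) for f1, v in groups.items()}

# Pass 2: join each row back to its group's average, then
# compute avg(f2 - avg_f2) per group (the outer query)
result = {
    f1: sum(x - avg_f2[f1] for x in v) / len(v)
    for f1, v in groups.items()
}
print(avg_f2)  # {'a': 3.0, 'b': 3.0}
print(result)  # {'a': 0.0, 'b': 0.0}
```

As the sketch shows, the plain average of deviations from the group mean is identically zero, so a real use case would more likely aggregate abs() or squared deviations in the outer pass.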

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "rohan monga" <mo...@gmail.com>
To: user@hive.apache.org
Sent: Friday, January 20, 2012 6:00:54 PM
Subject: Re: Invoke a UDAF inside another UDAF

My bad, I hastily converted the query to a wrong example. 

It should be like this: 

select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1; 

In essence, I just want to use the value generated by one UDAF (in this case avg(f2)) as a single number and then apply that value to the group inside a different UDAF. 
For example, if I were to use a streaming reducer, it would be something like this: 

avg1 = computeSum(list) / len(list) 
return computeSum(x-avg1 for x in list) / len(list) 

As I write this I realize why this might not be possible [ the group computation being done in one step and the information being lost ] :) 
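For reference, that two-line reducer sketch, wrapped into a runnable Python function operating on one group's values (`computeSum` is just the built-in `sum`):

```python
def mean_deviation(values):
    # First pass over the group: the inner aggregate, avg(f2)
    avg1 = sum(values) / len(values)
    # Second pass: the outer aggregate, avg(f2 - avg_f2)
    return sum(x - avg1 for x in values) / len(values)

print(mean_deviation([1.0, 3.0, 5.0]))  # 0.0
```

A custom reducer can make both passes because it sees the whole group's values at once, which is exactly the information a single nested UDAF call loses.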

But why the NullPointerException? 

Regards, 
-- 
Rohan Monga 



On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo < edlinuxguru@gmail.com > wrote: 


IMHO you cannot nest the percentile calculation, because the 
results would be meaningless. percentile has to aggregate a set and 
pick the Nth element, but if you nest, the inner percentile only 
returns one result to the outer percentile, and that is pretty 
meaningless. 

(I think someone talked about this on the list in the last month or so.) 
Without seeing your input data and your expected results, I cannot 
understand what your query wants to do or suggest an alternative. 





On 1/20/12, rohan monga < monga.rohan@gmail.com > wrote: 
> thanks edward that seems to work :) 
> 
> However, I have another query like this 
> 
> select a, avg(b) as avg_b, percentile_approx( avg_b - percentile_approx( b, 
> .5), .5 ) from table1 group by a 
> 
> Here I will lose the group info if I include the inner query in the FROM 
> clause; is there a way to get this to work? 
> 
> Thanks, 
> -- 
> Rohan Monga 
> 
> 
> On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo 
> < edlinuxguru@gmail.com >wrote: 
> 
>> I think if you are grouping by b, b has to be in your select list. Try 
>> this. 
>> FROM ( 
>> select b,count(a) as theCount from table1 group by b 
>> ) a select mean(theCount); 
>> 
>> I think that should work. 
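Edward's two-stage pattern (aggregate inside a subquery, then aggregate over its output) can be sketched in Python (hypothetical rows with columns a and b):

```python
from collections import Counter

# Hypothetical rows of table1: (a, b)
rows = [(1, "x"), (2, "x"), (3, "x"), (4, "y")]

# Stage 1: count(a) per value of b -- the inner subquery
the_count = Counter(b for _, b in rows)  # {'x': 3, 'y': 1}

# Stage 2: mean over the per-group counts -- the outer select
mean_count = sum(the_count.values()) / len(the_count)
print(mean_count)  # 2.0
```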
>> 
>> On 1/20/12, rohan monga < monga.rohan@gmail.com > wrote: 
>> > Hi, 
>> > I am trying to run a query like 
>> > "select mean(count(a)) from table1 group by b;" 
>> > 
>> > I am getting the following error 
>> > <snip> 
>> > FAILED: Hive Internal Error: java.lang.NullPointerException(null) 
>> > java.lang.NullPointerException 
>> > at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151) 
>> > at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656) 
>> > at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777) 
>> > at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) 
>> > at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) 
>> > at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125) 
>> > at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) 
>> > at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480) 
>> > at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7223) 
>> > at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240) 
>> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428) 
>> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) 
>> > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) 
>> > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253) 
>> > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210) 
>> > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401) 
>> > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) 
>> > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635) 
>> > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:552) 
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
>> > at java.lang.reflect.Method.invoke(Method.java:597) 
>> > at org.apache.hadoop.util.RunJar.main(RunJar.java:197) 
>> > </snip> 
>> > 
>> > Is there a workaround? I have tried with Hive 0.7.1 and 0.8. 
>> > 
>> > Thanks 
>> > -- 
>> > Rohan Monga 
>> > 
>> 
> 


Re: Invoke a UDAF inside another UDAF

Posted by rohan monga <mo...@gmail.com>.
Thanks Mark,
I ended up going the custom reducer route. I will try out the query you have
sent.

Regards,
--
Rohan Monga

