Posted to user@hive.apache.org by Sonal Goyal <so...@gmail.com> on 2010/02/03 12:23:52 UTC

Resolvers for UDAFs

Hi,

I am writing a UDAF which takes in 4 parameters. I have 2 cases - one where
all the parameters are ints, and a second where the last parameter is a double.
I wrote two evaluators for this, with iterate as

public boolean iterate(int max, int groupBy, int attribute, int count)

and

public boolean iterate(int max, int groupBy, int attribute, double count)

However, when I run a query, I get the exception:
org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
        at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
        at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
        at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

One option for me is to write a resolver, which I will do. But I just
wanted to know if this is a bug in Hive whereby it is not able to get the
right evaluator, or if this is a gap in my understanding.

I look forward to hearing your views on this.

Thanks and Regards,
Sonal
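
To illustrate why the argument list [int, int, int, int] can look ambiguous
once the second iterate overload exists, here is a minimal Java sketch. It
assumes that an int argument is also acceptable where a double parameter is
declared. The class and method names (OverloadMatchSketch, compatibleOverloads,
resolveExactFirst) are hypothetical, and this is not the code of Hive's
DefaultUDAFEvaluatorResolver; it only models the matching idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverloadMatchSketch {
    // Parameter types of the two iterate overloads from the question.
    static final Class<?>[] INT_COUNT = {int.class, int.class, int.class, int.class};
    static final Class<?>[] DOUBLE_COUNT = {int.class, int.class, int.class, double.class};
    static final Class<?>[][] OVERLOADS = {INT_COUNT, DOUBLE_COUNT};

    // Assumption: a parameter accepts an identical type, or an int where a double is declared.
    static boolean accepts(Class<?> param, Class<?> arg) {
        return param == arg || (param == double.class && arg == int.class);
    }

    // True when every argument can be passed to the parameter in the same position.
    static boolean compatible(Class<?>[] params, Class<?>[] args) {
        if (params.length != args.length) {
            return false;
        }
        for (int i = 0; i < params.length; i++) {
            if (!accepts(params[i], args[i])) {
                return false;
            }
        }
        return true;
    }

    // Collects every overload the arguments could be passed to, without ranking them.
    static List<Class<?>[]> compatibleOverloads(Class<?>[] args) {
        List<Class<?>[]> matches = new ArrayList<>();
        for (Class<?>[] params : OVERLOADS) {
            if (compatible(params, args)) {
                matches.add(params);
            }
        }
        return matches;
    }

    // One possible tie-break: prefer an exact type match, otherwise accept a
    // single compatible overload, otherwise give up (null).
    static Class<?>[] resolveExactFirst(Class<?>[] args) {
        List<Class<?>[]> matches = compatibleOverloads(args);
        for (Class<?>[] params : matches) {
            if (Arrays.equals(params, args)) {
                return params;
            }
        }
        return matches.size() == 1 ? matches.get(0) : null;
    }

    public static void main(String[] argv) {
        Class<?>[] args = {int.class, int.class, int.class, int.class};
        System.out.println("compatible overloads: " + compatibleOverloads(args).size());
        System.out.println("exact match chosen: " + (resolveExactFirst(args) == INT_COUNT));
    }
}
```

Under the stated assumption, both overloads are compatible with four int
arguments, so a resolver that does not rank candidates has no basis for
choosing one. A custom resolver could break the tie by preferring the exact
match, as resolveExactFirst does here.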

Re: Resolvers for UDAFs

Posted by Zheng Shao <zs...@gmail.com>.
Yes, it should be:

SELECT customer_id, topx(2, product_id, product_count)
FROM products_bought
GROUP BY customer_id;



On Wed, Feb 3, 2010 at 11:31 PM, Sonal Goyal <so...@gmail.com> wrote:
> Hi Zheng,
>
> Wouldn't the query you mentioned need a group by clause? I need the top x
> customers per product id. Sorry, can you please explain.
>
> Thanks and Regards,
> Sonal
>
>
> On Thu, Feb 4, 2010 at 12:07 PM, Sonal Goyal <so...@gmail.com> wrote:
>>
>> Hi Zheng,
>>
>> Thanks for your email and your feedback. I will try to change the code as
>> suggested by you.
>>
>> Here is the output of describe:
>>
>> hive> describe products_bought;
>> OK
>> product_id    int
>> customer_id    int
>> product_count    int
>>
>>
>> My function was working fine earlier with this table and iterate(int, int,
>> int, int). Once I introduced the other iterate, it stopped working.
>>
>>
>> Thanks and Regards,
>> Sonal
>>
>>
>> On Thu, Feb 4, 2010 at 11:37 AM, Zheng Shao <zs...@gmail.com> wrote:
>>>
>>> Hi Sonal,
>>>
>>> 1. We usually move the group_by column out of the UDAF - just like we
>>> do "SELECT key, sum(value) FROM table".
>>>
>>> I think you should write:
>>>
>>> SELECT customer_id, topx(2, product_id, product_count)
>>> FROM products_bought
>>>
>>> and in topx:
>>> public boolean iterate(int max, int attribute, int count).
>>>
>>>
>>> 2. Can you run "describe products_bought"? Does product_count column
>>> have type "int"?
>>>
>>> You might want to try removing the other iterate function to see
>>> whether that solves the problem.
>>>
>>>
>>> Zheng
>>>
>>>
>>> On Wed, Feb 3, 2010 at 9:58 PM, Sonal Goyal <so...@gmail.com>
>>> wrote:
>>> > Hi Zheng,
>>> >
>>> > My query is:
>>> >
>>> > select a.myTable.key, a.myTable.attribute, a.myTable.count from (select
>>> > explode (t.pc) as myTable from (select topx(2, product_id, customer_id,
>>> > product_count) as pc from (select product_id, customer_id,
>>> > product_count
>>> > from products_bought order by product_id, product_count desc) r ) t )a;
>>> >
>>> > My overloaded iterators are:
>>> >
>>> > public boolean iterate(int max, int groupBy, int attribute, int count)
>>> >
>>> > public boolean iterate(int max, int groupBy, int attribute, double
>>> > count)
>>> >
>>> > Before overloading, my query was running fine. My table products_bought
>>> > is:
>>> > product_id int, customer_id int, product_count int
>>> >
>>> > And I get:
>>> > FAILED: Error in semantic analysis: Ambiguous method for class
>>> > org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >
>>> > The hive logs say:
>>> > 2010-02-03 11:18:15,721 ERROR processors.DeleteResourceProcessor (SessionState.java:printError(255)) - Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
>>> > 2010-02-03 11:22:14,663 ERROR ql.Driver (SessionState.java:printError(255)) - FAILED: Error in semantic analysis: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> > org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >         at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>>> >         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
>>> >         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
>>> >         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
>>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>>> >         at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
>>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> >
>>> >
>>> >
>>> > Thanks and Regards,
>>> > Sonal
>>> >
>>> >
>>> > On Thu, Feb 4, 2010 at 12:12 AM, Zheng Shao <zs...@gmail.com> wrote:
>>> >>
>>> >> Can you post the Hive query? What are the types of the parameters that
>>> >> you passed to the function?
>>> >>
>>> >> Zheng
>>> >>
>>> >> --
>>> >> Yours,
>>> >> Zheng
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>
>
>



-- 
Yours,
Zheng
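
For readers following the suggestion to keep the group-by column out of the
UDAF, here is a rough sketch of what a three-argument topx aggregation state
could look like. It is plain Java and deliberately does not extend or
implement any Hive class, so it runs on its own. TopXSketch, Entry, and the
tie-breaking behaviour are hypothetical; only the iterate(int max, int
attribute, int count) signature comes from the thread. The merge and
terminate names are borrowed from the usual UDAF evaluator lifecycle as an
assumption about how such a class would eventually be wired into Hive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopXSketch {
    // One (attribute, count) pair held in the aggregation state.
    public static final class Entry {
        public final int attribute;
        public final int count;

        public Entry(int attribute, int count) {
            this.attribute = attribute;
            this.count = count;
        }
    }

    private int max;
    private final List<Entry> top = new ArrayList<>();

    // Called once per row of the current group; the group key itself is not passed in.
    public boolean iterate(int max, int attribute, int count) {
        this.max = max;
        top.add(new Entry(attribute, count));
        trim();
        return true;
    }

    // Folds in the partial state built by another instance for the same group.
    public boolean merge(TopXSketch other) {
        if (other == null) {
            return true;
        }
        max = Math.max(max, other.max);
        top.addAll(other.top);
        trim();
        return true;
    }

    // Returns a copy of the final result, highest count first.
    public List<Entry> terminate() {
        return new ArrayList<>(top);
    }

    // Sorts by count descending (stable, so ties keep arrival order) and keeps at most max entries.
    private void trim() {
        top.sort(Comparator.comparingInt((Entry e) -> e.count).reversed());
        while (top.size() > max) {
            top.remove(top.size() - 1);
        }
    }
}
```

Because the grouping is done by the GROUP BY clause, each instance only ever
sees rows for one customer_id, which is why iterate no longer needs a groupBy
parameter.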

Re: Resolvers for UDAFs

Posted by Sonal Goyal <so...@gmail.com>.
Hi Zheng,

Wouldn't the query you mentioned need a group by clause? I need the top x
customers per product id. Sorry, can you please explain.

Thanks and Regards,
Sonal



Re: Resolvers for UDAFs

Posted by Sonal Goyal <so...@gmail.com>.
Hi Zheng,

Thanks for your email and your feedback. I will try to change the code as
suggested by you.

Here is the output of describe:

hive> describe products_bought;
OK
product_id    int
customer_id    int
product_count    int


My function was working fine earlier with this table and iterate(int, int,
int, int). Once I introduced the other iterate, it stopped working.


Thanks and Regards,
Sonal



Re: Resolvers for UDAFs

Posted by Zheng Shao <zs...@gmail.com>.
Hi Sonal,

1. We usually keep the group-by column out of the UDAF and put it in a
GROUP BY clause, just as we do in
"SELECT key, sum(value) FROM table GROUP BY key".

I think you should write:

SELECT customer_id, topx(2, product_id, product_count)
FROM products_bought
GROUP BY customer_id

and in topx:
public boolean iterate(int max, int attribute, int count).
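
A minimal sketch of what an evaluator with that three-argument signature might accumulate, assuming one evaluator state exists per GROUP BY key. TopXSketch and Entry are hypothetical names used only for illustration; the class is plain Java and does not extend or implement any Hive UDAF class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the evaluator; it does not use Hive's UDAF classes.
final class TopXSketch {
    // One (attribute, count) pair seen by iterate().
    static final class Entry {
        final int attribute;
        final int count;

        Entry(int attribute, int count) {
            this.attribute = attribute;
            this.count = count;
        }
    }

    private final List<Entry> top = new ArrayList<>();

    // Mirrors the suggested signature: there is no groupBy argument because
    // each state object is assumed to belong to exactly one GROUP BY key.
    public boolean iterate(int max, int attribute, int count) {
        top.add(new Entry(attribute, count));
        // Keep the list ordered by count, largest first, then trim to max entries.
        top.sort(Comparator.comparingInt((Entry e) -> e.count).reversed());
        while (top.size() > Math.max(max, 0)) {
            top.remove(top.size() - 1);
        }
        return true;
    }

    // Returns a copy of the retained entries, largest count first.
    public List<Entry> terminate() {
        return new ArrayList<>(top);
    }

    public static void main(String[] args) {
        TopXSketch state = new TopXSketch(); // state for a single customer_id
        state.iterate(2, 101, 5);
        state.iterate(2, 102, 9);
        state.iterate(2, 103, 7);
        for (Entry e : state.terminate()) {
            System.out.println(e.attribute + " -> " + e.count);
        }
    }
}
```

Sorting on every call keeps the sketch short; a bounded heap would be the usual choice for large groups.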


2. Can you run "describe products_bought"? Does the product_count column
have type "int"?

You might want to try removing the other iterate function to see
whether that solves the problem.


Zheng





-- 
Yours,
Zheng

Re: Resolvers for UDAFs

Posted by Sonal Goyal <so...@gmail.com>.
Hi Zheng,

My query is:

select a.myTable.key, a.myTable.attribute, a.myTable.count
from (
  select explode(t.pc) as myTable
  from (
    select topx(2, product_id, customer_id, product_count) as pc
    from (
      select product_id, customer_id, product_count
      from products_bought
      order by product_id, product_count desc
    ) r
  ) t
) a;

My overloaded iterate methods are:

public boolean iterate(int max, int groupBy, int attribute, int count)

public boolean iterate(int max, int groupBy, int attribute, double count)

Before overloading, my query was running fine. My table products_bought is:
product_id int, customer_id int, product_count int

And I get:
FAILED: Error in semantic analysis: Ambiguous method for class
org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
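
One possible reading of this error, offered as an assumption rather than a description of Hive's resolver: if method lookup accepts an implicit int-to-double conversion and does not rank an exact match above a converted one, then both iterate overloads fit the argument list [int, int, int, int]. The sketch below illustrates that rule with plain Java reflection; AmbiguitySketch, Evaluator, fits and candidates are hypothetical names, and nothing here calls Hive code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class AmbiguitySketch {
    // Stand-in with the two overloads from the thread (bodies omitted).
    static final class Evaluator {
        public boolean iterate(int max, int groupBy, int attribute, int count) {
            return true;
        }

        public boolean iterate(int max, int groupBy, int attribute, double count) {
            return true;
        }
    }

    // Assumed conversion rule: an argument fits a parameter if the types are
    // identical, or if an int is passed where a double is expected.
    static boolean fits(Class<?> passed, Class<?> accepted) {
        return passed == accepted
                || (passed == int.class && accepted == double.class);
    }

    // Collects every method called `name` whose parameters all accept the passed types.
    static List<Method> candidates(Class<?> cls, String name, Class<?>... passed) {
        List<Method> out = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (!m.getName().equals(name) || params.length != passed.length) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < params.length; i++) {
                ok &= fits(passed[i], params[i]);
            }
            if (ok) {
                out.add(m);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int allInts = candidates(Evaluator.class, "iterate",
                int.class, int.class, int.class, int.class).size();
        int lastDouble = candidates(Evaluator.class, "iterate",
                int.class, int.class, int.class, double.class).size();
        System.out.println("[int, int, int, int]    -> " + allInts + " candidate(s)");
        System.out.println("[int, int, int, double] -> " + lastDouble + " candidate(s)");
    }
}
```

Under this assumed rule the all-int call has two candidates and the call ending in double has one, which would be consistent with the query working before the second overload was added.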

The hive logs say:
2010-02-03 11:18:15,721 ERROR processors.DeleteResourceProcessor (SessionState.java:printError(255)) - Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
2010-02-03 11:22:14,663 ERROR ql.Driver (SessionState.java:printError(255)) - FAILED: Error in semantic analysis: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
        at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
        at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
        at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



Thanks and Regards,
Sonal



Re: Resolvers for UDAFs

Posted by Zheng Shao <zs...@gmail.com>.
Can you post the Hive query? What are the types of the parameters that
you passed to the function?

Zheng




-- 
Yours,
Zheng