Posted to user@hive.apache.org by Sonal Goyal <so...@gmail.com> on 2010/02/01 06:38:37 UTC

Help writing UDAF with custom object

Hi,

I am writing a UDAF which returns the top x results per key. Let's say my
input is:

key  attribute  count
1    1          6
1    2          5
1    3          4
2    1          8
2    2          4
2    3          1

I want the top 2 results per key, which will be:

key  attribute  count
1    1          6
1    2          5
2    1          8
2    2          4
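
For reference, the expected result above can be reproduced outside Hive with
a few lines of plain Java. This is only a sketch of the intended semantics,
not the attached UDAF; the class name TopPerKey and the method topPerKey are
made up for illustration, and each row is assumed to be an int array of
{key, attribute, count}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopPerKey {
    // Each row is {key, attribute, count}, matching the sample table above.
    static List<int[]> topPerKey(List<int[]> rows, int n) {
        // Group rows by key, preserving the order in which keys first appear.
        Map<Integer, List<int[]>> groups = new LinkedHashMap<>();
        for (int[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<int[]> result = new ArrayList<>();
        for (List<int[]> group : groups.values()) {
            // Sort each group by count, largest first, and keep the first n rows.
            group.sort(Comparator.comparingInt((int[] r) -> r[2]).reversed());
            result.addAll(group.subList(0, Math.min(n, group.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[] {1, 1, 6}, new int[] {1, 2, 5}, new int[] {1, 3, 4},
            new int[] {2, 1, 8}, new int[] {2, 2, 4}, new int[] {2, 3, 1});
        for (int[] r : topPerKey(rows, 2)) {
            System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

Grouping in memory like this only works for small inputs; it is meant to pin
down what "top 2 per key" should return, not how Hive would compute it.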

I have written a UDAF for this in the attached file. However, when I run the
code, I get the exception:
FAILED: Unknown exception :
org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector
cannot be cast to
org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableIntObjectInspector


Can anyone please let me know what I could be doing wrong?
Thanks and Regards,
Sonal

Re: Help writing UDAF with custom object

Posted by Sonal Goyal <so...@gmail.com>.
Hi Zheng,

Thanks for looking into this. I was earlier working off the Hive binary
install, but after moving to the latest code from trunk, I no longer have
this issue.

Thanks for your help!

Thanks and Regards,
Sonal


On Thu, Feb 4, 2010 at 3:11 AM, Zheng Shao <zs...@gmail.com> wrote:

> Which version of Hive are you using?
>
> I looked at the code for trunk and cannot find
> PrimitiveObjectInspectorFactory.java:166
>
> Zheng

Re: Help writing UDAF with custom object

Posted by Zheng Shao <zs...@gmail.com>.
Which version of Hive are you using?

I looked at the code for trunk and cannot find
PrimitiveObjectInspectorFactory.java:166

Zheng

On Mon, Feb 1, 2010 at 3:41 AM, Sonal Goyal <so...@gmail.com> wrote:
> Hi Zheng,
>
> Thanks for your response. I had initially used ints, but due to the error I
> got, I changed them to Integers. I have now reverted the code to use ints as
> suggested by you.

-- 
Yours,
Zheng

Re: Help writing UDAF with custom object

Posted by Sonal Goyal <so...@gmail.com>.
Hi Zheng,

Thanks for your response. I had initially used ints, but due to the error I
got, I changed them to Integers. I have now reverted the code to use ints as
suggested by you.

My problem:
I have a table called products_bought which holds the number of products
bought by each customer, ordered by the count bought. I want to get the top x
customers for each product.

Table products_bought:

product_id  customer_id  product_count
1           1            6
1           2            5
1           3            4
2           1            8
2           2            4
2           3            1

I want, say, the top 2 results per product, which will be:

product_id  customer_id  product_count
1           1            6
1           2            5
2           1            8
2           2            4

Solution:
I create a jar with the code I sent and run the following steps in the CLI:

1. add jar jarname
2. create temporary function topx as 'class name';
3. select topx(2, product_id, customer_id, product_count) from products_bought

The logs give me the error:
10/02/01 16:56:28 DEBUG ipc.RPC: Call: mkdirs 23
10/02/01 16:56:28 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
10/02/01 16:56:28 DEBUG parse.SemanticAnalyzer: Created Table Plan for products_bought org.apache.hadoop.hive.ql.exec.TableScanOperator@72d8978c
10/02/01 16:56:28 DEBUG exec.FunctionRegistry: Looking up GenericUDAF: topx
FAILED: Unknown exception : Internal error: Cannot recognize int
10/02/01 16:56:28 ERROR ql.Driver: FAILED: Unknown exception : Internal error: Cannot recognize int
java.lang.RuntimeException: Internal error: Cannot recognize int
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.getPrimitiveObjectInspectorFromClass(PrimitiveObjectInspectorFactory.java:166)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$PrimitiveConversionHelper.<init>(GenericUDFUtils.java:197)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:123)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:1592)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:1912)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2452)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:3733)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4184)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4425)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:281)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I am going through the code mentioned by Zheng to see whether I am doing
something wrong. At this point, my main concern is to get the function to
output something and to verify that the Hive-specific hooks are in place. If
you have any suggestions, please do let me know.

Thanks and Regards,
Sonal


On Mon, Feb 1, 2010 at 1:19 PM, Zheng Shao <zs...@gmail.com> wrote:

> The first problem is:
>
>                private Integer key;
>                private Integer attribute;
>                private Integer count;
>
> Java Integer objects are immutable, which means we have to create
> a new object per row (which in turn makes the code really
> inefficient).
>
> You can change the fields to "private int" to make the code efficient
> (this also works for Hive).
>
>
> Second, can you post your Hive query? It seems your code does not do
> what you want. You might want to take a look at
> http://issues.apache.org/jira/browse/HIVE-894 for the UDAF max_n and
> see how that works for Hive.
>
> Zheng

Re: Help writing UDAF with custom object

Posted by Zheng Shao <zs...@gmail.com>.
The first problem is:

		private Integer key;
		private Integer attribute;
		private Integer count;

Java Integer objects are immutable, which means we have to create
a new object per row (which in turn makes the code really
inefficient).

You can change the fields to "private int" to make the code efficient
(this also works for Hive).
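
To illustrate the point about immutability, here is a small sketch in plain
Java. The names RowHolderSketch, Row, and set are hypothetical and are not
taken from the attached UDAF. It only shows that primitive int fields let one
holder object be overwritten for every row, whereas an Integer variable has to
be pointed at a different object whenever its value changes.

```java
final class RowHolderSketch {
    // Hypothetical holder with primitive fields: one instance can be reused for every row.
    static final class Row {
        int key;
        int attribute;
        int count;

        void set(int key, int attribute, int count) {
            this.key = key;
            this.attribute = attribute;
            this.count = count;
        }
    }

    public static void main(String[] args) {
        int[][] input = { {1, 1, 6}, {1, 2, 5}, {1, 3, 4} };

        Row row = new Row();           // allocated once
        int total = 0;
        for (int[] r : input) {
            row.set(r[0], r[1], r[2]); // overwritten in place, no new object per row
            total += row.count;
        }
        System.out.println("total count = " + total);

        // With a boxed value, "changing" it means referencing another Integer object.
        Integer boxed = 1000;
        Integer before = boxed;
        boxed = boxed + 1;             // unboxes, adds, and boxes the result
        System.out.println(before + " -> " + boxed); // 'before' still holds the old value
    }
}
```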


Second, can you post your Hive query? It seems your code does not do
what you want. You might want to take a look at
http://issues.apache.org/jira/browse/HIVE-894 for the UDAF max_n and
see how that works for Hive.
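
As a rough idea of the shape a top-N aggregate usually takes, here is a sketch
in plain Java with no Hive dependencies. It keeps a bounded buffer that is
updated per row, can absorb a partial buffer from another task, and is turned
into the final result at the end. The method names iterate, terminatePartial,
merge, and terminate echo the methods a Hive UDAF evaluator conventionally
defines, but the class TopNSketch is hypothetical, does not implement any Hive
interface, and is not the max_n code from HIVE-894.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNSketch {
    private final int n;
    // Min-heap: the smallest retained value sits at the head and is evicted first.
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();

    TopNSketch(int n) {
        this.n = n;
    }

    // Called once per input row.
    void iterate(int value) {
        heap.add(value);
        if (heap.size() > n) {
            heap.poll(); // drop the smallest so that at most n values are kept
        }
    }

    // Partial result that could be handed from one task to another.
    List<Integer> terminatePartial() {
        return new ArrayList<>(heap);
    }

    // Fold in a partial result produced elsewhere.
    void merge(List<Integer> partial) {
        for (int value : partial) {
            iterate(value);
        }
    }

    // Final result, largest first.
    List<Integer> terminate() {
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // Counts for key 1 from the sample data, split across two partial aggregations.
        TopNSketch a = new TopNSketch(2);
        TopNSketch b = new TopNSketch(2);
        a.iterate(6);
        a.iterate(4);
        b.iterate(5);
        a.merge(b.terminatePartial());
        System.out.println(a.terminate()); // prints [6, 5]
    }
}
```

The sketch tracks only the count; a real top-x function would carry the whole
row (for example key, attribute, and count) in the buffer and compare on count.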

Zheng


-- 
Yours,
Zheng