Posted to user@hive.apache.org by Ross Levin <ro...@simulmedia.com> on 2013/11/11 20:31:33 UTC
Developing a GenericUDAF
Hello,
I'm writing a generic UDAF that closely resembles SUM(), the main
difference being that it accepts an array parameter and returns an array.

I've already done this successfully for a GenericUDF. I believe I am having
difficulty coding the proper ObjectInspectors for my parameter and return
objects, since I am getting ClassCastException errors for Long ->
LongArray. I am using a hybrid of the GenericUDAFSum.java sample and the
GenericUDAFCollect sample from the Programming Hive book.
My parameter is a fixed-length array of longs and the return value is an
array of longs of the same length. As with the SUM function, I do not need
to keep the individual row values I collect; I can iterate over the array,
sum it into the container, and move on to the next row. With this in mind,
I think I can disregard having an internalMergeOI.
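The per-row accumulation described above (summing each incoming array
element-wise into a fixed-length container, keeping no per-row state) can be
sketched in plain Java with the Hive plumbing left out. The names here are
illustrative, not taken from the actual UDAF:

```java
import java.util.ArrayList;
import java.util.List;

public class ElementwiseSum {
    // Add one row's array into the running container, element by element.
    // This mirrors what iterate()/merge() would do; no per-row values are kept.
    static void accumulate(List<Long> container, long[] row) {
        if (container.isEmpty()) {
            // First row: initialise the container to zeros of the same length.
            for (int i = 0; i < row.length; i++) {
                container.add(0L);
            }
        }
        for (int i = 0; i < row.length; i++) {
            container.set(i, container.get(i) + row[i]);
        }
    }

    public static void main(String[] args) {
        List<Long> container = new ArrayList<>();
        accumulate(container, new long[]{1, 2, 3});
        accumulate(container, new long[]{10, 20, 30});
        System.out.println(container); // [11, 22, 33]
    }
}
```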
Any input is appreciated.
Thanks,
Ross
Here is the exception:
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to [Ljava.lang.Object;
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1137)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:199)
        ... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to [Ljava.lang.Object;
        at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:418)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:438)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:245)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
        ... 13 more
Here is the pertinent code:
@Override
public ObjectInspector init(Mode m, ObjectInspector[] parameters)
        throws HiveException {
    super.init(m, parameters);
    if (m == Mode.PARTIAL1) {
        System.out.println("1 - init() mode: " + m
                + " parameter[0]=" + parameters[0].toString());
        inputOI = (StandardListObjectInspector) parameters[0];
        return ObjectInspectorFactory.getStandardListObjectInspector(inputOI);
    } else {
        System.out.println("2 - init() mode: " + m
                + " parameter[0]=" + parameters[0].toString());
        JavaLongObjectInspector doi =
                PrimitiveObjectInspectorFactory.javaLongObjectInspector;
        // Set up the list object inspector for the output, and return it
        ListObjectInspector loi =
                ObjectInspectorFactory.getStandardListObjectInspector(doi);
        return loi;
        // inputOI = (StandardListObjectInspector)
        //         ObjectInspectorUtils.getStandardObjectInspector(parameters[0]);
        // return (StandardListObjectInspector)
        //         ObjectInspectorFactory.getStandardListObjectInspector(inputOI);
    }
}
static class BitmapAggregationBuffer implements AggregationBuffer {
    ArrayList<Object> container;
}

@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
    BitmapAggregationBuffer myAgg = (BitmapAggregationBuffer) agg;
    return myAgg.container;
}
--
Ross Levin
Principal Software Engineer
*Simulmedia* | *People Ads Want*
(m) 609.760.5027
670 Broadway, 2nd Floor, New York, NY 10012
*Check out our new data and tools for TV Advertising at OpenAccess* <http://www.simulmedia.com/OpenAccess/>
Re: Developing a GenericUDAF
Posted by Ross Levin <ro...@simulmedia.com>.
Thanks Navis, that got me past this exception!
Ross
Re: Developing a GenericUDAF
Posted by Navis류승우 <na...@nexr.com>.
In handling PARTIAL1:

    inputOI = (StandardListObjectInspector) parameters[0];
    return ObjectInspectorFactory.getStandardListObjectInspector(inputOI);

1. inputOI is not guaranteed to be a StandardListObjectInspector.
Use ListObjectInspector instead.

2. ObjectInspectorFactory.getStandardListObjectInspector(inputOI)
declares a list of lists. What you meant is:

    ObjectInspectorFactory.getStandardListObjectInspector(PrimitiveObjectInspectorFactory.javaLongObjectInspector)
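The list-of-lists mistake can be illustrated with a toy model that builds
Hive-style type names as strings. This is not the Hive API, only a sketch of
what each object inspector would declare:

```java
public class TypeNameDemo {
    // Toy stand-in for getStandardListObjectInspector: wraps a type
    // name in array<...>, the way a list inspector wraps its element type.
    static String listOf(String elementTypeName) {
        return "array<" + elementTypeName + ">";
    }

    public static void main(String[] args) {
        // parameters[0] is already a list inspector: array<bigint>
        String inputType = listOf("bigint");
        // Wrapping the whole input inspector nests the type one level too deep.
        System.out.println(listOf(inputType)); // array<array<bigint>>
        // Wrapping the element inspector yields the intended type.
        System.out.println(listOf("bigint"));  // array<bigint>
    }
}
```

The runtime data is a flat list of longs, so declaring array<array<bigint>>
makes the serializer expect a nested list where it finds a LongWritable,
which matches the ClassCastException in the stack trace.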