Posted to user@hive.apache.org by Jan Dolinár <do...@gmail.com> on 2012/06/21 14:02:20 UTC

UDTF fails when used in LATERAL VIEW

Hi,

I've hit problems writing a custom UDTF that should return string
values. I couldn't find documented anywhere what type the values
passed to forward() and the collector should have. The only info I
could dig up on Google was a few blogs with examples and the four
UDTFs that ship with the Hive sources. From those I figured that it
should be OK to simply pass Strings inside the forwarded Object[]
array. Here are the relevant parts of my code:

      private Object[] forwardListObj;

      @Override
      public StructObjectInspector initialize(ObjectInspector[] args)
throws UDFArgumentException {

        // snipped irrelevant code

        forwardListObj = new Object[1];
        forwardListObj[0] = new String();

        ArrayList<String> fieldNames = new ArrayList<String>(1);
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);

        fieldNames.add("section");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
fieldOIs);
      }

In process(), a String is simply forwarded:

      forwardListObj[0] = "";
      forward(forwardListObj);
      // OR
      String s = ...
      forwardListObj[0] = s;
      forward(forwardListObj);


I was testing the function with a simple query

SELECT my_func(arg) AS x FROM logs WHERE (dt=2011120104);

and it worked just as intended. But the moment I moved from testing to
actually using the function in more complex queries, I ran into
trouble. Even a LATERAL VIEW statement causes failures:

SELECT x FROM logs LATERAL VIEW my_func(arg) t AS x WHERE (dt=2011120104);

causes tasks to fail with this exception:

java.lang.ClassCastException: java.lang.String cannot be cast to
org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:45)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:607)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DoubleConverter.convert(PrimitiveObjectInspectorConverter.java:229)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:73)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:56)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
	at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:83)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
	at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
	at cz.seznam.im.functions.ExplodeSection.process(ExplodeSection.java:103)
	...

I should also mention that I use a custom SerDe and InputFormat for
the 'logs' table. While trying to figure this out, I ran the same
queries as listed above on a different table without those
customizations, and there everything worked correctly. So I think the
SerDe and/or InputFormat probably play some role in this as well. What
I don't understand is why the problem shows up only with LATERAL VIEW.
Any ideas, anyone? Also, is it really correct to send a String in
forward()?

Best regards,
Jan

Re: UDTF fails when used in LATERAL VIEW

Posted by Mark Grover <mg...@oanda.com>.
Hi Jan,
Yeah, you are right, initialize() has to use the correct ObjectInspector.

Here is a blog post I am in the process of writing on how to write a UDF:

http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html

And, the code it references is a UDF I wrote:
https://github.com/markgrover/hive-translate/blob/master/GenericUDFTranslate.java

It isn't directly related to your use case, but it shows how object inspectors correspond to the various types. As you will see in the code, because I am returning a Text object from evaluate(), initialize() returns PrimitiveObjectInspectorFactory.writableStringObjectInspector.
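To illustrate the pairing, here is a quick sketch (not my actual GenericUDFTranslate code; the class name and the upper-casing logic are made up, it needs hive-exec and hadoop on the classpath, and a real UDF should extract its argument through the input ObjectInspector rather than via toString()):

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Hypothetical example only.
public class UpperSketchUDF extends GenericUDF {
  private final Text result = new Text(); // one instance, reused per row

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    // evaluate() returns Text, so advertise the *writable* string OI;
    // javaStringObjectInspector would promise java.lang.String instead.
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object arg = arguments[0].get();
    if (arg == null) {
      return null;
    }
    // Simplification: real code should convert via the argument's OI.
    result.set(arg.toString().toUpperCase());
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "upper_sketch(" + children[0] + ")";
  }
}
```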

Glad you found the solution. Sorry that you learned it the hard way though!

Mark

----- Original Message -----
From: "Jan Dolinár" <do...@gmail.com>
To: user@hive.apache.org
Sent: Friday, June 22, 2012 1:59:03 AM
Subject: Re: UDTF fails when used in LATERAL VIEW


Re: UDTF fails when used in LATERAL VIEW

Posted by Jan Dolinár <do...@gmail.com>.
Hi Mark,

Thanks for the suggestion, it is not that naïve :) I tried a lot of
things and combinations, including Text and even LazyString (as I was
getting exceptions about converting String to LazyString at one
point...).

But I guess what I missed was the correct setting of the field object
inspectors in initialize(). Only today did I find out that the correct
way to do this is to use WritableStringObjectInspector:

    fieldNames.add("section");
    fieldOIs.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
fieldOIs);

I couldn't find it before, since I was looking for a
TextObjectInspector, which obviously doesn't exist - silly me :)
Anyway, it no longer fails this way, but things get even weirder.
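Put together, this is the skeleton of what now works for me (not my actual ExplodeSection code; the class name and the process() body are placeholders, and it needs hive-exec and hadoop on the classpath):

```java
import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Skeleton only; class name and process() logic are placeholders.
public class SectionSketchUDTF extends GenericUDTF {
  private final Object[] forwardListObj = new Object[1];
  private final Text section = new Text(); // reused for every forwarded row

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args)
      throws UDFArgumentException {
    forwardListObj[0] = section;

    ArrayList<String> fieldNames = new ArrayList<String>(1);
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);
    fieldNames.add("section");
    // Writable OI, because process() forwards org.apache.hadoop.io.Text:
    fieldOIs.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        fieldNames, fieldOIs);
  }

  @Override
  public void process(Object[] args) throws HiveException {
    // Placeholder: a real implementation derives the value from args.
    section.set(String.valueOf(args[0]));
    forward(forwardListObj);
  }

  @Override
  public void close() throws HiveException {
    // nothing to clean up
  }
}
```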

The simple queries over the table without my SerDe and InputFormat, as
well as the SELECT my_func() ... query, work well, but the LATERAL
VIEW query now returns 0 rows.

At the end of a task log there is the following:

2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: 6 finished. closing...
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: 6 forwarded 1294158 rows
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished.
closing...
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1294158
rows
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 1229240
rows
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:64918
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 1229240
rows
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1229240
rows
2012-06-22 07:42:23,974 INFO
org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 finished. closing...
2012-06-22 07:42:23,975 INFO
org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 forwarded 2654579 rows
2012-06-22 07:42:23,975 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 finished.
closing...
2012-06-22 07:42:23,975 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 forwarded 0 rows
2012-06-22 07:42:24,067 INFO
org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 Close done
2012-06-22 07:42:24,067 INFO
org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2012-06-22 07:42:24,067 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2012-06-22 07:42:24,067 INFO
org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2012-06-22 07:42:24,067 INFO
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2012-06-22 07:42:24,067 INFO
org.apache.hadoop.hive.ql.exec.MapOperator: 6 Close done

So it looks like the UDTF returns something but it disappears in the
FileSinkOperator. Or is this because the query was executed from the
Hive CLI, so it is not written to a file but streamed directly?

Also, I would like to ask what the correct way is to set the Text
value before forwarding. I've tried the following three ways:


        PrimitiveObjectInspectorFactory.writableStringObjectInspector.getPrimitiveWritableObject(forwardListObj[0]).set(output);

        ((Text)forwardListObj[0]).set(output);

        forwardListObj[0] = new Text(output);

All of them seem to work exactly the same. I know that the third one
could cause performance problems, but I'm not sure which of the first
two is preferred.
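For the record, the first two forms both end up mutating the same Text instance sitting in the array (getPrimitiveWritableObject() just casts the object to Text), while the third allocates a new Text per row. A tiny standalone illustration of the reuse pattern (hypothetical class, println standing in for forward(); needs hadoop-common on the classpath):

```java
import org.apache.hadoop.io.Text;

// Hypothetical demo of the reuse pattern; println stands in for forward().
public class TextReuseDemo {
  public static void main(String[] args) {
    Object[] row = new Object[1];
    row[0] = new Text();           // done once, as in initialize()

    for (String s : new String[] {"a", "bb", "ccc"}) {
      ((Text) row[0]).set(s);      // per row: mutate in place, no allocation
      System.out.println(row[0]);  // would be forward(row) in the UDTF
    }
  }
}
```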

Thanks again for your assistance,

Jan

On 6/22/12, Mark Grover <mg...@oanda.com> wrote:
> Hi Jan,
> Here's my first naïve question:-)
>
> Have you tried returning a Text value instead of a String? At least in the
> case of UDFs, returning Text instead of String is possible and recommended.
> I would think it would be the same with UDTFs.
>
> Mark
>

Re: UDTF fails when used in LATERAL VIEW

Posted by Mark Grover <mg...@oanda.com>.
Hi Jan,
Here's my first naïve question :-)

Have you tried returning a Text value instead of a String? At least in the case of UDFs, returning Text instead of String is possible and recommended. I would think it would be the same with UDTFs.

Mark
