Posted to user@hive.apache.org by Raj Hadoop <ha...@yahoo.com> on 2014/02/04 20:32:11 UTC
Re: GenericUDF Testing in Hive
How can I test a Hive GenericUDF that accepts two parameters, List<T> and T?
Can the List<T> argument be the output of collect_set? Please advise.
I have a GenericUDF that takes List<T>, T, and I want to test how it works through Hive.
On Monday, January 20, 2014 5:19 PM, Raj Hadoop <ha...@yahoo.com> wrote:
The following is an example of a GenericUDF. I want to test it through a Hive query, passing parameters with something like "select ComplexUDFExample('a','b','c') from employees limit 10".
------------------------------------------------------------------------------------------------------------------------------------------------
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
class ComplexUDFExample extends GenericUDF {

  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if(!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {

    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for(String s: list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }

}
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List<T>, T
------------------------------------------------------------------------------------------------------------------------------------------
How can I test this example in a Hive query? I know I am invoking it wrong, but how do I invoke it correctly?
My requirement is to pass an array of strings as the first argument and another string as the second argument in Hive, like below.
Select col1, ComplexUDFExample(collect_set(col2), 'xyz')
from
Employees
Group By col1;
How do I do that?
Thanks in advance.
Regards,
Raj
Re: GenericUDF Testing in Hive
Posted by Jason Dere <jd...@hortonworks.com>.
I tried your example with Hive trunk. It didn't quite work out of the box: you'll need to replace List<String> with List<Text>.
Otherwise, this seemed to work:
hive> select ComplexUDFExample(array('a', 'b', 'c'), 'a') from src limit 3;
….
OK
true
true
true
Time taken: 6.271 seconds, Fetched: 3 row(s)
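The List<String> to List<Text> point above can be illustrated with a minimal sketch. It assumes the array elements reach evaluate() as non-String objects whose toString() yields the string value, as org.apache.hadoop.io.Text does. The class ArrayContainsSketch, the helper contains, and the stand-in FakeText are hypothetical names used only so the sketch runs without Hive or Hadoop on the classpath; this is not the code that was run above.

```java
import java.util.Arrays;
import java.util.List;

public class ArrayContainsSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.Text: a non-String
    // object that wraps a string and exposes it through toString().
    static final class FakeText {
        private final String value;

        FakeText(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Mirrors the loop in evaluate(), but never casts elements to String.
    // Returns null if either input is null, as the original UDF does.
    static Boolean contains(List<?> list, String arg) {
        if (list == null || arg == null) {
            return null;
        }
        for (Object element : list) {
            // Skip null elements; compare everything else by its string form.
            if (element != null && arg.equals(element.toString())) {
                return Boolean.TRUE;
            }
        }
        return Boolean.FALSE;
    }

    public static void main(String[] args) {
        List<FakeText> texts =
            Arrays.asList(new FakeText("a"), new FakeText("b"), new FakeText("c"));
        System.out.println(contains(texts, "a")); // true
        System.out.println(contains(texts, "z")); // false
    }
}
```

Inside the real UDF, one way to get the same effect would be to convert each element with the list's element inspector, for example by casting listOI.getListElementObjectInspector() to StringObjectInspector and calling getPrimitiveJavaObject on each element, rather than relying on toString().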
On Feb 4, 2014, at 11:50 AM, Raj Hadoop <ha...@yahoo.com> wrote:
>
> I want to do a simple test like this - but not working -
>
> select ComplexUDFExample(List("a", "b", "c"), "b") from table1 limit 10;
>
> FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'
Re: GenericUDF Testing in Hive
Posted by Raj Hadoop <ha...@yahoo.com>.
I want to do a simple test like this, but it is not working:
select ComplexUDFExample(List("a", "b", "c"), "b") from table1 limit 10;
FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'