Posted to user@hive.apache.org by Raj Hadoop <ha...@yahoo.com> on 2014/01/20 23:19:22 UTC

GenericUDF Testing in Hive

 
The following is an example of a GenericUDF. I want to test it through a Hive query, passing parameters with something like "select ComplexUDFExample('a','b','c') from employees limit 10".

------------------------------------------------------------------------------------------------------------------------------------------------
 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
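import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

class ComplexUDFExample extends GenericUDF {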
  ListObjectInspector listOI;
  StringObjectInspector elementOI;
  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;
    
    // 2. Check that the list contains strings
    if(!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }
    
    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }
  
  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    
    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());
    
    // check for nulls
    if (list == null || arg == null) {
      return null;
    }
    
    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return Boolean.TRUE;
    }
    return Boolean.FALSE;
  }
  
}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
------------------------------------------------------------------------------------------------------------------------------------------
 
How can I test this example in a Hive query? I know I am invoking it incorrectly, but how do I invoke it correctly?
 
My requirement is to pass an array of strings as the first argument and another string as the second argument, as in the Hive query below.
 
 
Select col1, ComplexUDFExample( collect_set(col2) , 'xyz')
from 
Employees
Group By col1;
 
How do I do that?
 
Thanks in advance.
 
Regards,
Raj

Re: GenericUDF Testing in Hive

Posted by Jason Dere <jd...@hortonworks.com>.
I tried your example with Hive trunk. It didn't quite work out of the box; you'll need to replace List<String> with List<Text>.
With that change, this worked:

hive> select ComplexUDFExample(array('a', 'b', 'c'), 'a') from src limit 3;
….
OK
true
true
true
Time taken: 6.271 seconds, Fetched: 3 row(s)
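
A minimal sketch of why the List<String> cast probably fails, assuming Hive hands evaluate() list elements that are Text objects rather than java.lang.String (which is what the List<Text> suggestion above implies). FakeText is a hypothetical stand-in for org.apache.hadoop.io.Text so the sketch runs without Hadoop jars, and ListCastDemo and its method names are made up for illustration. Because of type erasure the cast itself succeeds; the failure only shows up when an element is read as a String.

```java
import java.util.ArrayList;
import java.util.List;

public class ListCastDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Text (assumption, not the real class).
    static final class FakeText {
        private final String value;

        FakeText(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Mirrors the cast in evaluate(): the unchecked cast is accepted,
    // but reading a non-String element as String throws ClassCastException.
    @SuppressWarnings("unchecked")
    static boolean containsAssumingStrings(List<?> raw, String arg) {
        List<String> list = (List<String>) raw;
        for (String s : list) {
            if (arg.equals(s)) {
                return true;
            }
        }
        return false;
    }

    // Element-type-agnostic variant: compare on the string form of each element.
    static boolean containsByText(List<?> raw, String arg) {
        for (Object o : raw) {
            if (o != null && arg.equals(o.toString())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Object> elements = new ArrayList<>();
        elements.add(new FakeText("a"));
        elements.add(new FakeText("b"));
        elements.add(new FakeText("c"));

        try {
            containsAssumingStrings(elements, "a");
            System.out.println("no exception");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException while reading elements as String");
        }

        System.out.println(containsByText(elements, "a")); // true
        System.out.println(containsByText(elements, "z")); // false
    }
}
```

In the real UDF, one alternative to hard-coding List<Text> would be to keep the list's element StringObjectInspector from initialize() and call its getPrimitiveJavaObject() on each element, so the comparison does not depend on how Hive represents the strings.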


On Feb 4, 2014, at 11:50 AM, Raj Hadoop <ha...@yahoo.com> wrote:

> 
> I want to do a simple test like this, but it is not working:
> 
> select ComplexUDFExample(List("a", "b", "c"), "b") from table1 limit 10;
> 
> FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'



Re: GenericUDF Testing in Hive

Posted by Raj Hadoop <ha...@yahoo.com>.
I want to do a simple test like this, but it is not working:

select ComplexUDFExample(List("a", "b", "c"), "b") from table1 limit 10;


FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'






On Tuesday, February 4, 2014 2:34 PM, Raj Hadoop <ha...@yahoo.com> wrote:
 
How do I test a Hive GenericUDF that accepts two parameters, List<T> and T?

Can the List<T> argument be the output of collect_set? Please advise.

I have a GenericUDF that takes List<T> and T, and I want to test how it works through Hive.






Re: GenericUDF Testing in Hive

Posted by Raj Hadoop <ha...@yahoo.com>.
How do I test a Hive GenericUDF that accepts two parameters, List<T> and T?

Can the List<T> argument be the output of collect_set? Please advise.

I have a GenericUDF that takes List<T> and T, and I want to test how it works through Hive.




