Posted to user@hive.apache.org by Saurabh Nanda <sa...@gmail.com> on 2009/07/14 10:09:01 UTC

Creating a UDF (was Hive SerDe?)

I'm trying to register a UDF to parse my log file format. Where can I find
documentation for creating and registering a UDF?

My attempts failed with this error:

hive> create temporary function process_line as 'LogProcessor';
FAILED: Unknown exception : Registering UDF Class class LogProcessor which
does not extends class org.apache.hadoop.hive.ql.exec.UDF

Specific questions:

1. Do I need to define a particular function in the class? For example,
run()
2. What arguments should that function accept?
3. What should be the return type of that function?
4. What if the function needs to return multiple values, with each value
mapping to a column in the table?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
Any chance of making a binary Hive release with the latest features?

Saurabh.

On Sat, Jul 18, 2009 at 12:53 AM, Ashish Thusoo <at...@facebook.com>wrote:

> You should try it. Eva did mention that something was wrong with group by
> and joins in the trunk, but if that turns out to be a problem we should be
> able to figure it out soon. We have already deployed the trunk to our
> ad hoc users within FB, so it should be stable enough.
>
> Ashish
>
>  ------------------------------
> *From:* Saurabh Nanda [mailto:saurabhnanda@gmail.com]
> *Sent:* Thursday, July 16, 2009 11:33 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Re: Creating a UDF (was Hive SerDe?)
>
>
>
>> The release is quite old; we implemented "add jar" after this release.
>
>
>
> Should I just compile Hive directly from
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/ ? Is it stable enough?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

RE: Creating a UDF (was Hive SerDe?)

Posted by Ashish Thusoo <at...@facebook.com>.
You should try it. Eva did mention that something was wrong with group by and joins in the trunk, but if that turns out to be a problem we should be able to figure it out soon. We have already deployed the trunk to our ad hoc users within FB, so it should be stable enough.

Ashish

________________________________
From: Saurabh Nanda [mailto:saurabhnanda@gmail.com]
Sent: Thursday, July 16, 2009 11:33 PM
To: hive-user@hadoop.apache.org
Subject: Re: Creating a UDF (was Hive SerDe?)



> The release is quite old; we implemented "add jar" after this release.


Should I just compile Hive directly from http://svn.apache.org/repos/asf/hadoop/hive/trunk/ ? Is it stable enough?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
> The release is quite old; we implemented "add jar" after this release.



Should I just compile Hive directly from
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ ? Is it stable enough?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Min Zhou <co...@gmail.com>.
The release is quite old; we implemented "add jar" after this release.

On Thu, Jul 16, 2009 at 7:44 PM, Saurabh Nanda <sa...@gmail.com>wrote:

>> Did you run ‘add jar path_to_the-jar-including-your-udfclass’ first
>> before you issue ‘CREATE TEMPORARY FUNCTION’?
>
>
> hive> add jar myjar.jar;
> Usage: add [FILE] <value> [<value>]*
> hive> add file myjar.jar;
> hive>
>
> Apparently, ADD JAR doesn't work for me. I am however using ADD FILE before
> CREATE TEMPORARY FUNCTION.
>
> I'm on the hive-0.3.0-hadoop-0.18.0-bin release.
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
> Did you run ‘add jar path_to_the-jar-including-your-udfclass’ first
> before you issue ‘CREATE TEMPORARY FUNCTION’?
>


hive> add jar myjar.jar;
Usage: add [FILE] <value> [<value>]*
hive> add file myjar.jar;
hive>

Apparently, ADD JAR doesn't work for me. I am however using ADD FILE before
CREATE TEMPORARY FUNCTION.

I'm on the hive-0.3.0-hadoop-0.18.0-bin release.

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by He Yongqiang <he...@software.ict.ac.cn>.
Did you run ‘add jar path_to_the-jar-including-your-udfclass’ before
issuing ‘CREATE TEMPORARY FUNCTION’?
The actual mappers and reducers run in the Hadoop cluster, and Hive’s “add
jar” command distributes your jar file to the worker nodes using Hadoop’s
distributed cache, so the mappers and reducers running on those machines can
find the class.
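
To illustrate the failure mode described above: each map task runs in its
own JVM on a worker node, and it can only instantiate the UDF if the class
is visible to that JVM. Below is a minimal sketch in plain Java of a lookup
by class name of the kind that ends in ClassNotFoundException when the jar
was never shipped. ClassProbe and the class name com.example.MissingUdf are
hypothetical names used only for illustration; they are not part of Hive or
Hadoop.

```java
// Hypothetical helper: reports whether a class can be loaded by name
// in the current JVM. Not part of Hive or Hadoop.
class ClassProbe {
    static boolean isLoadable(String className) {
        try {
            // Looks the class up through the caller's class loader.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // The JVM has no jar or directory on its classpath that
            // contains this class, which is the situation a map task is
            // in when the UDF jar was not distributed to its node.
            return false;
        }
    }
}
```

Calling ClassProbe.isLoadable("java.lang.String") returns true in any JVM,
while ClassProbe.isLoadable("com.example.MissingUdf") returns false unless a
jar containing that class has been put on the classpath.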


On 09-7-16 7:10 PM, "Saurabh Nanda" <sa...@gmail.com> wrote:

> I've added the JAR (containing my UDF class) in the session. I've issued the
> CREATE TEMPORARY FUNCTION command. However, all my map tasks fail with  a
> ClassNotFoundException when I try to run a query with the UDF:
> 
> select ct_ip_address(line) from raw limit 10;
> 
> (ct_ip_address is the UDF I have registered against my class)
> 
> What am I doing wrong? Does a class extending UDF need to be in a particular
> package? 
> 
> Saurabh.


Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
I've added the JAR (containing my UDF class) in the session. I've issued the
CREATE TEMPORARY FUNCTION command. However, all my map tasks fail with a
ClassNotFoundException when I try to run a query with the UDF:

select ct_ip_address(line) from raw limit 10;

(ct_ip_address is the UDF I have registered against my class)

What am I doing wrong? Does a class extending UDF need to be in a particular
package?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Zheng Shao <zs...@gmail.com>.
Hi Saurabh,

Hive supports both UDF and GenericUDF.

UDFs are much easier to write, but they are currently limited to working
with primitive types (including String).

GenericUDF supports advanced features, including complex-type
parameters and return values, short-circuit computation, and complete
object reuse (no need to create a single new object for each call).
Some of these features are not yet provided by other systems, which is
why GenericUDF looks more complicated.

I guess you just need a normal UDF for now. Please take a look at the
UDF*.java files. They are very easy to understand and write.
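
For readers without the source tree at hand, here is a rough sketch of what
such a simple UDF might look like for the log-parsing case in this thread.
It assumes the usual convention that the class extends
org.apache.hadoop.hive.ql.exec.UDF and exposes a method named evaluate. The
class name ExtractIpAddress and the regular expression are hypothetical, and
the empty UDF class below is only a stand-in so the snippet compiles without
Hive on the classpath.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in so this sketch compiles on its own. In a real build, delete it
// and import org.apache.hadoop.hive.ql.exec.UDF instead.
abstract class UDF {}

// Hypothetical UDF: returns the IPv4 address at the start of a log line,
// or null if the line does not begin with one. In a real jar this class
// would be public and live in its own file.
class ExtractIpAddress extends UDF {
    // Four dot-separated groups of 1-3 digits anchored at the line start.
    private static final Pattern LEADING_IP =
        Pattern.compile("^(\\d{1,3}(?:\\.\\d{1,3}){3})\\b");

    // Takes one string column value and returns one string value.
    public String evaluate(String line) {
        if (line == null) {
            return null; // pass NULL column values through unchanged
        }
        Matcher m = LEADING_IP.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

Once packaged into a jar, a class like this would be registered with the
add jar and create temporary function commands discussed elsewhere in this
thread and then called like any built-in function.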


Zheng

On Wed, Jul 15, 2009 at 5:20 AM, Saurabh Nanda<sa...@gmail.com> wrote:
>
>>> 0.3 is quite old. You should look at trunk
>>> http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
>>> branch soon.
>
> http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTestTranslate.java
> looks so intimidating! Also, this is not exactly a UDF that returns multiple
> values, is it?
>
> Have you compared this with the approach Cloudbase is taking to UDFs? It's a
> breeze. Why is Hive putting so much complexity into this?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
> 0.3 is quite old. You should look at trunk
> http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
> branch soon.

http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTestTranslate.java
looks so intimidating! Also, this is not exactly a UDF that returns multiple
values, is it?

Have you compared this with the approach Cloudbase is taking to UDFs? It's a
breeze. Why is Hive putting so much complexity into this?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
> 0.3 is quite old. You should look at trunk
> http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
> branch soon.


Is the file not available in 0.3, or is the feature itself not available?
Does this mean I need to compile Hive from source to create a UDF?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Raghu Murthy <rm...@facebook.com>.
0.3 is quite old. You should look at trunk
http://svn.apache.org/repos/asf/hadoop/hive. We are going to create a 0.4
branch soon.

On 7/15/09 4:55 AM, "Saurabh Nanda" <sa...@gmail.com> wrote:

>> You can look at the following test case in the source tree to guide you on
>> how to build a udf. Will put this on the wiki.
>>
>> create_genericudf.q
>> udf_testlength.q
>
> Hi Ashish,
>
> I found the udf_testlength.q script and the class that it refers to. However, I
> couldn't find the create_genericudf.q file. I grepped the entire release, but
> could not find the string 'genericudf' anywhere. This is the release in which
> I'm looking --
> http://apache.mirrors.tds.net/hadoop/hive/hive-0.3.0/hive-0.3.0-hadoop-0.18.0-dev.tar.gz
> 
> In fact, the file ./src/ql/src/java/org/apache/hadoop/hive/ql/exec/UDF.java
> itself does not refer to GenericUDF.
> 
> Saurabh.


Re: Creating a UDF (was Hive SerDe?)

Posted by Saurabh Nanda <sa...@gmail.com>.
>
>
> You can look at the following test case in the source tree to guide you on
> how to build a udf. Will put this on the wiki.
>
> create_genericudf.q
> udf_testlength.q
>

Hi Ashish,

I found the udf_testlength.q script and the class that it refers to. However, I
couldn't find the create_genericudf.q file. I grepped the entire release,
but could not find the string 'genericudf' anywhere. This is the release in
which I'm looking --
http://apache.mirrors.tds.net/hadoop/hive/hive-0.3.0/hive-0.3.0-hadoop-0.18.0-dev.tar.gz

In fact, the file ./src/ql/src/java/org/apache/hadoop/hive/ql/exec/UDF.java
itself does not refer to GenericUDF.

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Creating a UDF (was Hive SerDe?)

Posted by Min Zhou <co...@gmail.com>.
Hi Saurabh,

Ashish is right. Your UDF must extend UDF or GenericUDF. If you build the
UDF class into a separate jar, run the "add jar" command first.

hive> add jar jar_path;
hive> create temporary function udf_name as 'UdfClass';

Hope this helps.


Min

On Wed, Jul 15, 2009 at 2:44 AM, Ashish Thusoo <at...@facebook.com> wrote:

>  Not sure if you got an answer for this.
>
> You can look at the following test case in the source tree to guide you on
> how to build a udf. Will put this on the wiki.
>
> create_genericudf.q
> udf_testlength.q
>
> The udf has to implement either the UDF interface or the GenericUDF
> interface. The latter handles UDFs that take complex objects as
> arguments, have variable-length arguments, or return complex objects. The
> UDF interface is easier to program to, but is more limited than the
> GenericUDF interface.
>
> There are some nuances in the function resolution logic that you need to be
> aware of in case the UDF has polymorphism in the evaluate functions.
> I can go into more details if that is the case for you.
>
> Ashish
>
>
>  ------------------------------
> *From:* Saurabh Nanda [mailto:saurabhnanda@gmail.com]
> *Sent:* Tuesday, July 14, 2009 1:09 AM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Creating a UDF (was Hive SerDe?)
>
> I'm trying to register a UDF to parse my log file format. Where can I find
> documentation for creating and registering a UDF?
>
> My attempts failed with this error:
>
> hive> create temporary function process_line as 'LogProcessor';
> FAILED: Unknown exception : Registering UDF Class class LogProcessor which
> does not extends class org.apache.hadoop.hive.ql.exec.UDF
>
> Specific questions:
>
> 1. Do I need to define a particular function in the class? For example,
> run()
> 2. What arguments should that function accept?
> 3. What should be the return type of that function?
> 4. What if the function needs to return multiple values? Each value mapping
> to a column in the table?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com

RE: Creating a UDF (was Hive SerDe?)

Posted by Ashish Thusoo <at...@facebook.com>.
Not sure if you got an answer for this.

You can look at the following test case in the source tree to guide you on how to build a udf. Will put this on the wiki.

create_genericudf.q
udf_testlength.q

The udf has to implement either the UDF interface or the GenericUDF interface. The latter handles UDFs that take complex objects as arguments, have variable-length arguments, or return complex objects. The UDF interface is easier to program to, but is more limited than the GenericUDF interface.

There are some nuances in the function resolution logic that you need to be aware of in case the UDF has polymorphism in the evaluate functions. I can go into more details if that is the case for you.
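
As a rough illustration of what resolving between overloaded evaluate
functions involves, the sketch below uses plain Java reflection to pick an
overload from the runtime types of the arguments. This is not Hive's actual
resolution logic, which has its own rules and is not reproduced here. The
class names PadLeft and EvaluateResolver are hypothetical, and in Hive a
class like PadLeft would extend UDF.

```java
import java.lang.reflect.Method;

// Hypothetical class with two evaluate overloads.
class PadLeft {
    // One-argument form: pads to a default width of 8.
    public String evaluate(String s) {
        return evaluate(s, Integer.valueOf(8));
    }

    // Two-argument form: left-pads s with spaces up to the given width.
    public String evaluate(String s, Integer width) {
        if (s == null || width == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }
}

// Hypothetical resolver: finds the public evaluate overload whose
// parameter types exactly equal the runtime classes of the arguments.
class EvaluateResolver {
    static Method resolve(Class<?> udfClass, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            return udfClass.getMethod("evaluate", types);
        } catch (NoSuchMethodException e) {
            // No overload takes exactly these types, for example a Double
            // argument where only an Integer parameter is declared.
            throw new IllegalArgumentException(
                "no evaluate overload for the given argument types", e);
        }
    }
}
```

Because this sketch matches types exactly, EvaluateResolver.resolve(PadLeft.class, "ab", 5)
finds the two-argument overload, while passing a Double as the second
argument throws IllegalArgumentException. Deciding when such an argument
should be converted rather than rejected is the sort of nuance the
resolution logic has to settle.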

Ashish


________________________________
From: Saurabh Nanda [mailto:saurabhnanda@gmail.com]
Sent: Tuesday, July 14, 2009 1:09 AM
To: hive-user@hadoop.apache.org
Subject: Creating a UDF (was Hive SerDe?)

I'm trying to register a UDF to parse my log file format. Where can I find documentation for creating and registering a UDF?

My attempts failed with this error:

hive> create temporary function process_line as 'LogProcessor';
FAILED: Unknown exception : Registering UDF Class class LogProcessor which does not extends class org.apache.hadoop.hive.ql.exec.UDF

Specific questions:

1. Do I need to define a particular function in the class? For example, run()
2. What arguments should that function accept?
3. What should be the return type of that function?
4. What if the function needs to return multiple values? Each value mapping to a column in the table?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com