Posted to user@hive.apache.org by Michael Jiang <it...@gmail.com> on 2011/03/28 20:13:22 UTC

UDF of processing a column like pig?

I want to have a function that returns the hour from a date string (in a
format not supported by the current date functions, but this is basically
just an example use case). In Pig, you can create a UDF (user-defined
function) and apply it to a column. Is there an equivalent in Hive? I know I
can write a mapper that takes the whole line and transforms the specific
column I want, but that's not convenient because you have to duplicate the
same code for each source table.

Thanks!

Re: UDF of processing a column like pig?

Posted by Michael Jiang <it...@gmail.com>.
OK, there is UDF support in Hive. So I guess my question is: how do I create
a UDF, which languages are supported, and how do I use a UDF?


Re: UDF of processing a column like pig?

Posted by Sameer Kalburgi <sa...@gmail.com>.
Somewhat unrelated, but is there any info/docs on how much overhead using a
UDF adds to the map jobs?

On Mon, Mar 28, 2011 at 3:48 PM, Edward Capriolo <ed...@gmail.com>wrote:

> Hive is written in Java. Thus UDFs are written in Java. If you want
> to use a language outside of Java you need to use streaming.
>
> http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform
>
> (but why would you want to since Java is the rockstar of all languages?)
>

Re: UDF of processing a column like pig?

Posted by Michael Jiang <it...@gmail.com>.
Streaming is not efficient, as I mentioned in my first email. Sure, most of
the time I'll write UDFs in Java. But sometimes you might want to use
something else, like Python, to do a quick check :)

On Mon, Mar 28, 2011 at 12:48 PM, Edward Capriolo <ed...@gmail.com>wrote:

> Hive is written in Java. Thus UDFs are written in Java. If you want
> to use a language outside of Java you need to use streaming.
>
> http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform
>
> (but why would you want to since Java is the rockstar of all languages?)
>

Re: UDF of processing a column like pig?

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Mar 28, 2011 at 2:45 PM, Christopher, Pat
<pa...@hp.com> wrote:
> I think so.  I haven’t looked for a non-Java API though.
>
> Pat
>
> From: Michael Jiang [mailto:it.mjjiang@gmail.com]
> Sent: Monday, March 28, 2011 11:31 AM
> To: user@hive.apache.org
> Cc: Christopher, Pat
> Subject: Re: UDF of processing a column like pig?
>
> Thanks. So I guess only Java is supported, right?

Hive is written in Java. Thus UDFs are written in Java. If you want
to use a language outside of Java you need to use streaming.

http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform

(but why would you want to since Java is the rockstar of all languages?)
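A minimal sketch of the streaming route, assuming Hive's default TRANSFORM behaviour: each row is piped to the script's stdin as one line with tab-separated columns, NULLs are written as \N, and rows are read back from stdout in the same shape. Because any executable can serve as the script, it can even be a Java program. The class name HourTransform, the position of the timestamp column (TS_COLUMN = 1) and the timestamp format (e.g. 28/Mar/2011:20:13:22 +0000) are all hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical TRANSFORM script: tab-separated rows in on stdin,
// tab-separated rows out on stdout, one row per line.
public class HourTransform {
    // Assumed timestamp format, e.g. "28/Mar/2011:20:13:22 +0000"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Assumed (hypothetical) position of the timestamp column in each row
    static final int TS_COLUMN = 1;

    // Replaces the timestamp column with its hour (0-23) as written in the
    // string; writes \N (assumed NULL marker) if it cannot be parsed.
    static String transformLine(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length > TS_COLUMN) {
            try {
                int hour = OffsetDateTime.parse(cols[TS_COLUMN], FORMAT).getHour();
                cols[TS_COLUMN] = Integer.toString(hour);
            } catch (DateTimeParseException e) {
                cols[TS_COLUMN] = "\\N";
            }
        }
        return String.join("\t", cols);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            // Write "\n" explicitly rather than the platform line separator
            System.out.print(transformLine(line) + "\n");
        }
        System.out.flush();
    }
}
```

The script would be shipped to the task nodes (for example with ADD FILE) and named in a SELECT TRANSFORM(...) USING '...' clause, as described on the Transform page linked above. Note that the column position is hard-coded, so a table with a different layout needs a different TS_COLUMN. That is the per-table duplication Michael wanted to avoid, and a UDF avoids it because it is applied to whatever column you pass it.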

RE: UDF of processing a column like pig?

Posted by "Christopher, Pat" <pa...@hp.com>.
I think so.  I haven't looked for a non-Java API though.

Pat

From: Michael Jiang [mailto:it.mjjiang@gmail.com]
Sent: Monday, March 28, 2011 11:31 AM
To: user@hive.apache.org
Cc: Christopher, Pat
Subject: Re: UDF of processing a column like pig?

Thanks. So I guess only Java is supported, right?

Re: UDF of processing a column like pig?

Posted by Michael Jiang <it...@gmail.com>.
Thanks. So I guess only Java is supported, right?

On Mon, Mar 28, 2011 at 11:27 AM, Christopher, Pat <
patrick.christopher@hp.com> wrote:

> You can create UDFs, UDAFs and UDTFs.
> http://wiki.apache.org/hadoop/Hive/HivePlugins
>
>
>
> Pat

RE: UDF of processing a column like pig?

Posted by "Christopher, Pat" <pa...@hp.com>.
You can create UDFs, UDAFs and UDTFs.  http://wiki.apache.org/hadoop/Hive/HivePlugins

Pat
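A minimal sketch of a simple UDF for the hour-extraction use case from the original question, in the style the HivePlugins page described at the time: a Java class extending org.apache.hadoop.hive.ql.exec.UDF whose evaluate method Hive resolves by reflection. So that the sketch compiles without Hive on the classpath, the extends clause appears only as a comment. The class name HourOfDay and the input format (e.g. 28/Mar/2011:20:13:22 +0000) are assumptions, not anything stated in the thread.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical UDF. In Hive it would be declared as
//   public class HourOfDay extends org.apache.hadoop.hive.ql.exec.UDF
// (requires hive-exec on the classpath); the extends clause is omitted
// here so the sketch compiles on its own.
public class HourOfDay {
    // Assumed input format, e.g. "28/Mar/2011:20:13:22 +0000"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Hive looks up evaluate() by reflection, matching argument types.
    // Returns the hour (0-23) as written in the string, or null for
    // NULL or unparseable input.
    public Integer evaluate(String s) {
        if (s == null) {
            return null;
        }
        try {
            return OffsetDateTime.parse(s.trim(), FORMAT).getHour();
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

Once packaged into a jar, it would be registered per session with ADD JAR and CREATE TEMPORARY FUNCTION (for example CREATE TEMPORARY FUNCTION hour_of AS 'HourOfDay', where hour_of is a hypothetical name). It could then be applied to any column like a built-in, e.g. hour_of(ts), with no per-table mapper code. Returning null for unparseable input is a choice made here so that bad rows show up as NULL instead of failing the query.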
