Posted to user@pig.apache.org by Tamir Kamara <ta...@gmail.com> on 2010/03/10 09:52:07 UTC

Using external jar in UDF

Hi,

I have a function (eval) that needs to use an external jar.
In the M/R world this can be accomplished by uploading the jar to the DFS and
using DistributedCache.addFileToClassPath.
How do I do the same (make the jar available to the UDF) in Pig?

Thanks,
Tamir

Re: Using external jar in UDF

Posted by Tamir Kamara <ta...@gmail.com>.
Hi Jeff,

You are right: I want to use another jar in my own UDF.
Packaging both into a single jar is certainly an option, but I was hoping Pig
would be able to do something similar to regular MapReduce, where I push the
jar to the DFS beforehand and then add it to the classpath via the
distributed cache.

Thanks,
Tamir


On Wed, Mar 10, 2010 at 11:25 AM, Jeff Zhang <zj...@gmail.com> wrote:

> Sorry maybe I misunderstand you.  It seems you'd like to use third-party
> library in your udf, then you need to package your udf and third-party
> library in one jar.

Re: Using external jar in UDF

Posted by Jeff Zhang <zj...@gmail.com>.
Sorry, maybe I misunderstood you. It seems you'd like to use a third-party
library in your UDF; in that case you need to package your UDF and the
third-party library in one jar.
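Packaging "in one jar" means producing a single jar that holds both the UDF classes and the third-party library classes. A build tool normally does this; purely to illustrate what such a merged jar is, here is a minimal sketch using only `java.util.zip` (Java 9+ for `InputStream.transferTo`). The class `MergeJars` is hypothetical, and the "first entry wins" rule for duplicate names such as `META-INF/MANIFEST.MF` is an assumption; the sketch does not handle signed jars or service files that would need real merging.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical tool: merges a UDF jar and its dependency jars into one jar. */
public final class MergeJars {
    private MergeJars() {}

    /**
     * Copies every entry of each input jar into the output jar and returns
     * the entry names written. If several inputs contain the same entry name
     * (e.g. META-INF/MANIFEST.MF), only the first one is kept.
     */
    public static Set<String> merge(Path output, List<Path> inputs) throws IOException {
        Set<String> written = new LinkedHashSet<>();
        try (ZipOutputStream zipOut = new ZipOutputStream(Files.newOutputStream(output))) {
            for (Path input : inputs) {
                try (ZipInputStream zipIn = new ZipInputStream(Files.newInputStream(input))) {
                    ZipEntry entry;
                    while ((entry = zipIn.getNextEntry()) != null) {
                        if (!written.add(entry.getName())) {
                            continue; // duplicate name: the earlier jar wins
                        }
                        zipOut.putNextEntry(new ZipEntry(entry.getName()));
                        zipIn.transferTo(zipOut); // copies the current entry's bytes
                        zipOut.closeEntry();
                    }
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // usage: java MergeJars <out.jar> <udf.jar> <dependency.jar>...
        if (args.length < 2) {
            System.err.println("usage: MergeJars <out.jar> <in.jar>...");
            System.exit(1);
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        Set<String> names = merge(Paths.get(args[0]), inputs);
        System.out.println("wrote " + names.size() + " entries to " + args[0]);
    }
}
```

For example, `java MergeJars myudf-all.jar myfunc.jar json-lib.jar` (hypothetical file names) would produce a single jar that one `REGISTER` statement can then pick up.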


On Wed, Mar 10, 2010 at 5:21 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Using *REGISTER myfunc.jar;*
>
> refer here:
> http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#REGISTER



-- 
Best Regards

Jeff Zhang

Re: Using external jar in UDF

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

In M/R, when you need an extra jar, you add it to the classpath by calling:
DistributedCache.addFileToClassPath(new Path(dfsPathToJar), conf);

I imagine that the REGISTER command does something similar under the covers,
but I was looking for a way to have the UDF load its own dependency jar, so
that it isn't left up to the user to remember to issue the second REGISTER
command (for the dependency jar) on their own.

Thanks,
Tamir


On Wed, Mar 10, 2010 at 12:28 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Sorry, what do you mean M/R way ? Actually you do not have way to touch the
> M/R code in pig.

Re: Using external jar in UDF

Posted by Jeff Zhang <zj...@gmail.com>.
Sorry, what do you mean by the M/R way? Actually, you have no way to touch
the M/R code in Pig.

On Wed, Mar 10, 2010 at 6:21 PM, Tamir Kamara <ta...@gmail.com> wrote:

> Hi,
>
> Register is working fine but it means that the user needs to know when it's
> needed to register the additional jar. What about my question regarding the
> M/R way of doing this ?
>
> Thanks,
> Tamir



-- 
Best Regards

Jeff Zhang

Re: Using external jar in UDF

Posted by zaki rahaman <za...@gmail.com>.
Hey,

How's the progress on the JSON UDF? If you post it on the Pig JIRA, I may
get a chance to take a look and help out. It would also get the ball rolling
on getting the UDF added to Piggybank.

On Mon, Mar 15, 2010 at 4:52 PM, Corbin Hoenes <co...@tynt.com> wrote:

> Okay what do you mean by "package and send along"?  What is the pig way to
> include additional jars?  e.g. we want to use a 3rd party library to encode
> json and how can our UDF reference that jar?


-- 
Zaki Rahaman

Re: Using external jar in UDF

Posted by zaki rahaman <za...@gmail.com>.
Hey Corbin,

Alternatively, you could use whatever build tool you're using (Maven, Ant)
to declare the JSON library as a dependency and configure it to build a jar
with dependencies.
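Whichever build tool produces the jar with dependencies, it can be worth confirming that the library's classes actually ended up inside it before registering it in Pig. A minimal sketch using `java.util.jar.JarFile`; the class name `JarContents` and the example class names are hypothetical:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical check: does a built jar bundle the given classes? */
public final class JarContents {
    private JarContents() {}

    /** Returns the fully qualified class names that have no .class entry in the jar. */
    public static List<String> missingFromJar(String jarPath, String... classNames)
            throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String name : classNames) {
                // org.json.JSONObject -> org/json/JSONObject.class
                String entryName = name.replace('.', '/') + ".class";
                if (jar.getJarEntry(entryName) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // usage: java JarContents <jar> <fully.qualified.ClassName>...
        if (args.length < 2) {
            System.err.println("usage: JarContents <jar> <class>...");
            System.exit(1);
        }
        String[] classes = Arrays.copyOfRange(args, 1, args.length);
        List<String> missing = missingFromJar(args[0], classes);
        System.out.println(missing.isEmpty() ? "all classes present" : "missing: " + missing);
    }
}
```

For instance, `java JarContents myudf-with-deps.jar org.json.JSONObject` would report whether that (assumed) JSON class is bundled.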

On Mon, Mar 15, 2010 at 5:08 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Your UDF will reference the classes the regular way - just use imports. The
> trick is to make sure the jars are on the machine & classpath. Two ways to
> do this -- pre-load them on the cluster and have them configured to be on
> the default classpath, or use Pig's "REGISTER" keyword to register both
> your UDF jar and the dependencies (once per each jar).  What Alan is saying,
> there is no way to create a udf that would somehow tell pig that it needs
> to package up and send a jar file located somewhere on the client machine --
> you have to do that in the pig script yourself.
>
> Additionally, thanks to Thejas, you can register jars on the command line
> if you are on Pig 0.7 (trunk): https://issues.apache.org/jira/browse/PIG-1226



-- 
Zaki Rahaman

Re: Using external jar in UDF

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Your UDF will reference the classes the regular way: just use imports. The
trick is to make sure the jars are on the machine and on the classpath. There
are two ways to do this: pre-load them on the cluster and configure them to be
on the default classpath, or use Pig's "REGISTER" keyword to register both
your UDF jar and its dependencies (one REGISTER statement per jar). What Alan
is saying is that there is no way to write a UDF that somehow tells Pig it
needs to package up and send a jar file located somewhere on the client
machine; you have to do that in the Pig script yourself.

Additionally, thanks to Thejas, you can register jars on the command line if
you are on Pig 0.7 (trunk): https://issues.apache.org/jira/browse/PIG-1226
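Since nothing in the UDF interface lets a UDF declare the jars it needs, a UDF can at least fail early with a readable message when a dependency is missing, which softens the concern that users must know to register the extra jar. The sketch below is hypothetical (the `DependencyCheck` class, the jar name, and the class names are illustrative and not part of Pig); it only detects the problem, and the script still needs its `REGISTER` lines. Passing the UDF's own class loader matters because the classes a UDF imports are resolved through the loader that defined the UDF class.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig's API): lets a UDF verify that its
 * third-party dependencies are loadable and fail with a readable hint.
 */
public final class DependencyCheck {
    private DependencyCheck() {}

    /** Returns the names in classNames that the given loader cannot find. */
    public static List<String> missingClasses(ClassLoader loader, String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: check presence without running static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Throws IllegalStateException naming the jar the script should REGISTER. */
    public static void require(ClassLoader loader, String jarHint, String... classNames) {
        List<String> missing = missingClasses(loader, classNames);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Classes " + missing
                    + " are not on the classpath; add 'REGISTER " + jarHint
                    + ";' to the Pig script");
        }
    }
}
```

A UDF might call, for example, `DependencyCheck.require(getClass().getClassLoader(), "json-lib.jar", "org.json.JSONObject");` in its constructor or at the start of `exec()`; `json-lib.jar` and `org.json.JSONObject` stand in for whatever library the UDF actually uses.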


On Mon, Mar 15, 2010 at 1:52 PM, Corbin Hoenes <co...@tynt.com> wrote:

> Okay what do you mean by "package and send along"?  What is the pig way to
> include additional jars?  e.g. we want to use a 3rd party library to encode
> json and how can our UDF reference that jar?

Re: Using external jar in UDF

Posted by Corbin Hoenes <co...@tynt.com>.
Okay, what do you mean by "package and send along"? What is the Pig way to include additional jars? For example, we want to use a third-party library to encode JSON; how can our UDF reference that jar?

On Mar 15, 2010, at 12:49 PM, Alan Gates wrote:

> The UDF interface does not currently include the ability for a UDF to indicate additional jars it would like to have packaged and sent along.
> 
> Alan.


Re: Using external jar in UDF

Posted by Alan Gates <ga...@yahoo-inc.com>.
The UDF interface does not currently include the ability for a UDF to  
indicate additional jars it would like to have packaged and sent along.

Alan.

On Mar 10, 2010, at 2:21 AM, Tamir Kamara wrote:

> Hi,
>
> Register is working fine but it means that the user needs to know when it's
> needed to register the additional jar. What about my question regarding the
> M/R way of doing this ?
>
> Thanks,
> Tamir


Re: Using external jar in UDF

Posted by Tamir Kamara <ta...@gmail.com>.
Hi,

REGISTER works fine, but it means the user needs to know when the additional
jar has to be registered. What about my question regarding the M/R way of
doing this?

Thanks,
Tamir

On Wed, Mar 10, 2010 at 11:21 AM, Jeff Zhang <zj...@gmail.com> wrote:

> Using *REGISTER myfunc.jar;*
>
> refer here:
> http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#REGISTER

Re: Using external jar in UDF

Posted by Jeff Zhang <zj...@gmail.com>.
Use *REGISTER myfunc.jar;*

Refer to the REGISTER section of the Pig Latin reference:
http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#REGISTER


On Wed, Mar 10, 2010 at 4:52 PM, Tamir Kamara <ta...@gmail.com> wrote:

> Hi,
>
> I have a function (eval) that needs to use an external jar.
> In M/R world this can be accomplished by uploading the jar to the dfs and
> using DistributedCache.addFileToClassPath.
> How do I do the same (have the jar available for the udf) in pig?
>
> Thanks,
> Tamir
>



-- 
Best Regards

Jeff Zhang