Posted to user@pig.apache.org by Michaël Mimeault <mi...@gmail.com> on 2013/12/18 22:41:48 UTC

Pig 0.11 - Custom UDF - Classpath

Hi,

I'm trying to get Pig 0.11 running on CDH4. We are currently using Pig
0.9 on CDH3.

I have a small problem with the Pig classpath on CDH4.
I'm trying to do something very simple, and it doesn't work the way it
did on CDH3.

I have a custom Java project; let's call it "my". It has a lot of utility
functions. Let's take "ExtractDate" as an example.

I'm registering the jar:

%DEFAULT MY_JAR '/path_to_jar/my.jar'
REGISTER $MY_JAR


And declaring the function:

DEFINE ExtractDate  com.whatever.pig.ExtractDate();


And using it:

a = FOREACH a GENERATE *, FLATTEN(ExtractDate(date));


The problem is that my ExtractDate class uses classes defined in libraries
that it depends on (let's call one of them MagicalDateProcess). When the Pig
job runs, it fails because it can't find the MagicalDateProcess classes.
I added all of the "my.jar" dependencies (the jars in lib) to PIG_CLASSPATH.
I also ran the Pig job in debug mode, and HADOOP_CLASSPATH is set correctly.

On CDH3, adding all my libraries to PIG_CLASSPATH was enough to make
the MapReduce job (the Pig script) work.

To make it work on CDH4, I have to register the dependencies explicitly,
for example the MagicalDateProcess jar.
But the problem is that my project depends on many more libraries than
MagicalDateProcess, and I don't want to add a REGISTER for every jar that
my project relies on.
That is why we were using an automatically generated PIG_CLASSPATH.
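
As a rough illustration only (this is not the actual script), a generated
PIG_CLASSPATH could be built along these lines. It is a minimal sketch that
assumes all dependency jars sit directly in one lib directory; the function
name build_classpath and the path /path_to_jar/lib are hypothetical.

```python
import os
from pathlib import Path


def build_classpath(lib_dir):
    """Join every .jar found directly under lib_dir with the platform path separator."""
    # Sorting keeps the result stable between runs; a missing directory yields "".
    jars = sorted(str(p) for p in Path(lib_dir).glob("*.jar"))
    return os.pathsep.join(jars)


if __name__ == "__main__":
    # Hypothetical location of the dependency jars; adjust to the real layout.
    print(build_classpath("/path_to_jar/lib"))
```

The printed string would then be exported as PIG_CLASSPATH before launching
Pig.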

Just to visualize my jar dependency:

I want to do this:

   - [Pig script]
      - DEFINE [(ExtractDate) My.jar]
         - *lib/*
            - *[(MagicalDateProcess) MagicalProcessLib.jar]*
            - *and others*
            - *...*

I don't want this:


   - [Pig script]
      - DEFINE [(ExtractDate) My.jar]
      - DEFINE [(MagicalDateProcess) MagicalProcessLib.jar]
      - DEFINE all others [My.jar] lib
      - ...


For now I use the -additionalJars option so that it automatically registers
all my libs, but we were wondering why using the classpath was enough in
Pig 0.9 (on CDH3) and not in Pig 0.11 (on CDH4). Is it supposed to work?

Thanks for any help.

Michael.