Posted to mapreduce-user@hadoop.apache.org by Marco Didonna <m....@gmail.com> on 2011/09/14 10:50:36 UTC

Re: -libjars?

Hello everyone,
sorry to bring this up again, but I need some clarification. I wrote a
map-reduce application that needs the Cloud9 library
(https://github.com/lintool/Cloud9). The library is packaged in a jar
file, and I want to make it available to the whole cluster. So far I
have been working in standalone mode, and I have unsuccessfully tried
to use the -libjars option: I always get a NoClassDefFoundError. The
only way I have made everything work is by copying cloud9.jar into the
hadoop/lib folder.
I suppose I cannot do that on a cluster of N machines, since I would
have to copy the jar onto all N machines, which isn't feasible.

Here's how I run the job: "hadoop jar myjob.jar
myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar
home/my/pyworkspace/openAnc.xml index/ 10 1"

Is there some code that needs to be written in the driver in order to
get the darn library added to the "global" classpath? This -libjars
option is really poorly documented, IMHO.
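One detail worth noting here: -libjars is one of Hadoop's "generic options", and it is only parsed when the driver is launched through ToolRunner (which invokes GenericOptionsParser); a bare main() that builds its own Job never sees the option. Below is a minimal sketch of such a driver, with hypothetical class and job names modeled on the command above (it is not the actual PreprocessANC code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class PreprocessANC extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration that GenericOptionsParser
        // already populated from -libjars/-files/-D, so the shipped jar
        // ends up on the task classpath.
        Job job = new Job(getConf(), "preprocess-anc");
        job.setJarByClass(PreprocessANC.class);
        // ... set mapper/reducer and input/output paths from args ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run(args).
        System.exit(ToolRunner.run(new Configuration(), new PreprocessANC(), args));
    }
}
```

Even with this in place, -libjars only reaches the task-side classpath; as Todd points out in the quoted thread, the driver JVM itself still needs the jar, e.g. via HADOOP_CLASSPATH.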

Any help would be very much appreciated ;)

Marco Didonna

On 17 August 2011 03:57, Anty <an...@gmail.com> wrote:
> Thanks very much , todd. I get it.
>
>
> On Wed, Aug 17, 2011 at 6:23 AM, Todd Lipcon <to...@cloudera.com> wrote:
>> Putting files on the classpath doesn't make them accessible to JVM's
>> resource loader. If you have dir/foo.properties, then "dir" needs to
>> be on the classpath, not "dir/foo.properties". Since the working dir
>> of the task is on the classpath, then -files works since it gets the
>> properties file into a directory on the classpath.
>>
>> -Todd
>>
>> On Mon, Aug 15, 2011 at 8:09 PM, Anty <an...@gmail.com> wrote:
>>> Thanks very much for your reply, Todd.
>>> I am at a complete loss. I want to ship a configuration file to the
>>> cluster to run my mapreduce job.
>>>
>>> If I use the -libjars option to ship the configuration file, the
>>> child JVM launched by the task tracker can't find it. Curiously, the
>>> configuration file is already on the classpath of the child JVM.
>>>
>>> If I use the -files option to ship the configuration file, the child
>>> JVM can find it. IMO, the difference between -libjars and -files is
>>> that -files creates a symbolic link to the configuration file in the
>>> current working directory of the child JVM.
>>>
>>> I dug into the source code, but it's so complicated that I can't
>>> figure out the root cause.
>>> So my question is: with the -libjars option, the configuration file
>>> is already on the classpath, so why can't the classloader find it,
>>> while it CAN find classes in a jar shipped with -libjars?
>>>
>>> Any help will be appreciated.
>>>
>>> On Tue, Aug 16, 2011 at 1:06 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>>> Your "driver" is the program that submits the job. The task is the
>>>> thing that runs on the cluster. They have separate classpaths.
>>>>
>>>> Better to ask on the public lists if you want a more in-depth explanation
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Aug 15, 2011 at 9:02 AM, Anty <an...@gmail.com> wrote:
>>>>> Hi Todd,
>>>>> Would you please explain a little more?
>>>>>
>>>>> On Sat, Dec 11, 2010 at 2:08 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>>>>>
>>>>>> You need to put the library jar on your classpath (eg using
>>>>>> HADOOP_CLASSPATH) as well. The -libjars will ship it to the cluster
>>>>>> and put it on the classpath of your task, but not the classpath of
>>>>>> your "driver" code.
>>>>>>
>>>>> I still don't understand what you mean by "but not the classpath of
>>>>> your 'driver' code."
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Thu, Dec 9, 2010 at 10:29 PM, Vipul Pandey <vi...@gmail.com> wrote:
>>>>>> > disclaimer : a newbie!!!
>>>>>> > Howdy?
>>>>>> > Got a quick question: the -libjars option doesn't seem to work for me
>>>>>> > in pretty much my first (or maybe second) mapreduce job.
>>>>>> > Here's what I'm doing:
>>>>>> > $ bin/hadoop jar sherlock.jar somepkg.FindSchoolsJob -libjars
>>>>>> >   HStats-1A18.jar input output
>>>>>> >
>>>>>> > sherlock.jar has my main class (of course), FindSchoolsJob, which runs
>>>>>> > just fine by itself until I add a dependency on a class in
>>>>>> > HStats-1A18.jar. When I run the above command with -libjars specified,
>>>>>> > it fails to find the classes that 'are' inside the HStats jar file.
>>>>>> > Exception in thread "main" java.lang.NoClassDefFoundError: com/*****/HAgent
>>>>>> >     at com.*****.FindSchoolsJob.run(FindSchoolsJob.java:46)
>>>>>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>> >     at com.******.FindSchoolsJob.main(FindSchoolsJob.java:101)
>>>>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>> > Caused by: java.lang.ClassNotFoundException: com/*****/HAgent
>>>>>> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>>>>> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>> >     ... 8 more
>>>>>> >
>>>>>> > My main class is defined as below:
>>>>>> > public class FindSchoolsJob extends Configured implements Tool {
>>>>>> >     :
>>>>>> >     public int run(String[] args) throws Exception {
>>>>>> >         :
>>>>>> >     }
>>>>>> >     :
>>>>>> >     public static void main(String[] args) throws Exception {
>>>>>> >         int res = ToolRunner.run(new Configuration(),
>>>>>> >                                  new FindSchoolsJob(), args);
>>>>>> >         System.exit(res);
>>>>>> >     }
>>>>>> > }
>>>>>> > Any hint would be highly appreciated.
>>>>>> > Thank You!
>>>>>> > ~V
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>> Anty Rao
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>> Anty Rao
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Best Regards
> Anty Rao
>
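Todd's resource-loader point in the quoted thread above can be reproduced with plain JDK classes: a classpath entry has to be the directory that contains a resource (or a jar), never the resource file itself. A self-contained sketch, with made-up temp-directory and file names for illustration, which prints true for the directory entry and false for the file entry:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceLookupDemo {
    public static void main(String[] args) throws Exception {
        // Build dir/foo.properties in a temporary location.
        Path dir = Files.createTempDirectory("dir");
        Files.write(dir.resolve("foo.properties"), "k=v\n".getBytes());

        // Classpath entry = the directory: the resource is found.
        URLClassLoader withDir = new URLClassLoader(
                new URL[] { dir.toUri().toURL() }, null);
        System.out.println("dir on classpath: "
                + (withDir.getResourceAsStream("foo.properties") != null));

        // Classpath entry = the file itself: URLClassLoader treats a
        // non-directory URL as a jar, so the lookup finds nothing.
        boolean found;
        try {
            URLClassLoader withFile = new URLClassLoader(
                    new URL[] { dir.resolve("foo.properties").toUri().toURL() }, null);
            found = withFile.getResourceAsStream("foo.properties") != null;
        } catch (Exception e) {
            found = false;
        }
        System.out.println("file on classpath: " + found);
    }
}
```

This is exactly why -files "works" for a config file: it symlinks the file into the task's working directory, and that directory (not the file) is on the task classpath.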

Re: -libjars?

Posted by Marco Didonna <m....@gmail.com>.
Yes, the job doesn't even start; there is no map phase, and it fails
almost instantly. I think I tried setting the HADOOP_CLASSPATH
variable, but I'll try it again.

Thanks for your help,

Marco

On 15 September 2011 13:44, Joey Echeverria <jo...@cloudera.com> wrote:
> Ok, but does the job even start the maps, or does it fail during initial setup?
>
> The reason I ask is libjars only adds the jar to the classpath for the
> mappers and reducers. If you need the class before the job is
> submitted to the cluster, you should do something like this:
>
> HADOOP_CLASSPATH=../umd-hadoop-core/cloud9.jar hadoop jar myjob.jar
> myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar
> home/my/pyworkspace/openAnc.xml index/ 10 1
>
> -Joey
>

Re: -libjars?

Posted by Joey Echeverria <jo...@cloudera.com>.
Ok, but does the job even start the maps, or does it fail during initial setup?

The reason I ask is that -libjars only adds the jar to the classpath of
the mappers and reducers. If you need the class before the job is
submitted to the cluster, you should do something like this:

HADOOP_CLASSPATH=../umd-hadoop-core/cloud9.jar hadoop jar myjob.jar
myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar
home/my/pyworkspace/openAnc.xml index/ 10 1

-Joey

On Thu, Sep 15, 2011 at 4:24 AM, Marco Didonna <m....@gmail.com> wrote:
> Right now I am still in standalone mode ... I'd like to fix this issue
> before starting a cluster on EC2. :)
>
> Thanks for your time
>
> Marco



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: -libjars?

Posted by Marco Didonna <m....@gmail.com>.
Right now I am still in standalone mode ... I'd like to fix this issue
before starting a cluster on EC2. :)

Thanks for your time

Marco

On 14 September 2011 14:04, Joey Echeverria <jo...@cloudera.com> wrote:
> When are you getting the exception? Is it during the setup of your
> job, or after it's running on the cluster?
>
> -Joey

Re: -libjars?

Posted by Joey Echeverria <jo...@cloudera.com>.
When are you getting the exception? Is it during the setup of your
job, or after it's running on the cluster?

-Joey

On Wed, Sep 14, 2011 at 4:50 AM, Marco Didonna <m....@gmail.com> wrote:
> Hello everyone,
> sorry to bring this up again but I need some clarification. I wrote a
> map-reduce application that need cloud9 library
> (https://github.com/lintool/Cloud9). This library is packet in a jar
> file and I want to make it available to the whole cluster. So far I
> have been working in standalone mode and I have unsuccessfully tried
> to use the -libjars options. I always get ClassNotDefException: the
> only way I made everything work fine is by copying the cloud9.jar into
> hadoop/lib folder.
> I suppose I cannot do it when using a cluster of N machines since I
> would have to copy it on the N machines and this approach isn't
> feasible.
>
> Here's how I perform the job "hadoop jar myjob.jar
> myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar
> home/my/pyworkspace/openAnc.xml index/ 10 1"
>
> Is there some code that needs to be written in the driver in order to
> have the darn library added to the "global" classpath? This -libjars
> option is really poorly documented, IMHO.
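
On the "code in the driver" question: there is an API route, at least on the 0.20/1.x line, via the distributed cache. A pseudo-Java sketch, with a placeholder HDFS path (the jar must already have been uploaded to HDFS):

```
// Inside a Tool's run() method (0.20/1.x-era API); the HDFS path is a placeholder.
Configuration conf = getConf();
DistributedCache.addFileToClassPath(
    new Path("/user/marco/libs/cloud9.jar"), conf);  // added to each task's classpath
```

That covers the tasks only; the driver JVM still needs the jar on its own classpath.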
>
> Any help would be very much appreciated ;)
>
> Marco Didonna
>
> On 17 August 2011 03:57, Anty <an...@gmail.com> wrote:
>> Thanks very much, Todd. I get it.
>>
>>
>> On Wed, Aug 17, 2011 at 6:23 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>> Putting files on the classpath doesn't make them accessible to the JVM's
>>> resource loader. If you have dir/foo.properties, then "dir" needs to
>>> be on the classpath, not "dir/foo.properties". Since the working dir
>>> of the task is on the classpath, -files works: it gets the
>>> properties file into a directory that is on the classpath.
>>>
>>> -Todd
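
The dir-vs-file point above is easy to reproduce outside Hadoop with a plain URLClassLoader; the class and file names below are invented for the demo:

```java
import java.io.File;
import java.io.FileWriter;
import java.net.URL;
import java.net.URLClassLoader;

public class ResourceLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Set up dir/foo.properties in a temp directory.
        File dir = new File(System.getProperty("java.io.tmpdir"), "rl-demo");
        dir.mkdirs();
        File props = new File(dir, "foo.properties");
        try (FileWriter w = new FileWriter(props)) {
            w.write("key=value\n");
        }

        // Classpath entry pointing at the file itself: a non-directory URL
        // is treated as a jar, so the resource lookup fails.
        try (URLClassLoader onFile =
                 new URLClassLoader(new URL[] { props.toURI().toURL() }, null)) {
            System.out.println("file entry finds resource: "
                + (onFile.getResource("foo.properties") != null));
        }

        // Classpath entry pointing at the directory: the resource is found.
        try (URLClassLoader onDir =
                 new URLClassLoader(new URL[] { dir.toURI().toURL() }, null)) {
            System.out.println("dir entry finds resource: "
                + (onDir.getResource("foo.properties") != null));
        }
    }
}
```

The first lookup fails because a non-directory classpath entry is treated as a jar; the second succeeds because the directory itself is the classpath entry, which is exactly what the task working directory gives you when a file is shipped with -files.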
>>>
>>> On Mon, Aug 15, 2011 at 8:09 PM, Anty <an...@gmail.com> wrote:
>>>> Thanks very much for your reply, Todd.
>>>> I am at a complete loss. I want to ship a configuration file to the
>>>> cluster to run my MapReduce job.
>>>>
>>>> If I use the -libjars option to ship the configuration file, the
>>>> child JVM launched by the task tracker can't find the configuration
>>>> file. Curiously, the configuration file is already on the classpath
>>>> of the child JVM.
>>>>
>>>> If I use the -files option to ship the configuration file, the child
>>>> JVM can find the file.
>>>> IMO, the difference between -libjars and -files is that -files
>>>> creates a symbolic link to the configuration file in the current
>>>> working directory of the child JVM.
>>>>
>>>> I dug into the source code, but it's so complicated that I can't
>>>> figure out the root cause.
>>>> So my question is:
>>>> with the -libjars option the configuration file is already on the
>>>> classpath, so why can't the classloader find the configuration file,
>>>> yet the JVM classloader CAN find a jar shipped with -libjars?
>>>>
>>>> Any help will be appreciated.
>>>>
>>>> On Tue, Aug 16, 2011 at 1:06 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>>>> Your "driver" is the program that submits the job. The task is the
>>>>> thing that runs on the cluster. They have separate classpaths.
>>>>>
>>>>> Better to ask on the public lists if you want a more in-depth explanation.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Mon, Aug 15, 2011 at 9:02 AM, Anty <an...@gmail.com> wrote:
>>>>>> Hi Todd,
>>>>>> Would you please explain a little more?
>>>>>>
>>>>>> On Sat, Dec 11, 2010 at 2:08 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>>>>>>
>>>>>>> You need to put the library jar on your classpath (eg using
>>>>>>> HADOOP_CLASSPATH) as well. The -libjars will ship it to the cluster
>>>>>>> and put it on the classpath of your task, but not the classpath of
>>>>>>> your "driver" code.
>>>>>>>
>>>>>> I still don't understand what you mean by "but not the classpath of
>>>>>> your 'driver' code."
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>> On Thu, Dec 9, 2010 at 10:29 PM, Vipul Pandey <vi...@gmail.com> wrote:
>>>>>>> > Disclaimer: a newbie!!!
>>>>>>> > Howdy?
>>>>>>> > Got a quick question. The -libjars option doesn't seem to work for
>>>>>>> > me in pretty much my first (or maybe second) MapReduce job.
>>>>>>> > Here's what I'm doing:
>>>>>>> > $bin/hadoop jar sherlock.jar somepkg.FindSchoolsJob -libjars
>>>>>>> > HStats-1A18.jar input output
>>>>>>> >
>>>>>>> > sherlock.jar has my main class (of course), FindSchoolsJob, which
>>>>>>> > runs just fine by itself until I add a dependency on a class in
>>>>>>> > HStats-1A18.jar.
>>>>>>> > When I run the above command with -libjars specified, it fails to
>>>>>>> > find the classes that 'are' inside the HStats jar file.
>>>>>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>>>> > com/*****/HAgent
>>>>>>> > at com.*****.FindSchoolsJob.run(FindSchoolsJob.java:46)
>>>>>>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>> > at com.******.FindSchoolsJob.main(FindSchoolsJob.java:101)
>>>>>>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>> > at
>>>>>>> >
>>>>>>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>> > at
>>>>>>> >
>>>>>>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>> > at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>> > at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>> > Caused by: java.lang.ClassNotFoundException:com/*****/HAgent
>>>>>>> > at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>>> > at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> > at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>>> > at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>> > at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>>> > ... 8 more
>>>>>>> >
>>>>>>> > My main class is defined as below:
>>>>>>> >
>>>>>>> > public class FindSchoolsJob extends Configured implements Tool {
>>>>>>> >     :
>>>>>>> >     public int run(String[] args) throws Exception {
>>>>>>> >         :
>>>>>>> >         :
>>>>>>> >     }
>>>>>>> >     :
>>>>>>> >     public static void main(String[] args) throws Exception {
>>>>>>> >         int res = ToolRunner.run(new Configuration(),
>>>>>>> >                                  new FindSchoolsJob(), args);
>>>>>>> >         System.exit(res);
>>>>>>> >     }
>>>>>>> > }
>>>>>>> > Any hint would be highly appreciated.
>>>>>>> > Thank You!
>>>>>>> > ~V
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>> Anty Rao
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>> Anty Rao
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> Best Regards
>> Anty Rao
>>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434