Posted to common-user@hadoop.apache.org by Mark <st...@gmail.com> on 2011/04/09 04:10:33 UTC

Shared lib?

If I have some jars that I would like included in ALL of my jobs, is
there some shared lib directory on HDFS that will be available to all of
my nodes? Something similar to how Oozie uses a shared lib directory 
when submitting workflows.

As of right now I've been cheating and copying these jars into the 
hadoop lib directory on each node. I figured there had to be a better way.

Thanks

RE: Shared lib?

Posted by Ke...@thomsonreuters.com.
It seems like -libjars is for the CLASSPATH only. To effect changes to LIBPATH on each node, -archives needs to be used along with a scheme to have each process set its own LIBPATH accordingly once the archives are untarred.

I think the documentation for -libjars could be amended to specifically say it is not intended for LIBPATH, despite the "lib" in the name. This is fallout from Java using the classpath to hold "lib"raries of jar files. Perhaps a better name inside Hadoop would be -classjars?

Also, -archives could be documented more fully by explaining where the archive gets untarred, presuming one uses a relative path inside the archive.
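
For illustration, one way such a scheme might look on the command line, assuming the driver parses the generic options via ToolRunner (the archive name, the '#natives' link name, and the class/paths below are made up):

    # the tarball is unpacked in the task working directory and linked as ./natives,
    # so the child JVM's library path can point straight at it (note this also replaces
    # the default child opts such as the heap size, so re-add those if you need them)
    hadoop jar myjob.jar com.example.MyDriver \
        -archives hdfs:///shared/native-libs.tgz#natives \
        -D mapred.child.java.opts="-Djava.library.path=./natives" \
        /input /output

This presumes the .so files sit at the top level of the tarball; adjust the relative path otherwise.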

Kevin

-----Original Message-----
From: 顾荣 [mailto:gurongwalker@gmail.com] 
Sent: Monday, April 11, 2011 11:12 AM
To: common-user@hadoop.apache.org
Subject: Re: Shared lib?

Re: Shared lib?

Posted by 顾荣 <gu...@gmail.com>.
Hi Mark,
 I also ran into your problem, and I eventually found a way around it.
 First, your basic idea is right: we need to move these jars into HDFS,
because files in HDFS are automatically shared by all the nodes.
 So there seem to be two solutions here.

 Solution a) After you export your project as a jar, manually add a directory
named lib at the root of the project jar, and put ALL the external jars your
project needs into that directory. This way, when you submit your job to the
jobtracker, the lib jars are shipped into HDFS together with the job jar, and
Hadoop adds your small lib to the classpath for your project. I tried it and
it works; this may reduce your work, I think.
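
For illustration, a rough sketch of that packaging on the command line (the jar and dependency names here are made up):

    # start from the exported job jar and the dependency jars it needs
    mkdir lib
    cp guava.jar commons-lang.jar lib/
    jar uf myjob.jar lib/     # adds lib/guava.jar and lib/commons-lang.jar to the job jar
    jar tf myjob.jar          # verify: your classes at the root, dependencies under lib/

The task runtime unpacks the job jar into the task's working directory and puts the jars under lib/ on the task classpath, which is why this needs no extra configuration.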

 Solution b) There also seems to be a Hadoop command-line option,
'-libjars jar1,jar2'. I read about it in *Hadoop: The Definitive Guide*; the
description of this option is: "Copies the specified JAR files from the local
filesystem (or any filesystem if a scheme is specified) to the shared
filesystem used by the jobtracker (usually HDFS), and adds them to the
MapReduce task's classpath. This option is a useful way of shipping JAR files
that a job is dependent on." However, I still have not figured it out; it did
not work for me.
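
A rough sketch of how -libjars is usually invoked (the jar names, paths, and driver class below are assumptions):

    hadoop jar myjob.jar com.example.MyDriver \
        -libjars /local/path/dep1.jar,/local/path/dep2.jar \
        /input /output

One caveat worth noting: -libjars is handled by GenericOptionsParser, so it only takes effect when the driver runs through ToolRunner (i.e. implements Tool); a plain main() that never parses the generic options is a common reason the flag seems to do nothing. And since the quoted description says a scheme is allowed, the flag can also point at jars kept in a shared HDFS directory (e.g. -libjars hdfs:///shared/lib/dep1.jar), which is close to the Oozie-style shared lib Mark asked about.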

The above is what I have experienced; I hope it helps.
 If you still have some problems, please feel free to let me know. Good luck.

Regards,
Walker Gu.



2011/4/9 Mark <st...@gmail.com>

> If I have some jars that I would like included in ALL of my jobs, is there
> some shared lib directory on HDFS that will be available to all of my
> nodes? Something similar to how Oozie uses a shared lib directory when
> submitting workflows.
>
> As of right now I've been cheating and copying these jars into the hadoop
> lib directory on each node. I figured there had to be a better way.
>
> Thanks
>