Posted to common-user@hadoop.apache.org by Victor Hsieh <vi...@gmail.com> on 2010/01/20 05:20:06 UTC

-libjars doesn't work with MR job

Hi,

I was trying to run a MapReduce job that needs some extra jars, but it
failed.  It seems that the jars specified on the command line with -libjars
were not shipped to the MapReduce workers along with the job.

After digging into the code, I found that the deprecated API and the current
API handle -libjars differently (and likewise -files and -archives).  In the
deprecated API, JobClient.runJob() copies the -libjars jars to the
DistributedCache (more precisely, GenericOptionsParser parses -libjars and
saves the value as "tmpjars" in the configuration, then JobClient uploads the
tmpjars).  However, in the current API, I didn't see anything related (by
grepping for tmpjar or similar in hadoop-0.20.1/src/).
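
The flow described above can be modeled in a few lines.  This is a toy
stand-in, not Hadoop's code: the class LibJarsSketch, its methods, and the
plain Map used in place of a Hadoop Configuration are hypothetical names for
illustration.  It assumes only what the paragraph above states, namely that
the -libjars value ends up under the "tmpjars" key, from which a submitter can
later read the list of jars to upload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LibJarsSketch {
    // Key named in the message above; everything else here is illustrative.
    static final String TMPJARS_KEY = "tmpjars";

    /**
     * Consumes a leading "-libjars <comma-separated jars>" pair, records the
     * value in conf, and returns the remaining (application) arguments.
     */
    static String[] parseLibJars(String[] args, Map<String, String> conf) {
        if (args.length >= 2 && args[0].equals("-libjars")) {
            conf.put(TMPJARS_KEY, args[1]);
            return Arrays.copyOfRange(args, 2, args.length);
        }
        return args;
    }

    /** What a submitter would do next: read the key back and list the jars. */
    static List<String> jarsToUpload(Map<String, String> conf) {
        String value = conf.get(TMPJARS_KEY);
        if (value == null || value.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(value.split(","));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String[] rest = parseLibJars(
                new String[] {"-libjars", "a.jar,b.jar", "in", "out"}, conf);
        System.out.println(jarsToUpload(conf));     // prints [a.jar, b.jar]
        System.out.println(Arrays.toString(rest));  // prints [in, out]
    }
}
```

If nothing on the submission path reads the key back, the jars are parsed but
never shipped, which is the symptom I was looking for in the source.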

Is there a helper function or something similar in the current API?  Or do I
need to do it myself, the way JobClient does?

Help appreciated.

Victor

Re: -libjars doesn't work with MR job

Posted by Victor Hsieh <vi...@gmail.com>.
After working on it for a while, I realized that I was mistaken.  It
actually works well.  But two things need to be right: both CLASSPATH
and -libjars.

For CLASSPATH, the jars the job depends on must be included when
launching the MapReduce job, while -libjars must also list every jar
that the mapper and reducer need.

Victor

On Wed, Jan 20, 2010 at 2:35 PM, Victor Hsieh <vi...@gmail.com> wrote:
> Yes, it can be done that way, but then the cluster has to be restarted
> (since the tasktrackers need to know that new jars were added).  But I'm
> maintaining a cluster for different users, so I'm looking for a solution
> that doesn't need a restart.
> Thanks,
> Victor

Re: -libjars doesn't work with MR job

Posted by Victor Hsieh <vi...@gmail.com>.
Yes, it can be done that way, but then the cluster has to be restarted
(since the tasktrackers need to know that new jars were added).  But I'm
maintaining a cluster for different users, so I'm looking for a solution
that doesn't need a restart.

Thanks,
Victor

On Wed, Jan 20, 2010 at 2:17 PM, Rekha Joshi <re...@yahoo-inc.com> wrote:

> Not sure what error you get and whether it is suggestive, but at times
> where you place the -libjars option can make a difference.  You could try
> adding the jar to your HADOOP_CLASSPATH and then executing?
>
> Cheers,
> /R

Re: -libjars doesn't work with MR job

Posted by Rekha Joshi <re...@yahoo-inc.com>.
Not sure what error you get and whether it is suggestive, but at times where you place the -libjars option can make a difference. You could try adding the jar to your HADOOP_CLASSPATH and then executing?
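
To illustrate why the position might matter: the documented usage puts generic options such as -libjars before the job's own arguments ([genericOptions] [commandOptions]). The sketch below is a toy model of that convention, not Hadoop's parser. The class and method names are hypothetical, and the rule that scanning stops at the first non-generic argument is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class OptionOrderSketch {
    // Generic options modeled here; each is assumed to take exactly one value.
    static final List<String> GENERIC = Arrays.asList("-libjars", "-files", "-archives");

    /**
     * Returns the -libjars value if it appears among the leading generic
     * options, or null if an application argument is reached first.
     */
    static String libJarsIfLeading(String[] args) {
        int i = 0;
        while (i + 1 < args.length && GENERIC.contains(args[i])) {
            if (args[i].equals("-libjars")) {
                return args[i + 1];
            }
            i += 2; // skip another generic option and its value
        }
        return null;
    }

    public static void main(String[] args) {
        String[] before = {"-libjars", "dep.jar", "input", "output"};
        String[] after = {"input", "output", "-libjars", "dep.jar"};
        System.out.println(libJarsIfLeading(before)); // prints dep.jar
        System.out.println(libJarsIfLeading(after));  // prints null
    }
}
```

Under that assumption, a -libjars placed after the input and output paths is simply handed to the job as ordinary arguments.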

Cheers,
/R

