Posted to mapreduce-user@hadoop.apache.org by Samir Eljazovic <sa...@gmail.com> on 2012/01/03 01:08:47 UTC

Is it possible to use hadoop archive to specify third party libs

Hi,
I need to provide a lot of third-party libraries (both Java and native), and
doing that with the generic option parser (the -libjars and -files arguments)
is a little messy. I was wondering if it is possible to wrap all the
libraries into a single har archive and use that when submitting the job?

Just to mention that I want to avoid putting all the libraries into the job
jar for two reasons:
1. it does not work for native libs
2. it takes time to upload the jar

Thanks,
Samir
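The generic-parser route being avoided looks roughly like this. A minimal sketch; the jar names, native lib, and job class are illustrative, not taken from the thread, and the actual hadoop invocation is shown only as a comment:

```shell
set -e
work=$(mktemp -d); cd "$work"

# Illustrative layout: several third-party jars plus one native lib.
mkdir -p libs
touch libs/a.jar libs/b.jar libs/libfoo.so

# -libjars expects a single comma-separated list, which is the messy part:
LIBJARS=$(printf '%s,' libs/*.jar); LIBJARS=${LIBJARS%,}
echo "$LIBJARS"   # prints libs/a.jar,libs/b.jar

# Submission would then look like (requires a Hadoop install; not run here):
# hadoop jar myjob.jar com.example.MyJob \
#     -libjars "$LIBJARS" -files libs/libfoo.so input output
```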

Re: Is it possible to use hadoop archive to specify third party libs

Posted by Eyal Golan <eg...@gmail.com>.
yes :)

thanks.


Eyal Golan
egolan74@gmail.com

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

P  Save a tree. Please don't print this e-mail unless it's really necessary



On Sat, Jan 7, 2012 at 2:58 PM, Bejoy Ks <be...@gmail.com> wrote:

> Eyal
>       Hope you are looking for this one
>
> http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>
> Regards
> Bejoy.K.S

Re: Is it possible to use hadoop archive to specify third party libs

Posted by Bejoy Ks <be...@gmail.com>.
Eyal
      Hope you are looking for this one
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

Regards
Bejoy.K.S


On Sat, Jan 7, 2012 at 12:25 PM, Eyal Golan <eg...@gmail.com> wrote:

> hi,
> can you please point out link to Cloudera's article?

Re: Is it possible to use hadoop archive to specify third party libs

Posted by Eyal Golan <eg...@gmail.com>.
hi,
can you please point out link to Cloudera's article?

thanks,

Eyal



Re: Is it possible to use hadoop archive to specify third party libs

Posted by Samir Eljazovic <sa...@gmail.com>.
Does anyone know whether what I'm trying to do is possible or not?

Thanks


Re: Is it possible to use hadoop archive to specify third party libs

Posted by Samir Eljazovic <sa...@gmail.com>.
Hi,
yes, I'm trying to get option 1 from Cloudera's article (using the
distributed cache) to work. If I specify all the libraries individually when
running the job it works, but I'm trying to make it work using a single
archive file containing all the native libraries I need. And that seems to
be a problem.

When I use a tar file, the libraries are extracted but they are not added
to the classpath.

Here's the TT log:

2012-01-03 15:04:43,611 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager (Thread-447):
Creating openCV.tar in
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/
10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar-work--7133799918421346652
with rwxr-xr-x
2012-01-03 15:04:44,209 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager (Thread-447):
Extracting
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/
10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar-work--7133799918421346652/openCV.tar to
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/
10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar-work--7133799918421346652
2012-01-03 15:04:44,363 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager (Thread-447):
Cached hdfs://
10.190.207.247:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar#openCV.tar as
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/
10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar

What should I do to get these libs available to my job?

Thanks
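What the log shows can be reproduced locally: the tar is unpacked into the task's working directory (linked under the `#openCV` fragment name), but nothing puts its contents on the classpath. A minimal sketch; file names are illustrative, and the MRv1 property mentioned in the comments is the usual workaround, not something confirmed in this thread:

```shell
set -e
work=$(mktemp -d); cd "$work"

# Recreate what the TaskTracker does with `-archives openCV.tar#openCV`:
# the tar is unpacked into a directory visible as ./openCV in the task's
# working directory (file names here are illustrative).
mkdir -p src && touch src/libopencv_core.so src/helper.jar
tar -cf openCV.tar -C src .
mkdir -p task/openCV && tar -xf openCV.tar -C task/openCV
ls task/openCV

# Nothing in that directory reaches the classpath or java.library.path
# automatically. With classic MRv1 you would point the child JVM at it,
# e.g. -D mapred.child.java.opts='-Djava.library.path=./openCV', or add
# jars explicitly via DistributedCache.addFileToClassPath().
```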


Re: Is it possible to use hadoop archive to specify third party libs

Posted by Praveen Sripati <pr...@gmail.com>.
Check this article from Cloudera for different options.

http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

Praveen


Re: Is it possible to use hadoop archive to specify third party libs

Posted by Harsh J <ha...@cloudera.com>.
Samir,

I believe HARs won't work there. But you can use a regular tar instead, and that should be unpacked properly.
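A minimal sketch of the tar route suggested here; the library name is illustrative, and the hadoop command in the comment assumes a working MRv1 install, so it is not run:

```shell
set -e
work=$(mktemp -d); cd "$work"

# Pack the libraries as a plain tar, which the DistributedCache does
# unpack, rather than a .har, which it does not recognize.
mkdir -p nativelibs && touch nativelibs/libopencv_core.so
tar -cf openCV.tar -C nativelibs .
tar -tf openCV.tar

# Ship it with the job, e.g.:
# hadoop jar job.jar com.example.Main -archives openCV.tar#openCV input output
```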



Re: Is it possible to use hadoop archive to specify third party libs

Posted by Samir Eljazovic <sa...@gmail.com>.
And this is the log from the TaskTracker when I try to use a har file with
the -archives parameter:

2012-01-02 23:37:25,873 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager
(Thread-1824): Creating openCV.har in
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/-5719433807905296966_537383540_550032302/10.12.115.223/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201021856_0031/libjars/openCV.har-work-2271662679519761974
with rwxr-xr-x
2012-01-02 23:37:30,395 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager
(Thread-1824): Extracting
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/-5719433807905296966_537383540_550032302/10.12.115.223/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201021856_0031/libjars/openCV.har-work-2271662679519761974/openCV.har
to /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/-5719433807905296966_537383540_550032302/10.12.115.223/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201021856_0031/libjars/openCV.har-work-2271662679519761974
2012-01-02 23:37:30,395 WARN
org.apache.hadoop.filecache.TrackerDistributedCacheManager
(Thread-1824): Cache file
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/-5719433807905296966_537383540_550032302/10.12.115.223/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201021856_0031/libjars/openCV.har-work-2271662679519761974/openCV.har
specified as archive, but not valid extension.
2012-01-02 23:37:30,468 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager
(Thread-1824): Cached
hdfs://10.12.115.223:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201021856_0031/libjars/openCV.har
as /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/-5719433807905296966_537383540_550032302/10.12.115.223/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201021856_0031/libjars/openCV.har
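The WARN above is the key line: the cache only unpacks archives whose extension it recognizes. A hedged sketch of repacking the same content with a recognized extension (file names illustrative; the extension list in the comment reflects typical MRv1 behavior, not something stated in this thread):

```shell
set -e
work=$(mktemp -d); cd "$work"

# ".har" is not among the extensions the cache treats as an archive
# (for MRv1, roughly .jar, .zip, .tar, .tgz and .tar.gz), so the file
# is cached but never extracted. Repacking with a recognized extension
# avoids the warning:
mkdir -p libs && touch libs/libopencv_core.so
tar -czf openCV.tgz -C libs .
tar -tzf openCV.tgz
```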




Re: Is it possible to use hadoop archive to specify third party libs

Posted by Samir Eljazovic <sa...@gmail.com>.
Small update. Sorry for re-send.
