Posted to mapreduce-user@hadoop.apache.org by Stephen TAK-LON WU <ta...@indiana.edu> on 2010/06/23 14:46:39 UTC

DistributedCache downloads the second added archive from HDFS more than once?

Dear all,

I am using Hadoop 0.20.2 with the DistributedCache API.

I have found that whichever of the following two ways I use to add cached
archives from HDFS to the local slaves, the second added archive is copied to
the local disk every single time I run the same job:

1. using setCacheArchives

        URI[] cache = { new URI(database), new URI(program) };
        DistributedCache.setCacheArchives(cache, jc);

2. using addCacheArchive

        DistributedCache.addCacheArchive(new URI(database), jc);
        DistributedCache.addCacheArchive(new URI(program), jc);

I verified this on the local slaves: the "program" archive, which is a
*.tar.gz file, is downloaded from HDFS and unpacked every time I submit a
job.
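
For context, here is roughly how my job driver is set up. This is only a
minimal sketch; the class name, HDFS paths, and input/output arguments below
are placeholders for illustration, not the exact values I use:

    import java.net.URI;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheArchiveDriver {

        public static void main(String[] args) throws Exception {
            JobConf jc = new JobConf(CacheArchiveDriver.class);
            jc.setJobName("cache-archive-test");

            // Placeholder HDFS paths; in my job these point to the real archives.
            String database = "hdfs://namenode:9000/cache/database.tar.gz";
            String program  = "hdfs://namenode:9000/cache/program.tar.gz";

            // Either approach from above; the second archive ("program") is the
            // one that gets re-downloaded and unpacked on every submission.
            DistributedCache.addCacheArchive(new URI(database), jc);
            DistributedCache.addCacheArchive(new URI(program), jc);

            FileInputFormat.setInputPaths(jc, new Path(args[0]));
            FileOutputFormat.setOutputPath(jc, new Path(args[1]));

            JobClient.runJob(jc);
        }
    }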

Does anyone know why this happens? Is there any solution to this problem?

Thank you so much.

Sincerely,
Stephen