Posted to mapreduce-user@hadoop.apache.org by Stephen TAK-LON WU <ta...@indiana.edu> on 2010/06/23 14:46:39 UTC
Subject: DistributedCache downloads the second added archive from HDFS more than once??
Dear all,
I am using Hadoop 0.20.2 with the DistributedCache API.
I have found that, whichever of the following two ways I use to add cached
archives from HDFS to the local slaves, the second added archive is copied
to the local disk every single time I run the same job:
1. using setCacheArchives:

    URI[] cache = {new URI(database), new URI(program)};
    DistributedCache.setCacheArchives(cache, jc);

2. using addCacheArchive:

    DistributedCache.addCacheArchive(new URI(database), jc);
    DistributedCache.addCacheArchive(new URI(program), jc);
I verified this on the local slaves: the "program" archive, a *.tar.gz
file, is downloaded from HDFS and unpacked every time I submit a job.
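For reference, my understanding is that the TaskTracker decides whether to
reuse an already-localized archive by comparing the source file's HDFS
modification time against the one recorded when the archive was first
localized, so re-uploading the archive between runs forces a fresh download
even if the contents are unchanged. A minimal stand-alone sketch of that
reuse check (class and method names here are illustrative, not the actual
Hadoop API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of DistributedCache's reuse decision: a localized
// archive is reused only if its source URI was seen before AND the source
// modification time has not changed since localization.
public class CacheReuseSketch {
    // Maps source URI -> modification time recorded at localization.
    private final Map<String, Long> localized = new HashMap<>();

    /** Returns true if the archive must be downloaded (again). */
    public boolean needsDownload(String uri, long sourceModTime) {
        Long recorded = localized.get(uri);
        return recorded == null || recorded != sourceModTime;
    }

    /** Records a completed localization of the given URI. */
    public void markLocalized(String uri, long sourceModTime) {
        localized.put(uri, sourceModTime);
    }
}
```

So if the *.tar.gz is re-uploaded to HDFS between submissions, its
modification time changes and a check like the one above would force a
fresh copy, which might explain what I am seeing.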
Does anyone know why this happens? Is there a solution to this problem?
Thank you so much.
Sincerely,
Stephen