Posted to common-user@hadoop.apache.org by jerrro <je...@gmail.com> on 2008/01/28 18:57:10 UTC

distributed cache

Hello,

Is there a way to use the Distributed Cache with a pipes (C++ code) job? I want
to be able to access a file on the local disk of all the data nodes, so that
Hadoop would copy it to every data node before the map/reduce job runs.

Thanks.


Re: distributed cache

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
jerrro wrote:
> Hello,
>
> Is there a way to use the Distributed Cache with a pipes (C++ code) job? I want
> to be able to access a file on the local disk of all the data nodes, so that
> Hadoop would copy it to every data node before the map/reduce job runs.
>
> Thanks.
>   

Hi,

First of all, you need to copy the files to the DFS, and then add them to
the distributed cache. You can give comma-separated lists of the files or
archives to be added to the distributed cache through the
"mapred.cache.files" and "mapred.cache.archives" properties in the conf file.

Ex:

<property>
  <name>mapred.cache.files</name>
  <value>/files/file1,/files/file2.txt</value>
  <description> The files in distributed cache</description>
</property>

<property>
  <name>mapred.cache.archives</name>
  <value>/archives/arc1.zip,/archives/arc2.jar</value>
  <description>The archives in distributed cache</description>
</property>
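
For a pipes job (as asked above), one way is to put these properties in the
job's configuration XML and pass that file to bin/hadoop pipes with the -conf
option; depending on your Hadoop version you may also be able to set them on
the command line with -D mapred.cache.files=... . The paths in the example
above are placeholders, so substitute your own.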

You can also give URIs for the file names.
If you give a URI of the form hdfs://<path>#<link>, mapred will create a
symlink named "link" in the task's working directory.
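
As a rough, untested sketch: suppose the cache entry were given as
hdfs://<namenode>/files/lookup.txt#lookup.txt; the pipes C++ task could then
open the symlink "lookup.txt" like any local file in its working directory.
The class names and the file name below are made up for illustration, and the
API calls follow the standard pipes word-count example. On some versions you
may also need "mapred.create.symlink" set to "yes" for the links to be created.

Ex:

#include <fstream>
#include <set>
#include <string>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

// Reads the cached file through its symlink in the task's working
// directory, then uses it to filter map input values.
class CacheFileMapper : public HadoopPipes::Mapper {
public:
  std::set<std::string> keep;

  CacheFileMapper(HadoopPipes::TaskContext& context) {
    // "lookup.txt" is the name given after '#' in mapred.cache.files.
    std::ifstream in("lookup.txt");
    std::string line;
    while (std::getline(in, line)) {
      keep.insert(line);
    }
  }

  void map(HadoopPipes::MapContext& context) {
    // Emit only the input values that appear in the cached file.
    std::string value = context.getInputValue();
    if (keep.count(value) > 0) {
      context.emit(value, "1");
    }
  }
};

class SumReducer : public HadoopPipes::Reducer {
public:
  SumReducer(HadoopPipes::TaskContext& context) {}

  void reduce(HadoopPipes::ReduceContext& context) {
    // Count how many times each kept value was emitted.
    int count = 0;
    while (context.nextValue()) {
      count += 1;
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(count));
  }
};

int main(int argc, char* argv[]) {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<CacheFileMapper, SumReducer>());
}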

Hope this clarifies

Thanks
Amareshwari