Posted to common-user@hadoop.apache.org by "info@christianherta.de" <in...@christianherta.de> on 2012/05/04 13:41:44 UTC

How to create an archive-file in Java to distribute a MapFile via Distributed Cache

Hello,
I have written a chain of MapReduce jobs which creates a MapFile. I want
to use the MapFile in a subsequent MapReduce job via the distributed
cache. Therefore I have to create an archive file of the folder which
holds the data and index files.

In the documentation and in the book "Hadoop: The Definitive Guide" there
are only examples of how this is done on the command line. Is this also
possible in HDFS via the Hadoop Java API?

P.S.: Distributing the files separately is not a solution; they would end
up in different temporary folders.

Thanks in advance
Christian

Re: How to create an archive-file in Java to distribute a MapFile via Distributed Cache

Posted by Shi Yu <sh...@uchicago.edu>.
My humble experience: I would prefer specifying the files on the
command line with the -files option, then handling them explicitly in
the Mapper's configure or setup function using
File f1 = new File("file1name");
File f2 = new File("file2name");

because I am not 100% sure how the distributed cache determines the
order of the paths (archives) stored in the array. I once got tripped up
at this point, so since then I have stuck with the old method.
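
Below is a rough, untested sketch of that approach, assuming the job is
launched with something like "hadoop jar myjob.jar MyDriver -files
lookup.txt <in> <out>" via ToolRunner/GenericOptionsParser; the names
MyMapper, MyDriver and lookup.txt are placeholders, not from this thread.
With -files, each cached file is symlinked into the task's working
directory under its base name, so it can be opened as a plain local file
in setup():

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that loads a small lookup file shipped via -files.
public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        // Open the distributed file by its plain name, as in the snippet above.
        File f1 = new File("lookup.txt");
        BufferedReader reader = new BufferedReader(new FileReader(f1));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assume tab-separated key/value pairs in the lookup file.
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit only records that have a match in the cached lookup data.
        String joined = lookup.get(value.toString());
        if (joined != null) {
            context.write(value, new Text(joined));
        }
    }
}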

Re: How to create an archive-file in Java to distribute a MapFile via Distributed Cache

Posted by Harsh J <ha...@cloudera.com>.
Hi,

The Java API offers a DistributedCache class which lets you do this.
The usage is detailed at
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
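
For reference, here is a rough sketch of how that could look in Java,
assuming an old-style (Hadoop 1.x) setup: the MapFile directory is first
zipped into a single archive in HDFS (the distributed cache unpacks .zip,
.tar, .tgz and .jar archives on the task nodes), and the archive is then
registered with DistributedCache. The class name CacheMapFile and the
names used here (mapfile.zip, the "mapfile" symlink) are placeholders,
not from this thread.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CacheMapFile {

    // Zip the MapFile directory (its data and index files) into a single
    // archive in HDFS so it can be shipped through the distributed cache
    // as one unit.
    public static Path zipMapFileDir(FileSystem fs, Path mapFileDir, Path zipPath)
            throws IOException {
        FSDataOutputStream out = fs.create(zipPath, true);
        ZipOutputStream zip = new ZipOutputStream(out);
        try {
            for (FileStatus status : fs.listStatus(mapFileDir)) {   // data + index
                zip.putNextEntry(new ZipEntry(status.getPath().getName()));
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    IOUtils.copyBytes(in, zip, 4096, false);
                } finally {
                    in.close();
                }
                zip.closeEntry();
            }
        } finally {
            zip.close();
        }
        return zipPath;
    }

    // Register the archive with the distributed cache. The "#mapfile"
    // fragment asks for a symlink named "mapfile" in the task's working
    // directory (createSymlink must be enabled for that).
    public static void addToCache(Configuration conf, Path zipPath)
            throws IOException, URISyntaxException {
        DistributedCache.addCacheArchive(
                new URI(zipPath.toUri().toString() + "#mapfile"), conf);
        DistributedCache.createSymlink(conf);
    }
}

In the mapper, the unpacked archive should then be reachable under the
symlink name, so something like
new MapFile.Reader(FileSystem.getLocal(conf), "mapfile", conf)
ought to find the data and index files (again, an untested sketch).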


-- 
Harsh J