Posted to common-user@hadoop.apache.org by "info@christianherta.de" <in...@christianherta.de> on 2012/05/04 13:41:44 UTC
How to create an archive-file in Java to distribute a MapFile via
Distributed Cache
Hello,
I have written a chain of MapReduce jobs which creates a MapFile. I want
to use the MapFile in a subsequent MapReduce job via the distributed cache.
Therefore I have to create an archive file of the folder which holds the
/data and /index files.
In the documentation and in the book "Hadoop: The Definitive Guide" there
are only examples of how this is done on the command line. Is this possible
in HDFS via the Hadoop Java API, too?
P.S.: Distributing the files separately is not a solution. They would end
up in different temporary folders.
Thanks in advance
Christian
Re: How to create an archive-file in Java
to distribute a MapFile via Distributed Cache
Posted by Shi Yu <sh...@uchicago.edu>.
My humble experience: I would prefer specifying the files on the
command line using the -files option, then handling them explicitly
in the Mapper's configure or setup function using
File f1 = new File("file1name");
File f2 = new File("file2name");
because I am not 100% sure how the distributed cache determines
the order of the paths (archives) stored in the array. I once
got tripped up at this point, so since then I have stuck with the
old method.
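A minimal sketch of the pattern Shi describes (file names here are made up): a side file shipped with -files lands in the task's current working directory under its plain name, so ordinary java.io is enough to read it back. In a real job the reading would happen inside the Mapper's setup() or configure() method; the standalone class below just simulates that.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SideFileReader {

    // Read all lines of a side file by its plain name, as one would
    // do in Mapper.setup() for a file shipped with -files.
    static List<String> readSideFile(String name) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(new File(name)));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the task working directory: write a small side file...
        FileWriter out = new FileWriter("file1name");
        out.write("k1\tv1\nk2\tv2\n");
        out.close();
        // ...and read it back by name, exactly as the Mapper would.
        for (String line : readSideFile("file1name")) {
            System.out.println(line);
        }
    }
}
```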
Re: How to create an archive-file in Java to distribute a MapFile via
Distributed Cache
Posted by Harsh J <ha...@cloudera.com>.
Hi,
The Java API offers a DistributedCache class which lets you do this.
The usage is detailed at
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
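On the original question of building the archive itself in Java: a MapFile is just a directory containing a data and an index file, so one option (a sketch, not the only approach) is to zip those entries with java.util.zip and then register the result with DistributedCache.addCacheArchive(). The sketch below works on the local filesystem; against HDFS you would obtain the streams from org.apache.hadoop.fs.FileSystem (fs.open()/fs.create()) instead of java.io. All path names here are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MapFileZipper {

    // Zip every regular file in a directory (for a MapFile: "data" and
    // "index") into a single archive suitable for the distributed cache.
    static void zipDirectory(File dir, File zipFile) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile));
        try {
            byte[] buf = new byte[4096];
            for (File f : dir.listFiles()) {
                if (!f.isFile()) {
                    continue;
                }
                zos.putNextEntry(new ZipEntry(f.getName()));
                FileInputStream in = new FileInputStream(f);
                try {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        zos.write(buf, 0, n);
                    }
                } finally {
                    in.close();
                }
                zos.closeEntry();
            }
        } finally {
            zos.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a local stand-in for a MapFile directory.
        File dir = new File("mymapfile");
        dir.mkdir();
        FileOutputStream d = new FileOutputStream(new File(dir, "data"));
        d.write("dummy data".getBytes());
        d.close();
        FileOutputStream i = new FileOutputStream(new File(dir, "index"));
        i.write("dummy index".getBytes());
        i.close();

        zipDirectory(dir, new File("mymapfile.zip"));

        // In a real job, after copying the zip to HDFS, one would register it:
        //   DistributedCache.addCacheArchive(
        //       new URI("/cache/mymapfile.zip#mymapfile"),
        //       job.getConfiguration());
    }
}
```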
On Fri, May 4, 2012 at 5:11 PM, info@christianherta.de
<in...@christianherta.de> wrote:
> Hello,
> I have written a chain of MapReduce jobs which creates a MapFile. I want
> to use the MapFile in a subsequent MapReduce job via the distributed cache.
> Therefore I have to create an archive file of the folder which holds the
> /data and /index files.
>
> In the documentation and in the book "Hadoop: The Definitive Guide" there
> are only examples of how this is done on the command line. Is this possible
> in HDFS via the Hadoop Java API, too?
>
> P.S.: Distributing the files separately is not a solution. They would end
> up in different temporary folders.
>
> Thanks in advance
> Christian
--
Harsh J