Posted to mapreduce-user@hadoop.apache.org by Mark Tozzi <ma...@gmail.com> on 2010/05/05 00:52:36 UTC

distributed cache question

Hi all,

I've been tinkering with Hadoop for some time, but am new to the
mailing list.  Please forgive me if this has already been asked and
answered.  I am attempting to use the DistributedCache to allow my
MapReduce job to access some lookup files.  I have the following code
to add the files to the distributed cache (showing only a single file
for brevity):

tmpPath = new Path(cl.getOptionValue("lookup_file"));
conf.set("lookupfileName", tmpPath.getName());
DistributedCache.addCacheFile(tmpPath.toUri(), conf);
System.out.println("added " + tmpPath.toUri().toString() + " as " + tmpPath.getName());

and the following code in the Mapper.setup method to access these files:

Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
for (Path file : localFiles) {
    if (file.getName().equals(conf.get("lookupfileName"))) {
        parser.registerResource("bad_uas", new FileReader(new File(file.toUri())));
    }
    // further checks for other files in cache
}

This throws "java.lang.IllegalArgumentException: URI is not absolute"
when I attempt to instantiate the File object.
The registerResource method is currently designed to accept an
instance of a reader from which it pulls its information.  That method
is under my control, and I can reconfigure it to take a more
appropriate input if such exists.
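
For readers hitting the same message: java.io.File's URI constructor
rejects any URI that has no scheme. The sketch below reproduces that with
plain java.net.URI and no Hadoop dependency. The path /tmp/cache/lookup.txt
is hypothetical, and it is an assumption (not confirmed in this thread)
that the Paths returned by getLocalCacheFiles convert to scheme-less URIs
in the same way.

```java
import java.io.File;
import java.net.URI;

public class UriNotAbsoluteDemo {
    // Returns the exception message from new File(URI), or null if construction succeeds.
    static String tryFileFromUri(URI uri) {
        try {
            new File(uri);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Builds a File from the path component only, which does not require a scheme.
    static File fileFromPathComponent(URI uri) {
        return new File(uri.getPath());
    }

    public static void main(String[] args) {
        // Hypothetical local cache path with no scheme.
        URI schemeless = URI.create("/tmp/cache/lookup.txt");
        System.out.println("scheme-less: " + tryFileFromUri(schemeless));

        // The same path with an explicit file: scheme is accepted.
        URI withScheme = URI.create("file:///tmp/cache/lookup.txt");
        System.out.println("with scheme: " + tryFileFromUri(withScheme));

        // Using the path string avoids the URI constructor entirely.
        System.out.println("from path:   " + fileFromPathComponent(schemeless).getName());
    }
}
```

If that assumption holds, building the File from the path string in the
mapper (for example new File(file.toString())) may sidestep the exception,
though this has not been verified against the job in question.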

I have tried a few variations on this specific method, and all seem to
come back to the "URI is not absolute" error.  What is the piece I am
missing here?

Thanks,

--Mark Tozzi

Re: distributed cache question

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Hi Mark,

You need to pass the complete URL of the file on DFS to DistributedCache.addCacheFile.
Please see http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#DistributedCache
and http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/filecache/DistributedCache.html for usage.
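
As a rough illustration of what a "complete URL" could look like, the
sketch below assembles a fully qualified hdfs URI with java.net.URI alone.
The host namenode.example.com, port 8020, and path /user/mark/lookup.txt
are all hypothetical placeholders; the real values depend on the cluster.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QualifiedCacheUri {
    // Assembles scheme, host, port, and path into one absolute URI.
    static URI qualify(String host, int port, String path) {
        try {
            return new URI("hdfs", null, host, port, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid URI parts", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical namenode address and file location.
        URI uri = qualify("namenode.example.com", 8020, "/user/mark/lookup.txt");
        System.out.println(uri);
        System.out.println("absolute: " + uri.isAbsolute());
    }
}
```

A URI built this way would then be the first argument to
DistributedCache.addCacheFile, in place of tmpPath.toUri() from the
original code.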

Thanks
Amareshwari
