Posted to common-user@hadoop.apache.org by Sean Shanny <ss...@tripadvisor.com> on 2008/12/26 23:20:30 UTC

Does anyone have a working example for using MapFiles on the DistributedCache?

To all,

Version:  hadoop-0.17.2.1-core.jar

I have created a MapFile.

What I don't seem to be able to do is correctly place the MapFile in
the DistributedCache and then make use of it in a map method.

I need the following info please:

1.	How and where to place the MapFile directory so that it is visible  
to the hadoop job.
2.	How to add the files to the DistributedCache.
3.	How to create a MapFile.Reader from files in the DistributedCache.

I can get this to work with a local file on a single-node system
outside of the DistributedCache, but for the life of me I cannot get it
to work within a DistributedCache.

We are trying to load up key-value mappings for a Data Warehouse ETL
process. The mapper will take an input record, look up the keys based
on its values, and emit the resulting key-only record.
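The lookup described above can be sketched in plain Java, with ordinary maps standing in for the MapFile.Reader lookups. This is a minimal sketch that depends on assumptions not in the original post: the record is delimited, every column has its own value-to-key table, and a record containing an unknown value is not emitted. The class name and record layout are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class KeyOnlyRecord {

    /**
     * Converts a delimited input record into a key-only record by looking up
     * each field's value in the table for that column.
     *
     * Hypothetical layout: lookups.get(i) maps the values of column i to their
     * keys; in the real job each map would be backed by a MapFile.Reader.
     * Returns null when any value has no key, so the caller can skip the record.
     */
    public static String toKeyOnly(String record, String sep,
                                   List<Map<String, String>> lookups) {
        // Split on the literal separator, keeping trailing empty fields.
        String[] values = record.split(Pattern.quote(sep), -1);
        if (values.length != lookups.size()) {
            throw new IllegalArgumentException(
                "expected " + lookups.size() + " fields, got " + values.length);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            String key = lookups.get(i).get(values[i]);
            if (key == null) {
                return null; // unknown value: there is no key to emit
            }
            if (i > 0) {
                out.append(sep);
            }
            out.append(key);
        }
        return out.toString();
    }
}
```

In the actual mapper, each map would be replaced by a MapFile.Reader opened once in configure(), with MapFile.Reader.get(key, value) doing the lookup.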

Happy to answer any questions to help me make this work.

Thanks.

--sean





Re: Does anyone have a working example for using MapFiles on the DistributedCache?

Posted by Sean Shanny <ss...@tripadvisor.com>.
Amareshwari

For part 3, I am trying to open the file so that I can use the
MapFile.Reader.get() method to look up values based on an input key.

    // Task-local copies of the files registered with the DistributedCache
    localFiles = DistributedCache.getLocalCacheFiles(conf);

    for (Path localFile : localFiles)
    {
        String sFileName = localFile.getName();

        // A MapFile is a directory holding "index" and "data";
        // MapFile.Reader takes the directory, so use the parent of "data".
        if (sFileName.equalsIgnoreCase("data"))
        {
            System.out.println("Full Path: " + localFile.toString());
            System.out.println("Parent: " + localFile.getParent().toString());

            fs = FileSystem.get(localFile.toUri(), conf);
            myReader = new MapFile.Reader(fs, localFile.getParent().toString(), conf);
        }
    }


The code always fails at the myReader = new MapFile.Reader(...) line
with a file-not-found error. I have triple-checked that the file is on
the system. I am stumped on how to use a MapFile.Reader based on a file
in the DistributedCache.
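One likely cause of the file-not-found error, worth checking first: the paths returned by DistributedCache.getLocalCacheFiles(conf) point at the task's local disk but appear to carry no URI scheme, and FileSystem.get(URI, Configuration) resolves a scheme-less URI against the default file system (normally HDFS), where that local path does not exist. Opening the reader with FileSystem.getLocal(conf) instead of FileSystem.get(localFile.toUri(), conf) sidesteps this. The JDK-only sketch below (class and method names are hypothetical) shows the two checks involved: whether a path has a scheme, and which directory holds a matching index/data pair. The Hadoop call itself appears only in a comment.

```java
import java.io.File;
import java.net.URI;

public class CacheMapFileLocator {

    /**
     * True if the path string starts with a URI scheme such as "file:" or
     * "hdfs:". Hadoop's FileSystem.get(URI, Configuration) resolves a URI
     * without a scheme against the default file system (fs.default.name).
     */
    public static boolean hasScheme(String path) {
        return URI.create(path).getScheme() != null;
    }

    /**
     * Returns the directory containing a "data" file that has an "index"
     * file next to it, or null if no listed path qualifies. In the task this
     * directory would be handed to MapFile.Reader along with the local file
     * system, e.g.
     *   new MapFile.Reader(FileSystem.getLocal(conf), dir, conf)
     */
    public static String findMapFileDir(String[] localPaths) {
        for (String p : localPaths) {
            File data = new File(p);
            if (data.getName().equals("data") && data.isFile()) {
                // MapFile.Reader opens <dir>/index and <dir>/data together.
                File index = new File(data.getParentFile(), "index");
                if (index.isFile()) {
                    return data.getParent();
                }
            }
        }
        return null;
    }
}
```

Printing hasScheme(localFile.toString()) inside the task is a quick way to confirm whether the scheme is really missing before changing how the FileSystem is obtained.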


Thanks.

--sean

Sean Shanny
sshanny@tripadvisor.com




On Dec 28, 2008, at 10:59 PM, Amareshwari Sriramadasu wrote:



Re: Does anyone have a working example for using MapFiles on the DistributedCache?

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Sean Shanny wrote:
> To all,
>
> Version:  hadoop-0.17.2.1-core.jar
>
> I have created a MapFile.
>
> What I don't seem to be able to do is correctly place the MapFile in
> the DistributedCache and then make use of it in a map method.
>
> I need the following info please:
>
> 1.    How and where to place the MapFile directory so that it is 
> visible to the hadoop job.
You have to place your files in DFS. If it is a directory, you can place an
archive of it there.
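Shipping the MapFile directory as an archive can be sketched with the JDK's own zip support. This is a minimal sketch, not a Hadoop utility: the class and method names are hypothetical, and it assumes the MapFile directory holds only plain files (normally index and data). Entries are stored at the top level of the archive so that, once unpacked, index and data sit side by side in one directory, which is the layout MapFile.Reader expects.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MapFileArchiver {

    /**
     * Writes every regular file directly inside mapFileDir (normally "index"
     * and "data") into zipFile, named without any directory prefix.
     */
    public static void zipMapFileDir(Path mapFileDir, Path zipFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out);
             DirectoryStream<Path> files = Files.newDirectoryStream(mapFileDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip anything that is not a plain file
                }
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // stream the file's bytes into the entry
                zip.closeEntry();
            }
        }
    }
}
```

The resulting zip would then be copied into DFS (for example with hadoop fs -put) and registered with DistributedCache.addCacheArchive(uri, conf). In the task, DistributedCache.getLocalCacheArchives(conf) returns the local paths of the unpacked archives; it is worth listing that directory once to confirm where index and data ended up before pointing a MapFile.Reader at it. Keeping the .zip suffix is advisable, since the cache appears to choose how to unpack an archive from its file extension.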
> 2.    How to add the files to the DistributedCache.
You can use DistributedCache.addCacheFile or 
DistributedCache.addCacheArchive.
See more documentation at
http://hadoop.apache.org/core/docs/r0.17.2/api/org/apache/hadoop/filecache/DistributedCache.html
and
http://hadoop.apache.org/core/docs/r0.17.2/mapred_tutorial.html#DistributedCache
> 3.    How to create a MapFile.Reader from files in the DistributedCache.
>
I didn't understand what you want to do here. Do you want to see the files
in the MapFile directory, or do you want them on the classpath, etc.?
You can use DistributedCache.addFileToClassPath or 
DistributedCache.addArchiveToClassPath

Hope this helps.

Thanks
Amareshwari