Posted to common-user@hadoop.apache.org by Allen Wittenauer <aw...@apache.org> on 2011/08/01 04:21:14 UTC

Re: Moving Files to Distributed Cache in MapReduce

We really need to add a working example to the wiki and link to it from the FAQ page.  Any volunteers?

On Jul 29, 2011, at 7:49 PM, Michael Segel wrote:

> 
> Here's the meat of my post earlier...
> Sample code for putting a file on the cache:
> DistributedCache.addCacheFile(new URI(path + "MyFileName"), conf);
> 
> Sample code for pulling data off the cache:
>       Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
>       boolean exitProcess = false;
>       int i = 0;
>       while (!exitProcess) {
>           String fileName = localFiles[i].getName();
>           if (fileName.equalsIgnoreCase("model.txt")) {
>               // Build your input file reader on localFiles[i].toString()
>               exitProcess = true;
>           }
>           i++;
>       }
> 
> 
> Note that this is SAMPLE code. I didn't trap the exit condition where the file isn't there and you run past the end of the array localFiles[].
> Also, I set exitProcess to false because it's easier to read this as "Do this loop until the condition exitProcess is true".
> 
> When you build your file reader you need the full path, not just the file name. The path will vary when the job runs.
> 
> HTH
> 
> -Mike
> 
> 
>> From: michael_segel@hotmail.com
>> To: common-user@hadoop.apache.org
>> Subject: RE: Moving Files to Distributed Cache in MapReduce
>> Date: Fri, 29 Jul 2011 21:43:37 -0500
>> 
>> 
>> I could have sworn that I gave an example earlier this week on how to push and pull stuff from distributed cache.
>> 
>> 
>>> Date: Fri, 29 Jul 2011 14:51:26 -0700
>>> Subject: Re: Moving Files to Distributed Cache in MapReduce
>>> From: rogchen@ucdavis.edu
>>> To: common-user@hadoop.apache.org
>>> 
>>> JobConf is deprecated in 0.20.2, I believe; you're supposed to be using
>>> Configuration for that
>>> 
>>> On Fri, Jul 29, 2011 at 1:59 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>>> 
>>>> Is this what you are looking for?
>>>> 
>>>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
>>>> 
>>>> search for jobConf
>>>> 
>>>> On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen <ro...@ucdavis.edu> wrote:
>>>>> Thanks for the response! However, I'm having an issue with this line
>>>>> 
>>>>> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
>>>>> 
>>>>> because conf has private access in org.apache.hadoop.conf.Configured
>>>>> 
>>>>> On Fri, Jul 29, 2011 at 11:18 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
>>>>> 
>>>>>> I hope my previous reply helps...
>>>>>> 
>>>>>> On Fri, Jul 29, 2011 at 11:11 AM, Roger Chen <ro...@ucdavis.edu> wrote:
>>>>>> 
>>>>>>> After moving it to the distributed cache, how would I call it within my
>>>>>>> MapReduce program?
>>>>>>> 
>>>>>>> On Fri, Jul 29, 2011 at 11:09 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Did you try using the -files option in your hadoop jar command, as in:
>>>>>>>> 
>>>>>>>> /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Jul 29, 2011 at 11:05 AM, Roger Chen <ro...@ucdavis.edu> wrote:
>>>>>>>> 
>>>>>>>>> Slight modification: I now know how to add files to the distributed file
>>>>>>>>> cache, which can be done via this command placed in the main or run class:
>>>>>>>>> 
>>>>>>>>>       DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);
>>>>>>>>> 
>>>>>>>>> However I am still having trouble locating the file in the distributed
>>>>>>>>> cache. *How do I call the file path of thefile.dat in the distributed cache
>>>>>>>>> as a string?* I am using Hadoop 0.20.2
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, Jul 29, 2011 at 10:26 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> Does anybody have examples of how one moves files from the local
>>>>>>>>>> filestructure/HDFS to the distributed cache in MapReduce? A Google search
>>>>>>>>>> turned up examples in Pig but not MR.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Roger Chen
>>>>>>>>>> UC Davis Genome Center
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Roger Chen
>>>>>>>>> UC Davis Genome Center
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Roger Chen
>>>>>>> UC Davis Genome Center
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Roger Chen
>>>>> UC Davis Genome Center
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Roger Chen
>>> UC Davis Genome Center


RE: Moving Files to Distributed Cache in MapReduce

Posted by Michael Segel <mi...@hotmail.com>.
Yeah,

I'll write something up and post it on my web site. Definitely not InfoQ material, just a simple tips-and-tricks piece.

-Mike
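
For anyone following along, here is a minimal sketch of the lookup loop from the sample quoted in this thread, with the missing exit condition trapped. It is only an illustration, not the wiki example being asked for. To keep it runnable without Hadoop on the classpath, java.io.File stands in for Hadoop's Path (an assumption of this sketch); in a real task the array would come from DistributedCache.getLocalCacheFiles(context.getConfiguration()), which may well hand back null or an empty array when nothing was cached. The class name CacheFileLookup and the method findByName are hypothetical names made up for this sketch.

```java
import java.io.File;

// Sketch only: java.io.File is a stand-in for org.apache.hadoop.fs.Path,
// and CacheFileLookup/findByName are hypothetical names, not Hadoop API.
class CacheFileLookup {

    /**
     * Returns the full local path of the first entry whose file name matches
     * wantedName (ignoring case), or null when there is no such entry.
     */
    static String findByName(File[] localFiles, String wantedName) {
        if (localFiles == null || wantedName == null) {
            return null; // nothing was localized, or nothing to look for
        }
        // The for-each loop cannot run past the end of the array, which is
        // the case the quoted sample leaves untrapped.
        for (File candidate : localFiles) {
            if (candidate != null && wantedName.equalsIgnoreCase(candidate.getName())) {
                // Return the full path, not just the name: the reader needs it.
                return candidate.getPath();
            }
        }
        return null; // not found; the caller decides how to fail
    }
}
```

A caller would then do something like String modelPath = CacheFileLookup.findByName(localFiles, "model.txt"); and fail the task with a clear message if the result is null, instead of building a reader on a path that isn't there.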

