
Posted to common-user@hadoop.apache.org by tim robertson <ti...@gmail.com> on 2008/12/01 07:41:38 UTC

Re: "Lookup" HashMap available within the Map

Hi Shane,

I can't explain that, but I can say that with 0.19.0 I am now successfully
using setNumTasksToExecutePerJvm(-1) and initializing statically declared
data in the Mapper's configure(). Choosing the tuning parameters really is
educated guesswork, though: I profile the app's memory usage locally and
then determine by trial and error how much extra each node needs for the
Hadoop framework's own activities, so that I can set the -Xmx params and
the number of map tasks per node for the different EC2 instance sizes. A
little dirty perhaps, but I am still learning
(http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html).
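A minimal plain-Java sketch of the pattern above, assuming a job that has called JobConf.setNumTasksToExecutePerJvm(-1). With JVM reuse, the tasks of one job run one after another in the same child JVM, so a static field survives from task to task, while configure() still runs once per task (once per Mapper instance). The Hadoop types are left out so the sketch compiles on its own. LookupMapper, ensureLoaded() and the hard-coded entries are hypothetical stand-ins for a real Mapper and for parsing a shapefile.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a 0.19 Mapper; a real one would extend MapReduceBase,
// implement Mapper, and do this work in configure(JobConf job).
class LookupMapper {
    // Static, so it outlives a single task when the JVM is reused.
    private static volatile Map<String, String> lookup;

    // Counts real loads, just to make the once-per-JVM behaviour visible.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Runs once per task (once per Mapper instance), not once per job.
    void configure() {
        ensureLoaded();
    }

    // Loads the lookup only if no earlier task in this JVM has done so.
    // synchronized is a cheap safeguard; reused JVMs run tasks sequentially.
    private static synchronized void ensureLoaded() {
        if (lookup == null) {
            LOADS.incrementAndGet();
            // Hypothetical data; a real job might parse a shapefile here.
            Map<String, String> m = new HashMap<>();
            m.put("cell-1", "polygon-A");
            m.put("cell-2", "polygon-B");
            lookup = Collections.unmodifiableMap(m);
        }
    }

    // Stand-in for map(): read-only access to the shared lookup.
    String map(String key) {
        return lookup.getOrDefault(key, "none");
    }
}
```

Creating two LookupMapper instances in one JVM and calling configure() on each performs the expensive load only once. Without JVM reuse, each task starts in a fresh JVM, so the static would be rebuilt for every task anyway.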

I'm also interested to know when one would use a MultithreadedMapRunner.
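For comparison, MultithreadedMapRunner (org.apache.hadoop.mapred.lib, enabled with JobConf.setMapRunnerClass) takes a different route: it creates a single Mapper instance per task and calls map() from a pool of threads, so shared data can simply be an instance field set in configure(), but map() must be thread-safe. Its javadoc suggests it for map operations that are not CPU bound. It shares data only among threads within one task; separate tasks still get separate JVMs unless JVM reuse is enabled too. The plain-Java sketch below mimics that shape with an ExecutorService. ThreadSafeLookupMapper and MultithreadedRunnerSketch are hypothetical names, and the Hadoop classes are left out so it compiles on its own.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a thread-safe Mapper driven by a multithreaded runner.
class ThreadSafeLookupMapper {
    // Plain instance field: one Mapper instance is shared by all runner threads.
    private Map<String, String> lookup;

    // Called once, before any thread calls map().
    void configure() {
        Map<String, String> m = new HashMap<>();
        m.put("cell-1", "polygon-A");
        m.put("cell-2", "polygon-B");
        lookup = m; // only read from here on, so concurrent reads need no locking
    }

    // May run concurrently: reads the shared lookup, writes only to a thread-safe sink.
    void map(String key, Map<String, String> output) {
        output.put(key, lookup.getOrDefault(key, "none"));
    }
}

class MultithreadedRunnerSketch {
    // Mimics the runner: one mapper, configured once, fed records by a thread pool.
    static Map<String, String> run(List<String> records, int threads) {
        ThreadSafeLookupMapper mapper = new ThreadSafeLookupMapper();
        mapper.configure(); // happens-before the tasks submitted below
        Map<String, String> output = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String record : records) {
            pool.execute(() -> mapper.map(record, output));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return output;
    }
}
```

The design trade-off shows up in where the care goes: the JVM-reuse approach needs a static, load-once guard but no thread safety inside map(), while the multithreaded approach needs no static but does need a map() that is safe to call concurrently.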

Cheers

Tim

On Sun, Nov 30, 2008 at 11:22 PM, Shane Butler <sh...@gmail.com> wrote:
> Given the goal of shared data accessible across the Map instances,
> can someone please explain some of the differences between using:
> - setNumTasksToExecutePerJvm() and then having statically declared
> data initialised in Mapper.configure(); and
> - a MultithreadedMapRunner?
>
> Regards,
> Shane
>
>
> On Wed, Nov 26, 2008 at 6:41 AM, Doug Cutting <cu...@apache.org> wrote:
>> tim robertson wrote:
>>>
>>> Thanks Alex - this will allow me to share the shapefile, but I need to
>>> "one time only per job per jvm" read it, parse it and store the
>>> objects in the index.
>>> Is Mapper.configure() the best place to do this?  E.g., will it
>>> only be called once per job?
>>
>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM.
>>  So, yes, you could access a static cache from Mapper.configure().
>>
>> Doug