Posted to common-user@hadoop.apache.org by Ryan LeCompte <le...@gmail.com> on 2008/09/03 06:00:50 UTC

JVM Spawning

Beginner's question:

If I have a cluster with a single node that has a max of 1 map/1
reduce, and the job submitted has 50 maps... Then it will process only
1 map at a time. Does that mean that it's spawning 1 new JVM for each
map processed? Or re-using the same JVM when a new map can be
processed?

Thanks,
Ryan

Re: JVM Spawning

Posted by Doug Cutting <cu...@apache.org>.
LocalJobRunner allows you to test your code with everything running in a 
single JVM.  Just set mapred.job.tracker=local.

Doug

Ryan LeCompte wrote:
> I see... so there really isn't a way for me to test a map/reduce
> program using a single node without incurring the overhead of
> upping/downing JVMs... My input is broken up into 5 text files. Is
> there a way I could start the job such that it only uses 1 map to
> process the whole thing? I guess I'd have to concatenate the files
> into 1 file and somehow turn off splitting?
> 
> Ryan

Re: JVM Spawning

Posted by Owen O'Malley <om...@apache.org>.
I posted an idea for an extension to MultipleFileInputFormat, in case someone
has any extra time. *smile*

https://issues.apache.org/jira/browse/HADOOP-4057

-- Owen

Re: JVM Spawning

Posted by Owen O'Malley <om...@apache.org>.
On Tue, Sep 2, 2008 at 9:13 PM, Ryan LeCompte <le...@gmail.com> wrote:

> I see... so there really isn't a way for me to test a map/reduce
> program using a single node without incurring the overhead of
> upping/downing JVMs... My input is broken up into 5 text files. Is
> there a way I could start the job such that it only uses 1 map to
> process the whole thing? I guess I'd have to concatenate the files
> into 1 file and somehow turn off splitting?


There is a MultipleFileInputFormat. It is less useful than it should be, but
it is a good place to start. Defining a MultipleFileInputFormat that reads
text files should be pretty easy, and it will give you a single map for your
job. Otherwise, yes, you'll need to make a single file and ask for a single map.
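A rough, Hadoop-free sketch of the idea: one reader that walks several text files back to back, so a single pass (standing in for a single map task) sees every line of every file. MultiFileLineReader and countLines are hypothetical names, not Hadoop classes; in Hadoop this logic would presumably live in the record reader that such an input format hands to the map task.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reads several text files one after
// another, as a single map task would if all files were in one split.
class MultiFileLineReader implements AutoCloseable {
    private final List<Path> files;
    private int nextFile = 0;        // index of the next file to open
    private BufferedReader current;  // reader for the file being consumed, or null

    MultiFileLineReader(List<Path> files) {
        this.files = new ArrayList<>(files);
    }

    // Returns the next line across all files, or null once every file is exhausted.
    String readLine() throws IOException {
        while (true) {
            if (current == null) {
                if (nextFile >= files.size()) {
                    return null;
                }
                current = Files.newBufferedReader(files.get(nextFile++), StandardCharsets.UTF_8);
            }
            String line = current.readLine();
            if (line != null) {
                return line;
            }
            current.close();  // this file is finished; fall through to the next one
            current = null;
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    // One "map" pass over all inputs: here it just counts lines.
    static long countLines(List<Path> files) throws IOException {
        long count = 0;
        try (MultiFileLineReader reader = new MultiFileLineReader(files)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```

The point of the design is that the file boundaries are hidden from the consumer: it keeps calling readLine() and never learns how many files were behind it, which is what lets one map cover the whole input.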

-- Owen

Re: JVM Spawning

Posted by Ryan LeCompte <le...@gmail.com>.
I see... so there really isn't a way for me to test a map/reduce
program using a single node without incurring the overhead of
upping/downing JVMs... My input is broken up into 5 text files. Is
there a way I could start the job such that it only uses 1 map to
process the whole thing? I guess I'd have to concatenate the files
into 1 file and somehow turn off splitting?
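If you go the concatenation route, that step itself is simple. A minimal Java sketch, assuming the five parts are local files; ConcatInputs and concatenate are made-up names for illustration. It only builds the single file and does nothing about turning off splitting, which is decided inside Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: concatenates local text files into a single file.
// Each part is read fully into memory, which is fine for small test inputs
// but not for large ones.
class ConcatInputs {
    static void concatenate(List<Path> parts, Path output) throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {  // creates or truncates output
            for (Path part : parts) {
                byte[] data = Files.readAllBytes(part);
                out.write(data);
                // If a part lacks a trailing newline, add one so its last line
                // does not run into the first line of the next part.
                if (data.length > 0 && data[data.length - 1] != '\n') {
                    out.write('\n');
                }
            }
        }
    }
}
```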

Ryan

Re: JVM Spawning

Posted by Owen O'Malley <om...@apache.org>.
On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:

> Beginner's question:
>
> If I have a cluster with a single node that has a max of 1 map/1
> reduce, and the job submitted has 50 maps... Then it will process only
> 1 map at a time. Does that mean that it's spawning 1 new JVM for each
> map processed? Or re-using the same JVM when a new map can be
> processed?

It creates a new JVM for each task. Devaraj is working on
https://issues.apache.org/jira/browse/HADOOP-249
which will allow the JVMs to run multiple tasks sequentially.
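To make the cost concrete, here is a toy Java model (not Hadoop code) that counts child JVM launches for the single-slot case in the question. JvmLaunchModel, countLaunches, and the tasksPerJvm knob are made up for illustration; the reuse setting only approximates what HADOOP-249 is meant to allow, and the real TaskTracker logic is not modeled.

```java
// Toy model only, not Hadoop code: one TaskTracker with a single map slot
// running a job's map tasks one after another.
//   tasksPerJvm == 1 : a fresh child JVM for every task
//   tasksPerJvm  > 1 : a JVM is reused for up to that many tasks in sequence
//   tasksPerJvm <= 0 : treated here as "no limit", so one JVM runs every task
class JvmLaunchModel {
    static int countLaunches(int mapTasks, int tasksPerJvm) {
        int launches = 0;
        int tasksInCurrentJvm = 0;
        boolean jvmAlive = false;
        for (int task = 0; task < mapTasks; task++) {
            boolean limitReached = tasksPerJvm > 0 && tasksInCurrentJvm == tasksPerJvm;
            if (!jvmAlive || limitReached) {
                launches++;            // start a new child JVM for this task
                jvmAlive = true;
                tasksInCurrentJvm = 0;
            }
            tasksInCurrentJvm++;       // run the task in the current JVM
        }
        return launches;
    }
}
```

With the setup in the question (one map slot, 50 maps), countLaunches(50, 1) returns 50, countLaunches(50, 10) returns 5, and countLaunches(50, -1) returns 1.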

-- Owen