Posted to mapreduce-user@hadoop.apache.org by Arpit Wanchoo <Ar...@guavus.com> on 2012/06/04 14:12:08 UTC

JVM reuse in Map Tasks

Hi

I wanted to check what exactly we gain when JVM reuse is enabled in a MapReduce job.

My doubt was regarding the setup() method of the mapper. Is it called for a mapper even if that mapper runs in a JVM reused from a previously run mapper?
If yes, is there any way I can control it or stop it from being called more than once?

Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, Gurgaon,Haryana.
Mobile Number +91-9899949788


Re: JVM reuse in Map Tasks

Posted by Subroto <ss...@datameer.com>.
Hi Arpit,

A point to mention from http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/:

If each task takes less than 30-40 seconds, reduce the number of tasks. The task setup and scheduling overhead is a few seconds, so if tasks finish very quickly, you’re wasting time while not doing work. JVM reuse can also be enabled to solve this problem.

Further, I can think of this case: if the implementation needs a huge tree to be built in the mapper phase in a child JVM, the same tree can be reused by later tasks running in that JVM rather than being created again and again.

Cheers,
Subroto Sanyal



Re: JVM reuse in Map Tasks

Posted by GUOJUN Zhu <gu...@freddiemac.com>.
Yeah, I think so. For a mapper, that is probably not significant, as our
map runs usually take minutes. However, we also have it on for combiners
(the same class as the reducer), and there it becomes significant, because
a combiner's configure() runs for each key (quite a few in our case) at the
end of every map task.

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_zhu@freddiemac.com
Financial Engineering
Freddie Mac



On 06/05/2012 03:56 AM, Arpit Wanchoo wrote:

> Yes, I meant configure(JobConf).
> I got that point.
> So that means setup() is called for each mapper even if JVM reuse is
> enabled.
>
> If I understood correctly: if I initialize a static variable (say var) in
> setup(), then when a mapper is started for the second time on the same JVM,
> var would already be initialized before setup() is called, i.e. it retains
> its value from the previously run mapper.
> Is that right?






Re: JVM reuse in Map Tasks

Posted by Arpit Wanchoo <Ar...@guavus.com>.
Yes, I meant configure(JobConf).
I got that point.
So that means setup() is called for each mapper even if JVM reuse is enabled.

If I understood correctly: if I initialize a static variable (say var) in setup(), then when a mapper is started for the second time on the same JVM, var would already be initialized before setup() is called, i.e. it retains its value from the previously run mapper.
Is that right?



Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, Gurgaon,Haryana.
Mobile Number +91-9899949788

On 04-Jun-2012, at 6:36 PM, GUOJUN Zhu wrote:

> For setup(), do you mean configure(JobConf)? We need to deserialize a big object and do some other preparatory work on it within configure(). It takes a few seconds and is the same for all tasks. We just declare the object as static and do not recreate it if it is not null. That way, we make sure to create it only once and save the setup time for the rest of the tasks.



Re: JVM reuse in Map Tasks

Posted by GUOJUN Zhu <gu...@freddiemac.com>.
For setup(), do you mean configure(JobConf)? We need to deserialize a
big object and do some other preparatory work on it within configure().
It takes a few seconds and is the same for all tasks. We just declare the
object as static and do not recreate it if it is not null. That way, we
make sure to create it only once and save the setup time for the rest of
the tasks.
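
The static-object approach described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not Hadoop code: LookupMapper, loadBigObject() and loadCount are hypothetical names, and configure() here only stands in for configure(JobConf) (or setup(Context) in the newer API). It assumes the cached object is identical for every task that lands in the JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticCacheSketch {

    /** Hypothetical stand-in for a mapper; this is NOT a Hadoop class. */
    static class LookupMapper {
        // Static: one copy per JVM (per classloader), shared by every
        // task instance that is created in that JVM.
        private static Map<String, Integer> bigObject;

        // Counts how often the expensive load really ran (demo only).
        static int loadCount = 0;

        // Plays the role of configure(JobConf): still called once per task,
        // but the expensive work is skipped when the object already exists.
        void configure() {
            synchronized (LookupMapper.class) {
                if (bigObject == null) {
                    bigObject = loadBigObject();
                }
            }
        }

        // Placeholder for the slow deserialization and preparation step.
        private static Map<String, Integer> loadBigObject() {
            loadCount++;
            Map<String, Integer> data = new HashMap<>();
            data.put("example-key", 42);
            return data;
        }

        Integer lookup(String key) {
            return bigObject.get(key);
        }
    }

    public static void main(String[] args) {
        // Simulate three tasks run one after another in a single reused JVM.
        for (int task = 0; task < 3; task++) {
            LookupMapper mapper = new LookupMapper(); // new instance per task
            mapper.configure();                       // called for every task
            System.out.println("task " + task
                    + " lookup=" + mapper.lookup("example-key")
                    + " loads=" + LookupMapper.loadCount);
        }
    }
}
```

So configure() is not suppressed by JVM reuse; the null check just makes the repeated calls cheap. With JVM reuse off, each task gets a fresh JVM, the static field starts as null, and the load runs every time.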

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_zhu@freddiemac.com
Financial Engineering
Freddie Mac


