Posted to mapreduce-user@hadoop.apache.org by Arpit Wanchoo <Ar...@guavus.com> on 2012/06/04 14:12:08 UTC
JVM reuse in Map Tasks
Hi
I wanted to check what exactly we gain when JVM reuse is enabled in a MapReduce job.
My question is about the mapper's setup() method. Is it called for a mapper even when that mapper runs in a JVM reused from a previously run mapper?
If yes, is there any way I can control it, or stop it from being called more than once?
Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, Gurgaon,Haryana.
Mobile Number +91-9899949788
Re: JVM reuse in Map Tasks
Posted by Subroto <ss...@datameer.com>.
Hi Arpit,
A point to mention from http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/:
If each task takes less than 30-40 seconds, reduce the number of tasks. The task setup and scheduling overhead is a few seconds, so if tasks finish very quickly, you’re wasting time while not doing work. JVM reuse can also be enabled to solve this problem.
Further, if the mapper needs to build a huge tree in the child JVM (say the implementation requires it), the same tree can be reused by later tasks that run in that JVM rather than being created again and again.
Cheers,
Subroto Sanyal
Re: JVM reuse in Map Tasks
Posted by GUOJUN Zhu <gu...@freddiemac.com>.
Yeah, I think so. For a mapper, that is probably not significant, as our
map runs usually take minutes. However, we also have it on for combiners
(the same class as the reducer); there it becomes significant because a
combiner's configure() runs every time, for each key (quite a few in our
case), at the end of every map task.
Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_zhu@freddiemac.com
Financial Engineering
Freddie Mac
Re: JVM reuse in Map Tasks
Posted by Arpit Wanchoo <Ar...@guavus.com>.
Yes, I meant configure(JobConf).
I got that point.
So that means setup() is called for each mapper even if JVM reuse is enabled.
If I understood correctly, then if I initialize a static variable (say var) in setup(), and a mapper is started for the second time on the same JVM, that var would already be initialized before setup() is called, i.e. it retains its value from the previously run mapper.
Is that how it works?
Regards,
Arpit Wanchoo | Sr. Software Engineer
Re: JVM reuse in Map Tasks
Posted by GUOJUN Zhu <gu...@freddiemac.com>.
For setup(), do you mean configure(JobConf)? We need to deserialize a
big object and do some other preparation work on it within configure()
for setting up. It takes a few seconds and it is the same for all tasks. We
just declare the object as static and do not recreate it if it is not
null. That way, we make sure we create it only once and save the setup
time for the rest of the tasks.
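
A minimal sketch of the "static object, recreate only if null" pattern described above. This is not the poster's actual code: the class and method names (StaticCacheDemo, CachingTask, expensiveLoad) are hypothetical, and the example deliberately avoids the Hadoop API so that it runs standalone. In a real job the null check would sit in configure(JobConf) (old API) or setup(Context) (new API), and the benefit only appears when several tasks really do run in the same JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticCacheDemo {

    /** Stand-in for a mapper or combiner class; all names here are hypothetical. */
    static class CachingTask {
        // Static: belongs to the class, so it outlives any single task instance
        // for as long as the JVM (and the class) stays loaded.
        private static Map<String, Integer> lookup;
        // Counts how often the expensive load really ran (for demonstration only).
        private static int loadCount = 0;

        /** Plays the role of configure(JobConf) / setup(Context): called once per task. */
        void configure() {
            synchronized (CachingTask.class) {
                if (lookup == null) {      // only the first task in this JVM pays the cost
                    lookup = expensiveLoad();
                    loadCount++;
                }
            }
        }

        /** Plays the role of map(): uses the shared object, returns -1 for unknown keys. */
        int map(String key) {
            Integer value = lookup.get(key);
            return value == null ? -1 : value;
        }

        // Assumption: stands in for deserializing a big object.
        private static Map<String, Integer> expensiveLoad() {
            Map<String, Integer> m = new HashMap<>();
            m.put("a", 1);
            m.put("b", 2);
            return m;
        }

        static int getLoadCount() {
            return loadCount;
        }
    }

    public static void main(String[] args) {
        // Simulate three tasks running one after another in the same JVM.
        for (int i = 0; i < 3; i++) {
            CachingTask task = new CachingTask();  // a new instance per task
            task.configure();                      // still called every time
            System.out.println("task " + i + " -> " + task.map("a"));
        }
        System.out.println("expensive loads: " + CachingTask.getLoadCount());
    }
}
```

The configure() call still happens for every task; the static field only lets the second and later tasks skip the expensive part. The shared object should be treated as read-only once it is built, since every task in that JVM sees the same instance.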
Zhu, Guojun