Posted to common-dev@hadoop.apache.org by Philippe Gassmann <ph...@anyware-tech.com> on 2007/03/19 15:46:55 UTC

Running tasks in the TaskTracker VM

Hi Hadoop guys,

At the moment, for each task (map or reduce) a new JVM is created by the
TaskTracker to run the task.

We have a high number of small files in our Hadoop cluster, thus
requiring a high number of map tasks. I know this is suboptimal, but
aggregating those small files is not possible right now. So an idea came
to us: launch tasks in the TaskTracker JVM, so that the overhead of
creating a new VM disappears.

I already have a working patch against the 0.10.1 release of Hadoop that
launches tasks inside the TaskTracker JVM if a specific parameter is set
in the JobConf of the launched Job (for jobs we trust ;) ). Each new task
has its own class loader which loads every class the task needs, as if it
were running in a brand new JVM (the same "classpath" is used).

For that to work, an upgrade of commons-logging to the 1.1 version is
needed in order to circumvent class loader / memory leak issues. I've
done some profiling using JProfiler on the task tracker to find and
remove memory leaks. So I'm pretty confident in this code.
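For readers unfamiliar with the trick, a minimal sketch of the per-task class loader idea might look like the following. All names here are illustrative, not taken from the patch, and note one caveat: a plain URLClassLoader delegates parent-first, so true per-task reloading only works for classes that are not already visible through the tracker's own loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Rough sketch: each task gets its own URLClassLoader over the same
// classpath, so the task's classes (and their statics) are isolated and
// become collectable once the task finishes and the loader is dropped.
// Standard delegation is parent-first, so a faithful implementation
// must keep task classes off the tracker's own classpath.
class InVmTaskLauncher {
    static Object instantiateTask(URL[] classpath, String taskClassName) throws Exception {
        // A fresh loader per call: task classes not reachable through the
        // parent are reloaded as if the task ran in a brand new JVM.
        ClassLoader taskLoader = new URLClassLoader(classpath, ClassLoader.getSystemClassLoader());
        Class<?> taskClass = Class.forName(taskClassName, true, taskLoader);
        return taskClass.getDeclaredConstructor().newInstance();
    }
}
```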

If you are interested in this, please let me know.
If so, I will provide a patch against the current Hadoop trunk in Jira
as soon as possible.

--
Philippe.




Re: Running tasks in the TaskTracker VM

Posted by Doug Cutting <cu...@apache.org>.
Philippe Gassmann wrote:
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines. Launching a
> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
> have to do 2000 map, there will be 2000 seconds lost in launching JVMs...

The InputFormat controls the number of map tasks.  So, if 2000 is too 
many, so that JVM startup time dominates, then you can develop an 
InputFormat that splits things into fewer tasks so that this is not a 
problem.

> A bit of refactoring of the TaskRunner hierarchy is needed for this to
> work : the code that launch tasks in the JVM or in a separate process is
> very similar and it would have a sense that the TaskRunner would be the
> superclass of a InJVMRunner and a ChildJVMRunner.
> But what can we do with MapTaskRunner and ReduceTaskRunner ? It is not
> acceptable to have let's say : 2 or more implementation of the
> MapTaskRunner (one for in a child JVM execution, one for a in tracker
> JVM execution...). It would be painful to maintain and very complicated.

Perhaps it is too complicated for now, but I think we will want 
something like that long-term, so it is worth thinking about.

Doug

Re: Running tasks in the TaskTracker VM

Posted by Owen O'Malley <ow...@yahoo-inc.com>.
On Mar 19, 2007, at 10:51 AM, Philippe Gassmann wrote:

> Doug Cutting wrote:
>> A simpler approach might be to develop an InputFormat that includes
>> multiple files per split.
>>
>
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines.  
> Launching a
> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
> have to do 2000 map, there will be 2000 seconds lost in launching  
> JVMs...

For task granularity, the most that makes sense is roughly 10-50  
tasks/node. Given that a node runs at least 2 tasks at once, it maps  
into 5-25 seconds of wallclock time. It is noticeable, but shouldn't  
be the dominant factor.

>>> I already have a working patch against the 0.10.1 release of  
>>> Hadoop that
>>> launch tasks inside the TaskTracker JVM if a specific parameter  
>>> is set
>>> in the JobConf of the launched Job (for job we trust ;) ).

Another possible direction would be to have the Task JVM ask for  
another Task before exiting. I believe that Ben Reed experimented  
with that and the changes were not too extensive. For security, you  
would want to limit the JVM reuse to tasks within the same job.
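The reuse loop Owen describes can be sketched in a few lines. TaskSource below is a stand-in for the real tracker protocol, not an actual Hadoop interface; the point is only that the child asks for another task of the same job before exiting.

```java
// Sketch of JVM reuse: before exiting, the child JVM asks the tracker
// for another task, and only accepts one from the same job (the security
// limit Owen mentions). Names here are illustrative.
class ReusableChildJvm {
    interface TaskSource {
        String nextTask(String jobId); // null when the job has no more tasks
    }

    static int runUntilDrained(TaskSource tracker, String jobId) {
        int ran = 0;
        String task;
        while ((task = tracker.nextTask(jobId)) != null) {
            // run `task` in-process here, then loop instead of exiting,
            // amortizing JVM startup across all tasks of one job
            ran++;
        }
        return ran; // the JVM exits only once its job is drained
    }
}
```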

As a side note, we've already seen cases of client code that killed  
the task trackers. So it is hardly an abstract concern. *smile* (The  
client code managed to send kill signals to the entire process group,  
which included the task tracker. It was hard to debug and I'm not  
very interested in making it easier for client code to take out the  
servers.)

-- Owen

Re: Running tasks in the TaskTracker VM

Posted by Sylvain Wallez <sy...@apache.org>.
Stephane Bailliez wrote:
> Torsten Curdt wrote:
>>
>>> Being a complete idiot for distributed computing, I would say it is
>>> easy to explode a JVM when doing such distributed jobs, (should it
>>> be for OOM or anything).
>>
>> Then restrict what people can do - at least Google went that route.
>
> I don't know what Google did on the specifics :)

They came up with their own language for mapreduce jobs:
http://labs.google.com/papers/sawzall.html

> If you want to do that with Java and restrict memory usage, cpu usage
> and descriptor access within each inVM instance. That's a considerable
> amount of work that likely implies writing a specific agent for the vm
> (or an agent for a specific vm that is, because it's pretty unlikely
> that you will get the same results across vms), assuming that can then
> really be done at the classloader level for each task (which is pretty
> insanely complex to me if you have to consider allocation done at the
> parent classloader level, etc..)
>
> At least by forking a vm you can afford to get some reasonably bound
> control over the resources usage (or at least memory) without bringing
> down everything since a vm is already bound to some degrees.
>
>
>>> Failing jobs are not exactly uncommon and running things in a
>>> sandboxed environment with less risk for the tracker seems like a
>>> perfectly reasonable choice. So yeah, vm pooling certainly makes
>>> perfect sense for it
>>
>> I am still not convinced - sorry
>>
>> It's a bit like you would like to run JSPs in a separate JVM because
>> they might take down the servlet container.
>
> it is  a bit too extreme in granularity. I think it is more about like
> running n different webapps within the same VM or not. So if one
> webapp is resource hog, separating it would not harm the n-1 other
> applications and you would either create another server instance or
> move it away to another node.
>
> I know of environment with large number of nodes (not related to
> hadoop) where they also reboot a set of nodes daily to ensure that all
> machines are really in working conditions (it's usually when the
> machine reboots due to failure or whatever that someone has to rush to
> it because some service forgot to be registered or things like that,
> so doing this periodic check gives some people better ideas of their
> response time to failure). That depends of operational procedures for
> sure.

This can be another implementation of the TaskTracker: a single JVM that
forks a "replacement JVM" after either a given time or a given number of
tasks executed. This can avoid JVM fork overhead while also avoiding
memory leak problems.

The forked JVM could even be pre-forked and monitor the active one,
taking over if it no longer responds (and eventually killing it).

Sylvain

-- 
Sylvain Wallez - http://bluxte.net


Re: Running tasks in the TaskTracker VM

Posted by Stephane Bailliez <sb...@gmail.com>.
Torsten Curdt wrote:
> 
>> Being a complete idiot for distributed computing, I would say it is 
>> easy to explode a JVM when doing such distributed jobs, (should it be 
>> for OOM or anything).
> 
> Then restrict what people can do - at least Google went that route.

I don't know what Google did on the specifics :)

If you want to do that with Java and restrict memory usage, CPU usage 
and descriptor access within each in-VM instance, that's a considerable 
amount of work. It likely implies writing a specific agent for the VM 
(or rather an agent for a specific VM, because it's pretty unlikely 
that you will get the same results across VMs), assuming that can then 
really be done at the classloader level for each task (which is pretty 
insanely complex to me if you have to consider allocation done at the 
parent classloader level, etc.).

At least by forking a VM you can afford to get some reasonably bounded 
control over resource usage (or at least memory) without bringing 
down everything, since a VM is already bounded to some degree.


>> Failing jobs are not exactly uncommon and running things in a 
>> sandboxed environment with less risk for the tracker seems like a 
>> perfectly reasonable choice. So yeah, vm pooling certainly makes 
>> perfect sense for it
> 
> I am still not convinced - sorry
> 
> It's a bit like you would like to run JSPs in a separate JVM because 
> they might take down the servlet container.

That is a bit too extreme in granularity. I think it is more like 
running n different webapps within the same VM or not. So if one webapp 
is a resource hog, separating it would not harm the n-1 other applications, 
and you would either create another server instance or move it away to 
another node.

I know of environments with a large number of nodes (not related to Hadoop) 
where they also reboot a set of nodes daily to ensure that all machines 
are really in working condition (it's usually when a machine reboots 
due to failure or whatever that someone has to rush to it because some 
service forgot to be registered or things like that, so doing this 
periodic check gives people a better idea of their response time to 
failure). That depends on operational procedures, for sure.

I don't think this should be done in the spirit that everything is perfect 
in a perfect world, because we know it is not. So there will 
be a compromise between safety and performance, and having something 
reasonably tolerant to failure is also a performance advantage.

Doing something as simple as a deleteOnExit in a task is enough to leak 
a few KB each time on some VMs, and it stays there until the VM dies 
(fixed in 1.5.0_10 if I remember well). Figuring out things like that 
is likely to take a severe amount of time, considering it is an internal 
leak and will not appear in your favorite Java profiler either.

Bottom line is that even if you're 100% sure of your code, which is quite 
unlikely (at least as far as I'm concerned), you don't know 
third-party code. So without being totally paranoid, this is something 
that cannot be ignored.

-- stephane


Re: Running tasks in the TaskTracker VM

Posted by Torsten Curdt <tc...@apache.org>.
On 20.03.2007, at 11:19, Stephane Bailliez wrote:

> Torsten Curdt wrote:
>>> Executing users' code in system daemons is a security risk.
>> Of course there is security benefit in starting the jobs in a  
>> different JVM but if you don't trust the code you are executing  
>> this is probably not for you either. So bottom line is - if you  
>> weight up the performance penalty against the gained security I am  
>> still no excited about the JVM spawning idea.
>> If you really consider security that big of a problem - come up  
>> with your own language to ease and restrict the jobs.
>
> I think security here was more about 'taking down the whole task  
> tracker' risk.

Well, the same applies

> Being a complete idiot for distributed computing, I would say it is  
> easy to explode a JVM when doing such distributed jobs, (should it  
> be for OOM or anything).

Then restrict what people can do - at least Google went that route.

> If you run within the task tracker vm you'll have to carefully size  
> the tracker vm to accommodate potentially the resources of all  
> possibles jobs running at the same time or simply allocate a  
> gigantic amount of resources 'just in case', which kind of offset  
> the benefits of any performance improvement to stability.

The question is whether the task tracker should have access to that  
gigantic amount of resources, in one JVM or the other.

> Not mentioning cleaning up all the mess left by running jobs  
> including flushing the introspection cache to avoid leaks, which  
> will then impact performance of other jobs since it is not a  
> selective flush.
>
> Failing jobs are not exactly uncommon and running things in a  
> sandboxed environment with less risk for the tracker seems like a  
> perfectly reasonable choice. So yeah, vm pooling certainly makes  
> perfect sense for it

I am still not convinced - sorry

It's a bit like you would like to run JSPs in a separate JVM because  
they might take down the servlet container.

cheers
--
Torsten

Re: Running tasks in the TaskTracker VM

Posted by Stephane Bailliez <sb...@gmail.com>.
Torsten Curdt wrote:
>> Executing users' code in system daemons is a security risk.
> 
> Of course there is security benefit in starting the jobs in a different 
> JVM but if you don't trust the code you are executing this is probably 
> not for you either. So bottom line is - if you weight up the performance 
> penalty against the gained security I am still no excited about the JVM 
> spawning idea.
> 
> If you really consider security that big of a problem - come up with 
> your own language to ease and restrict the jobs.

I think security here was more about the 'taking down the whole task 
tracker' risk.

Being a complete idiot for distributed computing, I would say it is easy 
to explode a JVM when doing such distributed jobs (be it from OOM 
or anything else).

If you run within the task tracker VM, you'll have to carefully size the 
tracker VM to accommodate potentially the resources of all possible 
jobs running at the same time, or simply allocate a gigantic amount of 
resources 'just in case', which kind of offsets the benefits of any 
performance improvement to stability.

Not to mention cleaning up all the mess left by running jobs, including 
flushing the introspection cache to avoid leaks, which will then impact 
the performance of other jobs since it is not a selective flush.

Failing jobs are not exactly uncommon, and running things in a sandboxed 
environment with less risk for the tracker seems like a perfectly 
reasonable choice. So yeah, VM pooling certainly makes perfect sense for 
it, or one should probably look at what Doug suggests as well.

My 0.01 kopek ;)

-- stephane


Re: Running tasks in the TaskTracker VM

Posted by Torsten Curdt <tc...@apache.org>.
>> Yes, but the issue remains present if you have to deal with a high
>> number of map tasks to distribute the load on many machines.  
>> Launching a
>> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
>> have to do 2000 map, there will be 2000 seconds lost in launching  
>> JVMs...
>>
>
> Executing users' code in system daemons is a security risk.

Of course there is a security benefit in starting the jobs in a  
different JVM, but if you don't trust the code you are executing, this  
is probably not for you either. So the bottom line is: if you weigh up  
the performance penalty against the gained security, I am still not  
excited about the JVM spawning idea.

If you really consider security that big of a problem - come up with  
your own language to ease and restrict the jobs.

My 2 cents
--
Torsten




Re: Running tasks in the TaskTracker VM

Posted by Milind Bhandarkar <mi...@yahoo-inc.com>.
>
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines.  
> Launching a
> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
> have to do 2000 map, there will be 2000 seconds lost in launching  
> JVMs...
>

Executing users' code in system daemons is a security risk. In my  
experience, security always wins when pitted against performance.  
IMHO, there is a happy middle ground, i.e. maintaining a pool of  
running JVMs that are launched when the tasktracker starts up. Even  
then, care has to be taken against memory leaks etc.
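The checkout/checkin discipline of such a pool can be sketched with a blocking queue. In a real tracker each pool member would be a separate, already-running JVM process; a String handle stands in for one here, so this is a shape sketch, not an implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a pre-launched worker pool: members are created when the
// tracker starts, handed out per task, and returned afterwards.
class JvmPool {
    private final BlockingQueue<String> idle;

    JvmPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add("jvm-" + i); // "launched" at tracker startup
        }
    }

    String checkout() throws InterruptedException {
        return idle.take(); // blocks if every pooled JVM is busy
    }

    void checkin(String jvm) {
        idle.add(jvm); // worker returns to the pool after finishing a task
    }
}
```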

- Milind

--
Milind Bhandarkar
(mailto:milindb@yahoo-inc.com)
(phone: 408-349-2136 W)



Re: Running tasks in the TaskTracker VM

Posted by Philippe Gassmann <ph...@anyware-tech.com>.
Doug Cutting wrote:
> Philippe Gassmann wrote:
>> At the moment, for each task (map or reduce) a new JVM is created by the
>> TaskTracker to run the Job.
>>
>> We have in our Hadoop cluster a high number of small files thus
>> requiring a high number of map tasks. I know this is suboptimal, but
>> aggregating those small files is not possible now. So an idea came to us
>> : launching jobs in the task tracker JVM so the overhead of creating a
>> new vm will disappear.
>
> A simpler approach might be to develop an InputFormat that includes
> multiple files per split.
>

Yes, but the issue remains present if you have to deal with a high
number of map tasks to distribute the load on many machines. Launching a
JVM is costly; let's say it costs 1 second (I'm optimistic). If you
have to do 2000 maps, there will be 2000 seconds lost in launching JVMs...

>> I already have a working patch against the 0.10.1 release of Hadoop that
>> launch tasks inside the TaskTracker JVM if a specific parameter is set
>> in the JobConf of the launched Job (for job we trust ;) ).
>
> Ideally this could be through a task-running interface, that permits
> one to plug in different implementations.  For example, sometimes it
> may make sense to run tasks in-process, sometimes to run them in a
> child JVM, and sometimes to fork a non-Java sub-process.  So, rather
> than specifying a flag on the job, one would specify the runner
> implementation class.
>

A bit of refactoring of the TaskRunner hierarchy is needed for this to
work: the code that launches tasks in the JVM or in a separate process is
very similar, and it would make sense for TaskRunner to be the
superclass of an InJVMRunner and a ChildJVMRunner.
But what can we do with MapTaskRunner and ReduceTaskRunner? It is not
acceptable to have, let's say, 2 or more implementations of the
MapTaskRunner (one for execution in a child JVM, one for execution in the
tracker JVM...). It would be painful to maintain and very complicated.

> Doug


Re: Running tasks in the TaskTracker VM

Posted by Doug Cutting <cu...@apache.org>.
Philippe Gassmann wrote:
> At the moment, for each task (map or reduce) a new JVM is created by the
> TaskTracker to run the Job.
> 
> We have in our Hadoop cluster a high number of small files thus
> requiring a high number of map tasks. I know this is suboptimal, but
> aggregating those small files is not possible now. So an idea came to us
> : launching jobs in the task tracker JVM so the overhead of creating a
> new vm will disappear.

A simpler approach might be to develop an InputFormat that includes 
multiple files per split.
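The heart of such an InputFormat is just packing many files into a fixed number of splits. The greedy grouper below is illustrative only (it works on file sizes and indices; a real InputFormat would wrap the result in the framework's split type).

```java
import java.util.ArrayList;
import java.util.List;

// Greedy "many files per split" grouping: assign each file (by index)
// to the currently lightest split, so split sizes stay balanced and the
// number of map tasks equals numSplits regardless of the file count.
class MultiFileGrouper {
    static List<List<Integer>> group(long[] fileSizes, int numSplits) {
        List<List<Integer>> splits = new ArrayList<>();
        long[] load = new long[numSplits];
        for (int s = 0; s < numSplits; s++) {
            splits.add(new ArrayList<>());
        }
        for (int f = 0; f < fileSizes.length; f++) {
            int lightest = 0;
            for (int s = 1; s < numSplits; s++) {
                if (load[s] < load[lightest]) lightest = s;
            }
            splits.get(lightest).add(f);
            load[lightest] += fileSizes[f];
        }
        return splits;
    }
}
```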

> I already have a working patch against the 0.10.1 release of Hadoop that
> launch tasks inside the TaskTracker JVM if a specific parameter is set
> in the JobConf of the launched Job (for job we trust ;) ).

Ideally this could be through a task-running interface, that permits one 
to plug in different implementations.  For example, sometimes it may 
make sense to run tasks in-process, sometimes to run them in a child 
JVM, and sometimes to fork a non-Java sub-process.  So, rather than 
specifying a flag on the job, one would specify the runner 
implementation class.
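A hedged sketch of that interface is below; every name is illustrative, not from Hadoop. The job would name a runner class, and the TaskTracker would instantiate it by reflection.

```java
// Pluggable task-running interface: one implementation per execution
// strategy, selected per job by class name rather than by a flag.
interface PluggableTaskRunner {
    void launch(Runnable task) throws Exception;
}

// Runs the task in the tracker's own JVM.
class InProcessRunner implements PluggableTaskRunner {
    public void launch(Runnable task) {
        task.run();
    }
}

// Would serialize the task and exec a child "java" process; elided here.
class ChildJvmRunner implements PluggableTaskRunner {
    public void launch(Runnable task) throws Exception {
        throw new UnsupportedOperationException("fork a child JVM and run the task there");
    }
}
```

A third implementation could fork a non-Java sub-process, as suggested above, without touching the callers.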

Doug

Re: Running tasks in the TaskTracker VM

Posted by Torsten Curdt <tc...@apache.org>.
On 19.03.2007, at 15:46, Philippe Gassmann wrote:

> Hi Hadoop guys,
>
> At the moment, for each task (map or reduce) a new JVM is created  
> by the
> TaskTracker to run the Job.
>
> We have in our Hadoop cluster a high number of small files thus
> requiring a high number of map tasks. I know this is suboptimal, but
> aggregating those small files is not possible now. So an idea came  
> to us
> : launching jobs in the task tracker JVM so the overhead of creating a
> new vm will disappear.

Cool stuff! :)

cheers
--
Torsten