Posted to dev@river.apache.org by Mark Brouwer <ma...@marbro.org> on 2013/04/08 15:52:41 UTC

What is wrong with TaskManager?

I didn't read all the discussions with regard to concurrency, just some
of them and I noticed the creation of RIVER-344 "TaskManager scalability
and concurrency".

I'm uncomfortable with this, as it makes the Jini utilities or services
look like a time bomb which can explode any moment and to be honest I
refuse to believe so without proof. Around 2003/2004 I did analyze the
codebase quite extensively to solve some issues the company I worked for
was having with the number of threads created by the JTSK utilities and
I wanted to use a common thread pool for many of these utilities which I
implemented at some point in time, more about that later.

There is a lot of history with regard to discussion about TaskManager
around 2004 when Java SE 5 was on the horizon. People were asking for an
ExecutorService but couldn't state the problem it was going to solve.
This was during the Porter project for which there is no archive of the
mailing lists anymore (hate that, because many of the discussions here
are like broken records ;-). I did attach one discussion (if the
attachment is removed I will add it to RIVER-344) which discussed the
use of the TaskManager among other things.

In that discussion Bob Scheifler (Sun) mentioned "Over the years we've
been slowly eliminating use of Task.runAfter, and in any overhaul I'd
prefer to finish that job rather than perpetuate it."

Therefore I spent some time in analyzing the 4 tasks
(ServiceDiscoveryManager and JoinManager) that didn't return 'false',
but I concluded it wouldn't be easy to rewrite those, it requires an
intimate knowledge of these implementations and I didn't want to burn my
fingers on that one. On that response of Bob I wrote "I agree I had
mixed feelings when implementing it (that is runAfter), but now that we
have it I must admit I can see some use cases for it in our own
environment. Sometimes you have those event generators that have to
notify all the listening entities. Notifications you want to have
performed through a thread pool, but you must guarantee the ordering of
the events. Of course everything can be implemented in a different way,
but the runAfter semantics are quite handy for the 'ordinary'
programmer, opposed to putting the burden on them to arrange for it
themselves.". I concluded runAfter is *not evil* by itself when properly
used, although it gives some overhead and it has to perform its work
while a lock is held.
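
A minimal sketch of the runAfter idea described above, not the River
implementation. OrderedTask is a hypothetical stand-in modeled on the
runAfter(List tasks, int size) method of TaskManager.Task, and NotifyTask,
RunAfterDemo and runnableIndices are names invented for illustration. It
assumes the ordering rule is "one listener, one notification at a time":

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TaskManager.Task, so the sketch compiles without River.
interface OrderedTask extends Runnable {
    // Returns true if this task must wait for one of the first 'size' (older) tasks.
    boolean runAfter(List<OrderedTask> tasks, int size);
}

// Hypothetical event notification that must reach a given listener in order.
final class NotifyTask implements OrderedTask {
    final String listenerId;
    final long seqNo;

    NotifyTask(String listenerId, long seqNo) {
        this.listenerId = listenerId;
        this.seqNo = seqNo;
    }

    @Override
    public boolean runAfter(List<OrderedTask> tasks, int size) {
        // Linear scan over the older tasks, stopping at the first collision.
        for (int i = 0; i < size; i++) {
            OrderedTask older = tasks.get(i);
            if (older instanceof NotifyTask
                    && ((NotifyTask) older).listenerId.equals(listenerId)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void run() {
        System.out.println("notify " + listenerId + " #" + seqNo);
    }
}

final class RunAfterDemo {
    // Indices of queued tasks that may start now (no older task they depend on).
    static List<Integer> runnableIndices(List<OrderedTask> queue) {
        List<Integer> ready = new ArrayList<>();
        for (int i = 0; i < queue.size(); i++) {
            if (!queue.get(i).runAfter(queue, i)) {
                ready.add(i);
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        List<OrderedTask> queue = new ArrayList<>();
        queue.add(new NotifyTask("A", 1));
        queue.add(new NotifyTask("B", 1));
        queue.add(new NotifyTask("A", 2)); // must wait for A #1
        System.out.println(runnableIndices(queue)); // prints [0, 1]
    }
}
```

The two notifications for different listeners can run in parallel on the
pool, while the second notification for listener A stays queued until the
first one has left the queue. The scan is linear in the number of older
tasks, which is where the overhead under the lock comes from.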

I also wrote in that discussion "The work I undertook made me realize
that no matter whether one can make use of the 1.5 concurrency
utilities, without being able to stick constraints on tasks it is
*nearly impossible* to use thread pools in an efficient way, because by
not being able to attach constraints on the tasks you end up creating
loads of pools with various constraints attached to them.". The issue is
that currently all the utilities have their own Task/WakeupManager with
min size, max size, load factor, etc. for good reasons.

*Conclusion*: given the nature of the tasks the TaskManager executes I
fail to believe there is any advantage in replacing it, regardless of
the fact the configuration contracts for the utilities will be broken by
doing so. I truly believe time spent/lost on locking or even processing
the runAfter method (given the number of tasks in the queue) is
negligible compared to time spent on executing those tasks. Yes it is
possible to replace TaskManager with ExecutorService for the sake of it,
but to get a real benefit you should also implement your own
ExecutorService. The latter I did, see below, so I speak from having
walked that road. Of course there might be concurrency issues in the
River codebase, but replacing TaskManager won't solve that.

Some background information: the issue that the company I worked for
faced was that we had up to 30 Jini services running in one JVM, each of
those services using ServiceDiscoveryManager, JoinManager, etc. When
certain events happened in the djinn we saw a huge ramp up of the number
of threads in the JVM, e.g. from 300 up to 600 while a few seconds later
most of them were idle or gone. I didn't want to have such a ramp up of
threads and wanted to spread out those tasks a little bit in time.

I wanted to be able to share Task/WakeupManagers between the utilities.
Based on the specification of the utilities one couldn't do that as
there were no specifications about the security context and thread
context class loader in place while executing a task. Another issue was
that each of the utilities had their own minimum size, max size, load
factor, etc. Having a single Task/WakeupManager for multiple utilities
should not result in starvation as we had seen starvation before due to
having the wrong pool size and load factor.

To make a short story long. I have been able to extend TaskManager (Sun
made some methods visible that allowed us to do that) so I was able to
delegate to an ExecutorService, but in my analysis of the complete
codebase I found out that an out-of-the-box ExecutorService was not
going to provide the required functionality. There was basically a task
that should never be queued and if no idle threads are available one
should create one on the spot. Then we have those tasks that have a
dependency on each other (this sucks but that sometimes happens) and
then you have tasks for which you want to guarantee they are executed
before a certain deadline (otherwise a lease won't be renewed e.g.). And
then you have to ensure that the security context and thread context
class loader is correct when a task is executed.
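
For the first requirement alone (a task that is never queued, with a thread
created on the spot when none is idle), a stock ThreadPoolExecutor with a
SynchronousQueue comes close. The sketch below only illustrates that one
constraint under that assumption; it does not cover dependencies, deadlines
or context restoration, and NoQueuePoolDemo, newNoQueuePool and
threadsCreatedFor are hypothetical names:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class NoQueuePoolDemo {
    // Direct hand-off: a task goes to an idle thread, otherwise a new thread is created.
    static ThreadPoolExecutor newNoQueuePool() {
        return new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,          // no core threads, no upper bound
                15L, TimeUnit.SECONDS,         // idle threads are reclaimed after 15 s
                new SynchronousQueue<Runnable>());
    }

    // Submits n tasks that all block and reports how many threads the pool created.
    static int threadsCreatedFor(int n) {
        ThreadPoolExecutor pool = newNoQueuePool();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < n; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(threadsCreatedFor(3)); // prints 3
    }
}
```

Because none of the three tasks can be handed to an idle thread, each one
gets its own thread immediately, which is also the kind of thread ramp-up
described earlier when many such tasks arrive at once.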

What we ended up with is an extension of TaskManager and WakeupManager
that had intrinsic knowledge of the tasks in the JTSK and 'enhances'
them with some constraints and delegates the execution to a shared
ExecutorService that performs security context and thread context class
loader restoration, has support for deadline constraints (a task will be
assigned to a worker thread if not executed within a set deadline) and has
support for dependencies (runAfter). Yes this ExecutorService
implementation is in many cases a few orders of magnitude slower than the J2SE
out-of-the-box implementations when running Runnable.run() with an empty
method body (security context restoration is/was extremely costly), but
in the profiling of real Jini applications it never showed up. For the
context we are living in I believe performance of an ExecutorService is
not that important. And of course we didn't use this ExecutorService
implementation for the typical fork-join work.
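
As an illustration of the context restoration part only, here is a minimal
sketch of an Executor wrapper that captures the submitter's thread context
class loader and reinstates it around the task. This is not the Cheiron
code; ContextRestoringExecutor and ContextDemo are hypothetical names, and
security context restoration is left out on purpose because the
AccessController APIs it would need are deprecated in recent JDKs:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical wrapper: runs each task under the context class loader of its submitter.
final class ContextRestoringExecutor implements Executor {
    private final Executor delegate;

    ContextRestoringExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Captured on the submitting thread, at submission time.
        final ClassLoader captured = Thread.currentThread().getContextClassLoader();
        delegate.execute(() -> {
            Thread worker = Thread.currentThread();
            ClassLoader previous = worker.getContextClassLoader();
            worker.setContextClassLoader(captured);
            try {
                task.run();
            } finally {
                // Leave the shared worker thread as we found it.
                worker.setContextClassLoader(previous);
            }
        });
    }
}

final class ContextDemo {
    // True if the task saw the submitter's loader and the thread's own loader was put back.
    static boolean taskSeesSubmitterLoader() {
        Queue<Runnable> pending = new ArrayDeque<>();
        Executor executor = new ContextRestoringExecutor(pending::add);

        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        ClassLoader submitterLoader = new ClassLoader(original) { }; // marker loader
        ClassLoader[] seen = new ClassLoader[1];
        try {
            current.setContextClassLoader(submitterLoader);
            executor.execute(() -> seen[0] = Thread.currentThread().getContextClassLoader());
        } finally {
            current.setContextClassLoader(original);
        }
        pending.remove().run(); // executed later, under a different context class loader
        return seen[0] == submitterLoader && current.getContextClassLoader() == original;
    }

    public static void main(String[] args) {
        System.out.println(taskSeesSubmitterLoader()); // prints true
    }
}
```

The demo uses a plain queue as the delegate so the hand-over is
deterministic; with a real shared pool the same wrapper would sit in front
of the pool's execute method.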

The code for the extension of Task/WakeupManager and the ExecutorService
implementation has been available since 2006 as part of the Cheiron
Utils project so if interested feel free to have a look:

API:
http://www.cheiron.org/utils/release/v0.2/api/org/cheiron/util/thread/VirtualTaskManager.html

Source:
http://cheiron-scm.merangar.com/@md=d&cd=//cheiron/utils/release/0.2/10/src/org/cheiron/util/thread/&cdf=//cheiron/utils/release/0.2/10/src/org/cheiron/util/thread/VirtualTaskManager.java&sr=338&c=45a@//cheiron/utils/release/0.2/10/src/org/cheiron/util/thread/VirtualTaskManager.java

Regards,
--
Mark Brouwer

Re: What is wrong with TaskManager?

Posted by Mark Brouwer <ma...@marbro.org>.

Hi Patricia,

On 4/8/13 4:32 PM, Patricia Shanahan wrote:
>
> Thanks for the background. The general performance of TaskManager, and
> especially the overhead for runAfter, seemed to me to depend strongly on
> the queue length. For example, a non-trivial runAfter method scans the
> list of tasks older than "this", stopping if it finds a collision. That
> is a fast operation for a short queue length. Did you do any
> measurements of that?
>
> Patricia

Unfortunately I can't provide any meaningful data with regard to queue 
lengths observed. Neither from the plain TaskManager nor the 
ExecutorService implementation where many tasks submitted through
TaskManager came together.

Basically this was one of the issues I was never able to finish at that 
time, getting statistics out of a thread pool to make meaningful 
decisions about load factor, min/max size and whether tasks should have 
a deadline. A few times I found out the hard way that being on the
conservative side I could severely impact the liveness of the system.

The ExecutorService implementation I referred to was going to collect
statistics about things such as queue length, blocked tasks, execution
times of tasks, etc., in the hope that these statistics would give us
insight into what was going on under the hood, or could serve as an alarm
for operators that something could be wrong. That work was never finished
though, partly because I found it very hard to come up with what would be
meaningful, but in general I came to the conclusion I was willing to
sacrifice performance for insight into what was going on, because a
thread pool is both a blessing and a curse.

Currently I'm in the process of moving my code base to Java SE 8, just
as an exercise in updating my rusty Java skills, and as this is still
important to me I might give it another attempt.

Regards,
-- 
Mark Brouwer

Re: What is wrong with TaskManager?

Posted by Patricia Shanahan <pa...@acm.org>.

On 4/8/2013 6:52 AM, Mark Brouwer wrote:
...
> Therefore I spent some time in analyzing the 4 tasks
> (ServiceDiscoveryManager and JoinManager) that didn't return 'false',
> but I concluded it wouldn't be easy to rewrite those, it requires an
> intimate knowledge of these implementations and I didn't want to burn my
> fingers on that one. On that response of Bob I wrote "I agree I had
> mixed feelings when implementing it (that is runAfter), but now that we
> have it I must admit I can see some use cases for it in our own
> environment. Sometimes you have those event generators that have to
> notify all the listening entities. Notifications you want to have
> performed through a thread pool, but you must guarantee the ordering of
> the events. Of course everything can be implemented in a different way,
> but the runAfter semantics are quite handy for the 'ordinary'
> programmer, opposed to putting the burden on them to arrange for it
> themselves.". I concluded runAfter is *not evil* by itself when properly
> used, although it gives some overhead and it has to perform its work
> while a lock is held.
...

Thanks for the background. The general performance of TaskManager, and 
especially the overhead for runAfter, seemed to me to depend strongly on 
the queue length. For example, a non-trivial runAfter method scans the 
list of tasks older than "this", stopping if it finds a collision. That
is a fast operation for a short queue length. Did you do any 
measurements of that?

Patricia