Posted to dev@river.apache.org by Mark Brouwer <ma...@marbro.org> on 2013/04/08 15:52:41 UTC

What is wrong with TaskManager?

I didn't read all the discussions with regard to concurrency, just some 
of them, and I noticed the creation of RIVER-344 "TaskManager scalability 
and concurrency".

I'm uncomfortable with this, as it makes the Jini utilities or services 
look like a time bomb which can explode any moment and to be honest I 
refuse to believe so without proof. Around 2003/2004 I did analyze the 
codebase quite extensively to solve some issues the company I worked for 
was having with the number of threads created by the JTSK utilities and 
I wanted to use a common thread pool for many of these utilities which I 
implemented at some point in time; more about that later.

There is a lot of history with regard to discussion about TaskManager 
around 2004 when Java SE 5 was on the horizon. People were asking for an 
ExecutorService but couldn't state the problem it was going to solve. 
This was during the Porter project for which there is no archive of the 
mailing lists anymore (hate that, because many of the discussions here 
are like broken records ;-). I did attach one discussion (if the 
attachments are removed I will add them to RIVER-344), which discussed the 
use of the TaskManager among other things.

In that discussion Bob Scheifler (Sun) mentioned "Over the years we've 
been slowly eliminating use of Task.runAfter, and in any overhaul I'd 
prefer to finish that job rather than perpetuate it."

Therefore I spent some time analyzing the 4 tasks 
(ServiceDiscoveryManager and JoinManager) that didn't return 'false', 
but I concluded it wouldn't be easy to rewrite those; it requires an 
intimate knowledge of these implementations and I didn't want to burn my 
fingers on that one. On that response of Bob I wrote "I agree I had 
mixed feelings when implementing it (that is runAfter), but now that we 
have it I must admit I can see some use cases for it in our own 
environment. Sometimes you have those event generators that have to 
notify all the listening entities. Notifications you want to have 
performed through a thread pool, but you must guaranty the ordering of 
the events. Of course everything can be implemented in a different way, 
but the runAfter semantics are quite handy for the 'ordinary' 
programmer, opposed to putting the burden on them to arrange for it 
themselves.". I concluded runAfter is *not evil* by itself when properly 
used, although it adds some overhead and has to perform its work 
while a lock is held.
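
To make the ordered-notification use case above concrete, here is a minimal 
sketch. It does not use the real TaskManager: the nested Task interface is 
only a stand-in that mirrors the shape of TaskManager.Task (run() plus 
runAfter(tasks, size)), and EventTask, listenerId and seqNo are hypothetical 
names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RunAfterDemo {

    /** Stand-in that mirrors the shape of TaskManager.Task; NOT the real Jini interface. */
    interface Task extends Runnable {
        /** @return true if this task must wait for one of the first 'size' entries of 'tasks' */
        boolean runAfter(List<Task> tasks, int size);
    }

    /** Hypothetical notification that must reach a given listener in submission order. */
    static final class EventTask implements Task {
        final String listenerId;
        final long seqNo;

        EventTask(String listenerId, long seqNo) {
            this.listenerId = listenerId;
            this.seqNo = seqNo;
        }

        @Override
        public void run() {
            System.out.println("notify " + listenerId + " #" + seqNo);
        }

        @Override
        public boolean runAfter(List<Task> tasks, int size) {
            // Linear scan over the older tasks; the cost grows with the queue length.
            for (int i = 0; i < size; i++) {
                Task older = tasks.get(i);
                if (older instanceof EventTask
                        && ((EventTask) older).listenerId.equals(listenerId)) {
                    return true; // an earlier event for the same listener is still pending
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<Task> queue = new ArrayList<>();
        queue.add(new EventTask("A", 1));
        queue.add(new EventTask("B", 1));
        Task secondForA = new EventTask("A", 2);
        queue.add(secondForA);
        Task firstForC = new EventTask("C", 1);
        queue.add(firstForC);

        // Each task is asked about the tasks queued before it (indexes 0..size-1).
        System.out.println("A#2 must wait: " + secondForA.runAfter(queue, 2));
        System.out.println("C#1 must wait: " + firstForC.runAfter(queue, 3));
    }
}
```

The point of the sketch is that the ordering rule lives in the task itself, 
so the programmer submitting events does not have to arrange the ordering 
by hand, at the price of a scan over the pending tasks.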

I also wrote in that discussion "The work I undertook made me realize 
that no matter whether one can make use of the 1.5 concurrency 
utilities, without being able to stick constraints on tasks it is 
*nearly impossible* to use thread pools in an efficient way, because by 
not being able to attach constraints on the tasks you end up creating 
loads of pools with various constraints attached to them.". The issue is 
that currently all the utilities have their own Task/WakeupManager with 
min size, max size, load factor, etc. for good reasons.

*Conclusion*: given the nature of the tasks the TaskManager executes I 
fail to believe there is any advantage in replacing it, regardless of 
the fact the configuration contracts for the utilities will be broken by 
doing so. I truly believe time spent/lost on locking or even processing 
the runAfter method (given the number of tasks in the queue) is 
negligible compared to time spent on executing those tasks. Yes it is 
possible to replace TaskManager with ExecutorService for the sake of it, 
but to get a real benefit you should also implement your own 
ExecutorService. The latter I did, see below, so I speak from having 
walked that road. Of course there might be concurrency issues in the 
River codebase, but replacing TaskManager won't solve that.

For some background information. The issue that the company I worked for 
faced was that we had up to 30 Jini services running in one JVM, each of 
those services using ServiceDiscoveryManager, JoinManager, etc. When 
certain events happened in the djinn we saw a huge ramp up of the number 
of threads in the JVM, e.g. from 300 up to 600 while a few seconds later 
most of them were idle or gone. I didn't want to have such a ramp up of 
threads and wanted to spread out those tasks a little bit in time.

I wanted to be able to share Task/WakeupManagers between the utilities. 
Based on the specification of the utilities one couldn't do that as 
there were no specifications about the security context and thread 
context class loader in place while executing a task. Another issue was 
that each of the utilities had their own minimum size, max size, load 
factor, etc. Having a single Task/WakeupManager for multiple utilities 
should not result in starvation as we had seen starvation before due to 
having the wrong pool size and load factor.

To make a short story long. I have been able to extend TaskManager (Sun 
made some methods visible that allowed us to do that) so I was able to 
delegate to an ExecutorService, but in my analysis of the complete 
codebase I found out that an out-of-the-box ExecutorService was not 
going to provide the required functionality. There was basically a task 
that should never be queued and if no idle threads are available one 
should create one on the spot. Then we have those tasks that have a 
dependency on each other (this sucks but that sometimes happens) and 
then you have tasks for which you want to guarantee they are executed 
before a certain deadline (otherwise a lease won't be renewed e.g.). And 
then you have to ensure that the security context and thread context 
class loader is correct when a task is executed.
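
As an illustration of just one of these requirements (a task that is never 
queued and gets a new thread when no idle one is available), the standard 
ThreadPoolExecutor can be configured with a SynchronousQueue, which hands 
tasks directly to threads instead of holding them. This is only a sketch of 
that single constraint with the stock J2SE classes; the class and method 
names are made up for the example, and it does not cover dependencies, 
deadlines or context restoration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class DirectHandoffDemo {

    /** Pool that never queues: a task goes to an idle thread or a new thread is created. */
    static ThreadPoolExecutor newDirectHandoffPool() {
        return new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,            // no core threads, no upper bound
                15L, TimeUnit.SECONDS,           // idle threads are reclaimed after 15 s
                new SynchronousQueue<Runnable>());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newDirectHandoffPool();
        CountDownLatch release = new CountDownLatch(1);

        // Submit tasks that block, so no thread is idle when the next one arrives.
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("threads=" + pool.getPoolSize()
                + " queued=" + pool.getQueue().size());

        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

An unbounded maximum is exactly what produces the kind of thread ramp-up 
described earlier, which is why this behaviour is only wanted for the few 
tasks that really cannot wait.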

What we ended up with is an extension of TaskManager and WakeupManager 
that had intrinsic knowledge of the tasks in the JTSK and 'enhances' 
them with some constraints and delegates the execution to a shared 
ExecutorService that performs security context and thread context class 
loader restoration, has support for deadline constraints (a task will be 
assigned to a worker thread if not executed within a set deadline) and has 
support for dependencies (runAfter). Yes, this ExecutorService 
implementation is in many cases a few orders of magnitude slower than the 
J2SE out-of-the-box implementations when running Runnable.run() with an 
empty method body (security context restoration is/was extremely costly), 
but in the profiling of real Jini applications it never showed up. For the 
context we are living in I believe performance of an ExecutorService is 
not that important. And of course we didn't use this ExecutorService 
implementation for the typical fork-join work.
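
The thread context class loader part of that restoration can be sketched 
with a small wrapper around the submitted task. This is an assumption about 
one possible approach, not the Cheiron implementation; withCallerContext is 
a hypothetical helper name. Security context restoration is left out on 
purpose, as the AccessController based APIs are deprecated in recent Java 
releases.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

final class ContextRestoringDemo {

    /** Wraps a task so that it runs with the submitter's thread context class loader. */
    static Runnable withCallerContext(Runnable task) {
        // Captured on the submitting thread, at wrap time.
        final ClassLoader captured = Thread.currentThread().getContextClassLoader();
        return () -> {
            Thread worker = Thread.currentThread();
            ClassLoader previous = worker.getContextClassLoader();
            worker.setContextClassLoader(captured);
            try {
                task.run();
            } finally {
                // Put the worker back as it was, so shared threads do not leak context.
                worker.setContextClassLoader(previous);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        // Warm up the pool so the worker thread exists before the loader changes.
        shared.submit(() -> { }).get();

        // Hypothetical per-service class loader, standing in for a service's own loader.
        ClassLoader serviceLoader =
                new ClassLoader(ContextRestoringDemo.class.getClassLoader()) { };
        Thread.currentThread().setContextClassLoader(serviceLoader);

        AtomicReference<ClassLoader> wrapped = new AtomicReference<>();
        AtomicReference<ClassLoader> plain = new AtomicReference<>();
        shared.submit(withCallerContext(
                () -> wrapped.set(Thread.currentThread().getContextClassLoader()))).get();
        shared.submit(
                () -> plain.set(Thread.currentThread().getContextClassLoader())).get();

        System.out.println("wrapped task saw service loader: "
                + (wrapped.get() == serviceLoader));
        System.out.println("plain task saw service loader: "
                + (plain.get() == serviceLoader));
        shared.shutdown();
    }
}
```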

The code for the extension of Task/WakeupManager and the ExecutorService 
implementation has been available since 2006 as part of the Cheiron 
Utils project so if interested feel free to have a look:

API:
http://www.cheiron.org/utils/release/v0.2/api/org/cheiron/util/thread/VirtualTaskManager.html

Source:
http://cheiron-scm.merangar.com/@md=d&cd=//cheiron/utils/release/0.2/10/src/org/cheiron/util/thread/&cdf=//cheiron/utils/release/0.2/10/src/org/cheiron/util/thread/VirtualTaskManager.java&sr=338&c=45a@//cheiron/utils/release/0.2/10/src/org/cheiron/util/thread/VirtualTaskManager.java

Regards,
-- 
Mark Brouwer


Re: What is wrong with TaskManager?

Posted by Mark Brouwer <ma...@marbro.org>.
Hi Patricia,

On 4/8/13 4:32 PM, Patricia Shanahan wrote:
>
> Thanks for the background. The general performance of TaskManager, and
> especially the overhead for runAfter, seemed to me to depend strongly on
> the queue length. For example, a non-trivial runAfter method scans the
> list of tasks older than "this", stopping if it find a collision. That
> is a fast operation for a short queue length. Did you do any
> measurements of that?
>
> Patricia

Unfortunately I can't provide any meaningful data with regard to queue 
lengths observed, neither from the plain TaskManager nor from the 
ExecutorService implementation where many tasks submitted through 
TaskManager came together.

Basically this was one of the issues I was never able to finish at that 
time: getting statistics out of a thread pool to make meaningful 
decisions about load factor, min/max size and whether tasks should have 
a deadline. A few times I found out the hard way that by being on the 
conservative side I could severely impact the liveness of the system.

The ExecutorService implementation I referred to was going to collect 
statistics about things such as queue length, blocked tasks, execution 
times of tasks, etc., hoping these statistics would give us insight into 
what was going on under the hood, or could serve as an alarm for operators 
to see that something could be wrong. That work was never finished though, 
partly because I found it very hard to come up with what would be 
meaningful, but in general I came to the conclusion I was willing to 
sacrifice performance for insight into what was going on, because a thread 
pool is both a blessing and a curse.
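
For what it is worth, the stock ThreadPoolExecutor already offers hooks 
that make a first cut at such statistics possible. The sketch below is not 
the unfinished Cheiron work; MeasuringPool and its accessor names are 
invented for the example, and the queue length is only sampled at 
submission time, so the maximum it reports is approximate.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

final class PoolStatsDemo {

    /** Hypothetical pool that records queue length and execution times. */
    static final class MeasuringPool extends ThreadPoolExecutor {
        private final ThreadLocal<Long> startNanos = new ThreadLocal<>();
        private final LongAdder totalNanos = new LongAdder();
        private final LongAdder executed = new LongAdder();
        private final AtomicInteger maxQueueLength = new AtomicInteger();

        MeasuringPool(int threads) {
            super(threads, threads, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>());
        }

        @Override
        public void execute(Runnable task) {
            super.execute(task);
            // Sample the backlog right after enqueueing; approximate by nature.
            maxQueueLength.accumulateAndGet(getQueue().size(), Math::max);
        }

        @Override
        protected void beforeExecute(Thread t, Runnable r) {
            super.beforeExecute(t, r);
            startNanos.set(System.nanoTime());
        }

        @Override
        protected void afterExecute(Runnable r, Throwable failure) {
            totalNanos.add(System.nanoTime() - startNanos.get());
            executed.increment();
            super.afterExecute(r, failure);
        }

        long executedTasks() { return executed.sum(); }

        int maxObservedQueueLength() { return maxQueueLength.get(); }

        double meanExecutionMillis() {
            long n = executed.sum();
            return n == 0 ? 0.0 : totalNanos.sum() / 1e6 / n;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MeasuringPool pool = new MeasuringPool(1);
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("executed=" + pool.executedTasks()
                + " maxQueue=" + pool.maxObservedQueueLength()
                + " meanMillis=" + pool.meanExecutionMillis());
    }
}
```

Which of these numbers is actually meaningful for tuning load factor, pool 
sizes or deadlines remains the hard question.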

Currently I'm in the process of moving my code base to Java SE 8, just 
as an exercise in updating my rusty Java skills, and as this is still 
important to me I might give it another attempt.

Regards,
-- 
Mark Brouwer

Re: What is wrong with TaskManager?

Posted by Patricia Shanahan <pa...@acm.org>.
On 4/8/2013 6:52 AM, Mark Brouwer wrote:
...
> Therefore I spent some time in analyzing the 4 tasks
> (ServiceDiscoveryManager and JoinManager) that didn't return 'false',
> but I concluded it wouldn't be easy to rewrite those, it requires an
> intimate knowledge of these implementations and I didn't want to burn my
> fingers on that one. On that response of Bob I wrote "I agree I had
> mixed feelings when implementing it (that is runAfter), but now that we
> have it I must admit I can see some use cases for it in our own
> environment. Sometimes you have those event generators that have to
> notify all the listening entities. Notifications you want to have
> performed through a thread pool, but you must guaranty the ordering of
> the events. Of course everything can be implemented in a different way,
> but the runAfter semantics are quite handy for the 'ordinary'
> programmer, opposed to putting the burden on them to arrange for it
> themselves.". I concluded runAfter is *not evil* by itself when properly
> used, although it gives some overhead and it has to perform its work
> while a lock has been held.
...

Thanks for the background. The general performance of TaskManager, and 
especially the overhead for runAfter, seemed to me to depend strongly on 
the queue length. For example, a non-trivial runAfter method scans the 
list of tasks older than "this", stopping if it finds a collision. That 
is a fast operation for a short queue length. Did you do any 
measurements of that?

Patricia