Posted to dev@avalon.apache.org by Berin Loritsch <bl...@apache.org> on 2001/02/28 21:58:31 UTC

Re: [Avalon][C2] ComponentPool problems under Apache JMeter

Davanum Srinivas wrote:
> 
> Berin,
> 
> One major problem for XSP scalability is that the Pool system is not good, and we keep creating
> new instances of the classes that extend XSPGenerator. Avalon's ThreadSafePool and AbstractPool
> need to be revisited/rewritten. Here are the problems.
> 
> 1. AbstractPool's get() should allocate from a pool of available connections. I could not
> understand why it has code that expands the pool; that code never gets called anyway, because m_pool
> is never null (init is called). Shouldn't get() shrink the available pool if it is
> too big instead?

The pool code was based on a composite of code that I wrote, Stefano wrote, and Federico wrote.
Peter Donald did the current implementation.  He did some extensive testing and chose the best-performing
methodology.  Unfortunately it is not ThreadSafe, and was never intended to be (at least according to
our last discussion on the subject).  That's why I suggested the ThreadSafe version to you.

> 2. AbstractPool's put() should add the component to the available pool. So theoretically, if the
> pool size is not big enough, it should expand the pool, not reduce it....

I agree.  The question is this: what kind of pooling policy do we want here?  There are several policies
available:

Hard Resource Limiting: the pool will never expand, but will cause you to wait for an available resource.
Soft Resource Limiting: the pool becomes a factory when the resources have been exhausted--extra resources
                        are destroyed when they return.
No Resource Limiting:   the pool expands to the extent necessary, but is never shrunk.

They all have performance/memory usage tradeoffs.
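
To make the tradeoffs concrete, here is a rough sketch of how the three
policies could differ at the moment of exhaustion.  Everything in it (the
class name, the int constants, the use of plain Object as the pooled type)
is illustrative only, not existing Avalon code:

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the three pooling policies -- not Avalon code.
public class PolicyDemo {
    public static final int HARD_LIMIT = 0; // never grow; block until a put()
    public static final int SOFT_LIMIT = 1; // grow on demand; drop extras on put()
    public static final int NO_LIMIT   = 2; // grow on demand; keep everything

    private final List available = new ArrayList();
    private final int policy;
    private final int size;

    public PolicyDemo(int policy, int size) {
        this.policy = policy;
        this.size = size;
        for (int i = 0; i < size; i++) {
            available.add(new Object()); // stand-in for a real pooled component
        }
    }

    public synchronized Object get() throws InterruptedException {
        while (available.isEmpty()) {
            if (policy == HARD_LIMIT) {
                wait();                      // hard limit: wait for a return
            } else {
                available.add(new Object()); // soft/no limit: act as a factory
            }
        }
        return available.remove(0);
    }

    public synchronized void put(Object obj) {
        if (policy == SOFT_LIMIT && available.size() >= size) {
            return;                          // soft limit: let the extra be GC'd
        }
        available.add(obj);
        notifyAll();                         // wake any hard-limit waiters
    }
}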

> 3. ThreadSafePool's get() and put() are declared final. They should not be.

I believe the ThreadSafePool was intended to be used standalone.

> 4. Currently the mechanism for growing and shrinking pools is severely broken. The code has to be
> re-visited.

Ok.

> 5. Please note that C2's ComponentPool now uses ThreadSafePool and not AbstractPool directly.

Yep.  Question: what do you think of the Pool used in the DataSources code?  There are some things
that I need to do to it, but it is my code (Peter reformatted it).  It makes a clear distinction
between used components and unused components.

> Can you please take a look?

Will do.  I am under the gun, but I can get to it sometime late next week (unless I burn the
midnight oil).

Peter, do you think you could help us out with the pool implementations?

Re: Pool Rework (was: Re: [Avalon][C2] ComponentPool problems under Apache JMeter)

Posted by Berin Loritsch <bl...@apache.org>.
Sorry guys, I meant to send this to the Avalon list.

Pool Rework (was: Re: [Avalon][C2] ComponentPool problems under Apache JMeter)

Posted by Berin Loritsch <bl...@apache.org>.
I am going to write out my plan for upgrading the Avalon pool package.
Considering that Avalon is a Server Framework, and most servers
live in a multithreaded environment, we need some design rework on the
pools.

The AbstractPool was not meant for a multithreaded environment, so
it is suitable only for a single-threaded application.  However, its name
should be changed to reflect that fact, so it is not confused with the base
class that all pools should inherit from.  The design of the abstract pool
is as optimized for a single-threaded environment as you can get, but it
will _*never*_ be reliable in any shape or form in a multithreaded
environment.  Why?  Because it is subject to a type of race condition that
simply synchronizing the get() and put() methods won't solve:

It is built on an array of objects, and has an index pointing to the
current pool member:

[0][1][2][3]
       ^

When get() is called, the pointer is decremented:

[0][1][2][3]
    ^

Obviously, when there are no more objects, the pool has to grow.

When put() is called, the pointer is incremented:

[0][1][2][3]
       ^
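
In code, the idea is roughly the following.  This is only a sketch of the
design as described here (names and details are mine, not the actual
Avalon source); the important point is that put() only moves the pointer
and never records which object actually came back:

// Sketch of the array-plus-pointer pool -- illustrative, not the real code.
public class IndexedPool {
    private final Object[] m_pool;
    private int m_index; // slot that will satisfy the next get()

    public IndexedPool(int size) {
        m_pool = new Object[size];
        for (int i = 0; i < size; i++) {
            m_pool[i] = new Object(); // stand-in for a real Poolable
        }
        m_index = size - 1;
    }

    public Object get() {
        // growing the array when m_index goes negative is omitted here
        return m_pool[m_index--];
    }

    public void put(Object obj) {
        // Only the pointer moves back up.  Nothing checks that the slot at
        // the new position is not still checked out by another thread, so
        // making get() and put() synchronized keeps each call atomic but
        // does not restore the lost identity information.
        m_index++;
    }
}
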
For the problem domain and the efficiency it was trying to attain, this
is a truly ingenious design.  My kudos to Peter for coming up with it.
However, here is the type of race condition that it *cannot* prevent:

Two threads request a Poolable at the same time but return them out of
order, and a third thread requests a Poolable after one of the objects
is returned:

Thread 1 get(object indexed 3):
[0][1][2][3]
       ^ <-

Thread 2 get(object indexed 2):
[0][1][2][3]
    ^ <-

Thread 1 put(object indexed 3):
[0][1][2][3]
    -> ^

Thread 3 get(object indexed 2):
[0][1][2][3]
    ^ <-

Thread 2 put(object indexed 2):
[0][1][2][3]
    -> ^

Thread 3 put(object indexed 2):
[0][1][2][3]
       -> ^

So what has happened?  Threads two and three are sharing an object!  Since
the reason for a Pool is to pool high-overhead objects that cannot be used
by more than one thread simultaneously, we cannot use any Pool based on this
algorithm in a multithreaded environment--as ingenious as it is.

The ThreadSafePool implementation extends this algorithm and makes the get()
and put() methods synchronized.  This is not a _truly_ thread-safe object,
because it cannot protect itself against the aforementioned race condition.

So how do we protect ourselves in a multithreaded environment?  We have
two options: use a Stack, or use two Lists.  Why?  Because we need a
distinction between *active* objects and *inactive* objects.  Under no
circumstances should a pool return an object that is already in use.
The Stack approach is easy to implement, but it does not keep track of
the active objects.  The two-list approach maintains a reference to the
objects in question, so resource limiting is much easier to implement.
NOTE: you can mix a Stack and a List.
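
As a point of comparison, the pure-Stack variant is about as small as a
pool can get.  This is a sketch only (the empty-pool policy and the
Poolable type are omitted/faked):

import java.util.Stack;

// Pure-stack pool: an object is either on the stack or handed out, so the
// sharing race above cannot happen, but put() blindly accepts any object.
public class StackPool {
    private final Stack inactive = new Stack();

    public StackPool(int size) {
        for (int i = 0; i < size; i++) {
            inactive.push(new Object()); // stand-in for a real Poolable
        }
    }

    public synchronized Object get() {
        return inactive.pop(); // empty-pool policy (grow, wait, fail) omitted
    }

    public synchronized void put(Object obj) {
        inactive.push(obj);    // no way to tell if this pool ever owned obj
    }
}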

It comes down to which is the most efficient structure to use, but the gist
of the problem we need to solve is that we cannot have a race condition
in which two threads end up sharing a reference to the same object.

The initial state is this:

Inactive List          Active List
-------------          -----------
     P1
     P2
     P3
     P4

When multiple threads request an object, it _must_ be taken out of the
Inactive roster and placed in the Active roster (using the scenario above):

Threads 1 and 2 request an object

Inactive List          Active List
-------------          -----------
     P3                     P1 (Thread 1)
     P4                     P2 (Thread 2)

Thread 1 returns an object

Inactive List          Active List
-------------          -----------
     P1                     P2 (Thread 2)
     P3
     P4

Thread 3 requests an object

Inactive List          Active List
-------------          -----------
     P3                     P2 (Thread 2)
     P4                     P1 (Thread 3)

Threads 2 and 3 return their objects

Inactive List          Active List
-------------          -----------
     P1
     P2
     P3
     P4

The important thing to catch here is the sequence of events:

get()
-----

Inactive    TemporalRef      Active
   |           |               |
   | Remove(0) |               |
   |<----------+               |
   |           |    Add(ref)   |
   |           +-------------->|
   |           |               |

put()
-----

Inactive    TemporalRef      Active
   |           |               |
   |           |  Remove(ref)  |
   |           +-------------->|
   |  Add(ref) |               |
   |<----------+               |
   |           |               |

The TemporalRef is the actual object that is being transferred
to and from the requesting thread.  On a get(), we remove the
first item from the Inactive list (or stack.pop()), put it in our
Active list, and return that reference.

Now for the all-important reason for using a List for the Active
references.  On a put(), we remove the reference to that object from
the Active list (it is very important that it be that particular object)
and add the object to the Inactive list (or stack.push()).

The pure stack approach will also work, but it is blind as to
whether the Pool actually originated the object or not.
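
To pull the pieces together, here is a rough sketch of the two-list
approach traced above.  The class name, the exceptions, and the use of
plain Object are mine, not a proposed API; a real implementation would
plug in one of the resource-limiting policies instead of simply failing
when the Inactive list is empty:

import java.util.ArrayList;
import java.util.List;

// Sketch of the two-list pool; removal from Active relies on the pooled
// objects using identity equality (the default Object.equals()).
public class TwoListPool {
    private final List inactive = new ArrayList(); // available objects
    private final List active   = new ArrayList(); // objects currently loaned out

    public TwoListPool(int size) {
        for (int i = 0; i < size; i++) {
            inactive.add(new Object()); // stand-in for a real Poolable
        }
    }

    public synchronized Object get() {
        if (inactive.isEmpty()) {
            // policy decision: block, grow, or fail -- see the earlier post
            throw new IllegalStateException("pool exhausted");
        }
        Object ref = inactive.remove(0); // Remove(0) in the sequence above
        active.add(ref);                 // Add(ref) to the Active list
        return ref;
    }

    public synchronized void put(Object ref) {
        // Remove(ref): refuse anything this pool never handed out, which is
        // exactly what the pure stack approach cannot do.
        if (!active.remove(ref)) {
            throw new IllegalArgumentException("not loaned out by this pool");
        }
        inactive.add(ref);               // Add(ref) to the Inactive list
    }
}

Because get() and put() are synchronized on the same pool, and every object
is physically moved between the two lists, this sketch cannot hand the same
instance to two threads at once.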