You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@beehive.apache.org by Eddie O'Neil <ek...@gmail.com> on 2006/03/29 19:41:40 UTC

page flow destroy, controls, and deadlock

Rich/Daryl--

  Hey, I've run into a thread deadlock problem in the interaction
between Controls and Page Flow that happens in the following
circumstance:

thread1: acquire lock on FooPageFlow (during FlowController.execute)
thread2: acquire lock on
BeanContextServicesSupport.globalHierarchyLock (JDK class)
thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
thread1: wait for lock on
BeanContextServicesSupport.globalHierarchyLock (JDK class)

So, the problem is that the same JPF is being both created and
destroyed by two different threads.  It appears that the "destroy"
phase of the JPF lifecycle is entirely unsynchronized and can freely
acquire Control locks without having serialized access to the Page
Flow itself.

  My thought is to add a synchronization point in
JavaControlUtils.uninitJavaControls that locks on the
PageFlowManagedObject as this would serialize access to the Page Flow.
 But, I seem to recall some threading issues with locking on a JPF
during the "destroy" part of the lifecycle and don't want to resurrect
those.

  Any suggestions about how best to make this fix?

Eddie

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

Rich--

  Hey; thanks for the response.  Seems like we're thinking the same
thing here, which is good.  :)  Additional comments below...

Eddie

On 4/11/06, Rich Feit <ri...@gmail.com> wrote:
> expiring.  The one thing I'm still wondering is about the deadlock --
> lack of synchronization can't in itself cause a deadlock.  Is the
> deadlock with the control-bean/BC.gHL?  If so, it still seems like
> there's an ordering issue there.  In general, locks should always be
> acquired in the same order.

[eko] Agreed -- wasn't trying to imply that the lack of
synchronization caused the deadlock.   And, you're right, the deadlock
is with the ControlBean / BC.gHL.  :)  In general, when locks are
taken on the ControlBean / BC.gHL, it's done _after_ a lock on the
current page flow, but in some destroy cases, that synchronize-on-JPF
is missing.  I've simplified and abbreviated a bit here; for example:

create JPF
1) synchronize on JPF, CBC.beginContext(), JPF.onCreate(), CBC.endContext

execute JPF action:
1) synchronize on JPF, CBC.beginContext(), invoke JPF action, CBC.endContext

render page:
1) synchronize on JPF, CBC.beginContext(), render page, CBC.endContext

destroy (most cases):
1) synchronize on JPF, JPF.onDestroy(), CBC.endContext

And, in these cases, all JPF <> Controls interactions are done after
synchronizing on the current page flow.  However, there are cases
where, under load to the same page flow in the same session, the
DeferredSessionStorageHandler can cause the the "destroy" step to run
without synchronizing on the current Page Flow thus violating the
synchronization order used everywhere else.

I think the fix I described in my last mail will cause the locks to be
acquired in the same order and thus prevent the deadlock.  :)

>
> Of course, that doesn't change the fact that there's a fundamental
> problem with the destroy behavior in general.  Just seems like both
> should be addressed (I've never understood why BC.gHL was locked first
> during destroy()).  There might be a reason for doing the locking
> out-of-order... I just can't think of one.

[eko] Definitely!  I think both will be addressed with an atomic call
to DSSH.applyChanges that destroys a JPF explicitly.  Then, the order
will be the same as the working "destroy" step above.

I'm making a simple fix first just to get something that works in the
tree and will look at the longer term solution after that.  Will let
you know how it goes...  :)




>
> About the proposed fix, the only question I have is about this:
> > - make the apply changes and ensure failover stages of
> > DSSH.acceptChanges atomic by synchronizing on a class local.  This
> > would serialize changes on the HttpSession and prevent two threads
> > from stepping on each other's attributes.  I believe that this would
> > preserve the "last-one-wins" semantics described in DSSH's Javadoc.
> Is this a class local in DSSH?  If so, that would create a (short)
> synchronization bottleneck across *all* users, since the DSSH instance
> is used context-wide.  Am I understanding this correctly?
>
> Rich
>
> Eddie O'Neil wrote:
> > Rich--
> >
> >   I think I've got a better idea what's happening now.  Doesn't appear
> > that this is caused by the extra
> > HttpServletBindingListener.valueUnbound event happening when
> > DeferredSessionStorageHandler.acceptChanges (DSSH.aC) is invoked.
> >
> >   A thread dump I've got seems to confirm that the deadlock is caused
> > by an unprotected "destroy" lifecycle method run during
> > "ensureFailover".  While I don't have a concrete repro for this yet,
> > it appears that it's possible to have two threads running through
> > DSSH.aC at once.  When the ServletContainerAdapter is cluster aware
> > and implements failover as a set attribute call on the HttpSession,
> > this can happen (note, this doesn't appear to be a problem on
> > Tomcat!):
> >
> > Thread1, step 1: DSSH.applyChanges -- add JPF A to the session while
> > applying changed attributes
> >
> > Thread2, step 1: DSSH.applyChanges -- add JPF B to the session while
> > applying changed attributes
> >
> > Thread1, step 2: DSSH.applyChanges -- finds JPF B in the session while
> > ensuring failover of JPF A.  Because JPF B is unbound from the
> > HttpSession when JPF A is re-set into the attribute map, the
> > HttpServletBindingListener.valueUnbound event is called and JPF B is
> > destroyed.  This destroy does not synchronize on JPF B which causes
> > the locking problem.  The code that runs while ensuring failover also
> > isn't protected using the ThreadLocal that protects the "destroy"
> > method from running while applying changed attributes
> >
> > So, a proposed fix would do a few things:
> >
> > - make the apply changes and ensure failover stages of
> > DSSH.acceptChanges atomic by synchronizing on a class local.  This
> > would serialize changes on the HttpSession and prevent two threads
> > from stepping on each other's attributes.  I believe that this would
> > preserve the "last-one-wins" semantics described in DSSH's Javadoc.
> >
> > - add a check at the beginning of the method that will call "destroy"
> > on any PageFlowManagedObjects if these objects are still "alive".
> > "liveness" would be tracked by a local in the PFMO class.  This could
> > be replaced with a ServletRequest attribute, I suppose.
> >
> > - protect both the apply changes and ensure failover steps from
> > running "destroy" on any PageFlowManagedObjects using the existing
> > ThreadLocal to wrap the "ensureFailover" call.
> >
> > The latter fix is necessary to ensure that containers don't invoke
> > valueUnbound again during ensure failover.  It appears as though
> > Tomcat and WebLogic both implement the semantics of session attribute
> > binding differently.  Tomcat calls valueUnbound on every setAttribute
> > call while WebLogic only calls valueUnbound when the attribute value
> > is different.  The result is that depending on the
> > ServletContainerAdapter, it is possible to have 3 valueUnbound calls
> > per request on a single Page Flow and to destroy a Page Flow on every
> > request.  Again, this doesn't appear to be an issue on Tomcat...
> >
> >   Am curious about your thoughts on the scenario and proposed fix...
> >
> > In the long run, it might be worth considering changing things to call
> > the PageFlowManagedObject lifecycle methods directly rather than
> > implicitly via valueUnbound because of the differences in how
> > containers manage these objects.  An additional session attribute
> > could be used to listen for session timeout via a valueUnbound call
> > which would cause the rest of the Page Flow objects to be cleaned up.
> > Does this long-term approach sound reasonable?
> >
> >   Anyway, that's what I've found out.  It's similar, but slightly
> > different from the previous situation I described.  Sorry about that.
> > :)
> >
> >   Thoughts welcome...
> >
> > Eddie
> >
> >
> >
> >
> > On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
> >
> >> Rich--
> >>
> >>   Hey -- one other question that just struck me about this...
> >>
> >>   In the DeferredSessionStorageHandler, is it possible to have two
> >> threads running there at once such that for a single thread, the
> >> "current" Page Flow changes between changed attributes and failover
> >> attributes?  Seems like this is a possibility.  For example:
> >>
> >> Thread 1                                                   Thread 2
> >>                                                                 1:
> >> apply changed attribute for JPF B
> >> 1: apply changed attribute for JPF A
> >>                                                                 1:
> >> apply failover attribute for new JPF B
> >> 2: apply failover attribute via
> >> session.setAttribute(...) that overwrites
> >> B with A, causing B's onDestroy method
> >> to be invoked
> >>
> >>   In this case, it seems like B.onDestroy(...) would need to be
> >> invoked, so lifecycle methods *should* run when setting failover
> >> changes.
> >>
> >>   Just making sure I've got this right.  ;)
> >>
> >> Eddie
> >>
> >>
> >> On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
> >>
> >>> Rich--
> >>>
> >>>   That's close.  :)  My current speculation is that the locking issues
> >>> related to Controls (BH.gHL, etc) are happening because of
> >>> unsynchronized execution of the Page Flow's "destroy" lifecycle method
> >>> from the second valueUnbound event.  Since this is unsynchronized, it
> >>> allows two threads (one running in destroy and another running an
> >>> action) to interact with the locks inside of the ControlBeanContext
> >>> hierarchy.
> >>>
> >>>   Recapping the last example, this is a Page Flow "FooController" with
> >>> an @Control BarControlBean; the same instance is being used in both of
> >>> the threads below.
> >>>
> >>> Thread 1 (run an action a JPF)    Thread 2 (destroying a JPF)
> >>> ===============                  =============
> >>> 1: lock FooController
> >>> 2: lock BarControlBean
> >>>                                                 1: lock BC.gHL
> >>>                                                 2: wait for lock on
> >>> BarControlBean
> >>> 3: wait BC.gHL lock
> >>>
> >>> Thread 2 looks strange because it's destroynig the same JPF instance
> >>> as Thread 1.  I agree that this is odd yet seems possible with "quick
> >>> clicking".
> >>>
> >>>   That seems like the most likely explanation so far.  Does that make
> >>> more sense?
> >>>
> >>> Eddie
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 4/7/06, Rich Feit <ri...@gmail.com> wrote:
> >>>
> >>>> OK, thanks for the info.  It sounds to me like the deadlock is caused by
> >>>> ordering issues when locking the control bean and BC.gHL, and that the
> >>>> page flow synchronization is simply masking this root problem.  Is that
> >>>> what's going on, or am I misunderstanding?
> >>>>
> >>>> Rich
> >>>>
> >>>> Eddie O'Neil wrote:
> >>>>
> >>>>>   Yeah -- that's right the deadlock is caused by lack of
> >>>>> synchronization during "destroy" on the second valueUnbound event.  I
> >>>>> think (but haven't confirmed yet) that the same thing could happen
> >>>>> with different implementations of the
> >>>>> ServletContainerAdapter.ensureFailover method if it calls
> >>>>> "session.setAttribute(...)".  Just a hunch, though, and I need to
> >>>>> confirm this.
> >>>>>
> >>>>>   Back to the deadlock, absence of synchronization on the "destroy"
> >>>>> lifecycle call allows a second thread to enter into the
> >>>>> ControlBeanContext which can cause the thread to deadlock with a
> >>>>> second thread that's already in the ControlBeanContext.
> >>>>>
> >>>>>
> >>>>>
> >>>>>> I don't think that's an option here, since it voids the whole purpose of
> >>>>>> DeferredSessionStorageHandler
> >>>>>> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
> >>>>>> ) -- am I understanding you here?
> >>>>>>
> >>>>>>
> >>>>> Yeah, that makes sense -- just wanted to rule this out as an option.  :)
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Otherwise, I wonder if there's a way to track this without adding state
> >>>>>> to the page flow instance.  Possibly in a request attribute?  Just a
> >>>>>> thought...
> >>>>>>
> >>>>>>
> >>>>> Sure -- this would certainly work.  It could take the form of a set of
> >>>>> attribute names that should be persisted / failed over, but that
> >>>>> should ignore lifecycle methods.  Will look at this and get back to
> >>>>> you.
> >>>>>
> >>>>> Thanks for the comments...
> >>>>>
> >>>>> Eddie
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Rich
> >>>>>>
> >>>>>> Eddie O'Neil wrote:
> >>>>>>
> >>>>>>
> >>>>>>>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
> >>>>>>> left field.  :)
> >>>>>>>
> >>>>>>>   Given this and putting the issue of the session expiration aside for
> >>>>>>> a second, I think that root cause of the problem is that it is
> >>>>>>> possible for the DeferredSessionStorageHandler to run the
> >>>>>>> "valueUnbound" method code on a PageFlowController twice in these two
> >>>>>>> circumstances:
> >>>>>>>
> >>>>>>>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
> >>>>>>>   - DeferredSessionStorageHandler::applyChanges @ line 221
> >>>>>>>
> >>>>>>> The first runs in "real time" with the removal or replacement of a
> >>>>>>> Page Flow in the session and runs inside of the lock, but the second
> >>>>>>> is the deferred part that runs at the end of a request to persist
> >>>>>>> changed attributes in the session.  The "applyChanges" then makes
> >>>>>>> calls to set / remove the Page Flow attribute from the real
> >>>>>>> HttpSession, and this triggers the "valueUnbound" event and ultimately
> >>>>>>> an "onDestroy".  This latter step is unsynchronized and thus causes
> >>>>>>> the deadlock.
> >>>>>>>
> >>>>>>>   If the Page Flow is stored in the session and is an
> >>>>>>> HttpSessionBindingListener, any deferred calls are going to trigger
> >>>>>>> this event.  In order to keep consecutive JPF invocations in the same
> >>>>>>> order, the onDestroy of the previous JPF needs to run before onCreate
> >>>>>>> of the next JPF.  The cheap way to fix this would be to track the
> >>>>>>> state of "valueUnbound" as a private boolean inside of the
> >>>>>>> PageFlowController.  Then, when the second valueUnbound call is made,
> >>>>>>> the event will be a no-op on the JPF.
> >>>>>>>
> >>>>>>>   Another solution would be to tunnel down into the real HttpSession
> >>>>>>> and make JPF changes "live" against the session rather than deferring
> >>>>>>> them.
> >>>>>>>
> >>>>>>>   Have an opinion either way?
> >>>>>>>
> >>>>>>> Eddie
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Funny, I'd drafted a longish email earlier asking about the BeanContext
> >>>>>>>> lock, because it sounded like the *fundamental* issue (ordering problems
> >>>>>>>> in locking that and the control bean instance itself).  But it's not, or
> >>>>>>>> at least it's unrelated.  In a way, that's good.  :)
> >>>>>>>>
> >>>>>>>> Regarding this:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Thread A is execution an action on JPF X
> >>>>>>>>> Thread B is destroying JPF X en route to running JPF Y
> >>>>>>>>>
> >>>>>>>>> >From the code in PageFlowController.persistInSession (transitively
> >>>>>>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> >>>>>>>>> on the "current" JPF that would prevent this.  Right?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>> That's correct.  The only time onDestroy() should ever run without a
> >>>>>>>> lock on the page flow instance is when the session is expiring.  That's
> >>>>>>>> the situation where the session itself is already locked, so it's
> >>>>>>>> dangerous to subsequently lock the page flow instance (while another
> >>>>>>>> thread may be accessing the session through synchronized get() after
> >>>>>>>> already locking the page flow instance).  The user could also simulate
> >>>>>>>> this situation by explicitly removing the page flow session attribute,
> >>>>>>>> but of course that's not something we should try to support.
> >>>>>>>>
> >>>>>>>> Let me know if you see/find any holes in that, though...
> >>>>>>>>
> >>>>>>>> Rich
> >>>>>>>>
> >>>>>>>> Eddie O'Neil wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Rich--
> >>>>>>>>>
> >>>>>>>>>   Having looked into this more now, I've got a little more detail
> >>>>>>>>> about what's happening:
> >>>>>>>>>
> >>>>>>>>>   There are two issues intertwined here.  The first issue is related
> >>>>>>>>> to the ControlBeanContext implementation class and its usage of a
> >>>>>>>>> global, VM-wide lock.  At issue is the fact that all
> >>>>>>>>> ControlContainerContext implementations inherit from a
> >>>>>>>>> BeanContextServicesSupport base class.  This class (and another in
> >>>>>>>>> that hierarchy) use a the VM-wide object
> >>>>>>>>> BeanContext.globalHierarchyLock for synchronization in two situations:
> >>>>>>>>>
> >>>>>>>>>   - adding / removing a control to / from a context
> >>>>>>>>>   - adding / removing / getting a "service" from a context
> >>>>>>>>>
> >>>>>>>>> This acts as a choke point for all Control references and is what
> >>>>>>>>> shows up in the locking traces in a previous mail.  This is
> >>>>>>>>> problematic, but it's also orthogonal (I believe) to the issue with
> >>>>>>>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> >>>>>>>>> JDK assumed that the classes in java.beans.beancontext.* would be used
> >>>>>>>>> to run a single hierarchy of controls on a VM.  Beehive's usage
> >>>>>>>>> pattern is a little different in that in enterprise applications, many
> >>>>>>>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> >>>>>>>>> such, a global lock doesn't fit well.
> >>>>>>>>>
> >>>>>>>>>   So, hopefully that provides some background on the
> >>>>>>>>> BeanContext.globalHierarchyLock.
> >>>>>>>>>
> >>>>>>>>>   That being said, I think that the deadlock is orthogonal to this
> >>>>>>>>> problem as the "destroy" lifecycle method on a JPF can be invoked
> >>>>>>>>> without waiting for a lock on the JPF instance itself.  The result is
> >>>>>>>>> that two threads can actually run through a JPF simultaneously.
> >>>>>>>>>
> >>>>>>>>>   With Daryl's changes in February, the ControlContainerContext object
> >>>>>>>>> for a Page Flow is a class-level field, and locking is performed on
> >>>>>>>>> this object at:
> >>>>>>>>>
> >>>>>>>>>   - create time
> >>>>>>>>>   - action execution time
> >>>>>>>>>   - page rendering time (JSP and Faces)
> >>>>>>>>>
> >>>>>>>>> Guess that the question is whether this situation is protected:
> >>>>>>>>>
> >>>>>>>>> Thread A is execution an action on JPF X
> >>>>>>>>> Thread B is destroying JPF X en route to running JPF Y
> >>>>>>>>>
> >>>>>>>>> >From the code in PageFlowController.persistInSession (transitively
> >>>>>>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> >>>>>>>>> on the "current" JPF that would prevent this.  Right?
> >>>>>>>>>
> >>>>>>>>>   If that's true, then I've got a little more info to go on in
> >>>>>>>>> investigating this problem.
> >>>>>>>>>
> >>>>>>>>>   Thanks for any info...  :)
> >>>>>>>>>
> >>>>>>>>> Eddie
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>   Good questions -- let's see if I can explain the threads more
> >>>>>>>>>> clearly in ASCII.  :)
> >>>>>>>>>>
> >>>>>>>>>>   What's happening is this (note, both threads operate on the *same*
> >>>>>>>>>> instance of a JPF):
> >>>>>>>>>>
> >>>>>>>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> >>>>>>>>>> acquire lock on FooPage Flow
> >>>>>>>>>>                                                              acquire BC.gHL
> >>>>>>>>>> acquire lock on BarControlBean
> >>>>>>>>>>                                                              wait for
> >>>>>>>>>> lock on BarControlBean
> >>>>>>>>>> wait for lock on BC.gHL
> >>>>>>>>>>
> >>>>>>>>>> Because the Controls infrastructure is based on the BeanContext
> >>>>>>>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
> >>>>>>>>>> uses the BeanContextServicesSupport class as a base class for the
> >>>>>>>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> >>>>>>>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> >>>>>>>>>> access to shared data structures like the list of "service" objects in
> >>>>>>>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
> >>>>>>>>>> (!).
> >>>>>>>>>>
> >>>>>>>>>> To answer your questions:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>> The base classes for the ControlBeanContext object which delegates up
> >>>>>>>>>> to it's "super" in order to serialize changes and service requests to
> >>>>>>>>>> the classes that implement event listener support, etc.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> What prevents deadlock in general between locking on
> >>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>> Good question -- this seems like the crux of the issue.  In general,
> >>>>>>>>>> this is protected by a Control container serializing access to the
> >>>>>>>>>> container's ControlContainerContext object.  In this case, the problem
> >>>>>>>>>> occurs because two threads are trying to setup and destroy the same
> >>>>>>>>>> context object simultaneously.  If those setup and destroy calls were
> >>>>>>>>>> serialized against the ControlContainerContext object stored in the
> >>>>>>>>>> HttpSession, presumably, the semantics would be maintained.
> >>>>>>>>>>
> >>>>>>>>>> My guess is that this deadlock occurred with a double page submit or
> >>>>>>>>>> quick refresh that caught the tail end of the previous thread and the
> >>>>>>>>>> beginning of the next request such that the threads interleaved in
> >>>>>>>>>> this way.
> >>>>>>>>>>
> >>>>>>>>>> So, hopefully that helps explain what's happening.
> >>>>>>>>>>
> >>>>>>>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
> >>>>>>>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> >>>>>>>>>> instance during onDestroy?  That would serialize the access to the
> >>>>>>>>>> session scoped CCC object.
> >>>>>>>>>>
> >>>>>>>>>> Another way to go (and it's a lot of work) is to build a NetUI /
> >>>>>>>>>> web-tier specific control container that doesn't leverage the *Support
> >>>>>>>>>> classes in the JDK.  This would allow the container's implementation
> >>>>>>>>>> to be tailored to the single-threaded nature of the web tier and would
> >>>>>>>>>> remove a lot of the performance implications around locking and
> >>>>>>>>>> synchronized data structures.  But, I've not thought through this yet,
> >>>>>>>>>> either.  :)
> >>>>>>>>>>
> >>>>>>>>>> Eddie
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
> >>>>>>>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
> >>>>>>>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
> >>>>>>>>>>> session itself when onDestroy is called.  I don't think it's mandated by
> >>>>>>>>>>> the Servlet spec, but I don't think it's forbidden either.
> >>>>>>>>>>> Unfortunately, this means that a deadlock can occur if code in another
> >>>>>>>>>>> thread synchronizes on the page flow instance first (as in an action
> >>>>>>>>>>> method invocation), then calls a method on HttpSession that synchronizes
> >>>>>>>>>>> on the session object.  Again, I don't think this is mandated by the
> >>>>>>>>>>> spec, but it happens.
> >>>>>>>>>>>
> >>>>>>>>>>> Nothing strikes me off the bat, but I actually don't understand the
> >>>>>>>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
> >>>>>>>>>>> to see it).  What grabs the lock on
> >>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> >>>>>>>>>>> in general between locking on
> >>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>>>>>>>>>
> >>>>>>>>>>> Also, how is the same JPF being *created* by two separate threads?
> >>>>>>>>>>>
> >>>>>>>>>>> Rich
> >>>>>>>>>>>
> >>>>>>>>>>> Eddie O'Neil wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Rich/Daryl--
> >>>>>>>>>>>>
> >>>>>>>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
> >>>>>>>>>>>> between Controls and Page Flow that happens in the following
> >>>>>>>>>>>> circumstance:
> >>>>>>>>>>>>
> >>>>>>>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> >>>>>>>>>>>> thread2: acquire lock on
> >>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>>>>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> >>>>>>>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> >>>>>>>>>>>> thread1: wait for lock on
> >>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, the problem is that the same JPF is being both created and
> >>>>>>>>>>>> destroyed by two different threads.  It appears that the "destroy"
> >>>>>>>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
> >>>>>>>>>>>> acquire Control locks without having serialized access to the Page
> >>>>>>>>>>>> Flow itself.
> >>>>>>>>>>>>
> >>>>>>>>>>>>   My thought is to add a synchronization point in
> >>>>>>>>>>>> JavaControlUtils.uninitJavaControls that locks on the
> >>>>>>>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
> >>>>>>>>>>>>  But, I seem to recall some threading issues with locking on a JPF
> >>>>>>>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
> >>>>>>>>>>>> those.
> >>>>>>>>>>>>
> >>>>>>>>>>>>   Any suggestions about how best to make this fix?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Eddie
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>
> >
> >
>
>

Re: page flow destroy, controls, and deadlock

Posted by Rich Feit <ri...@gmail.com>.

Eddie,

I agree with your assessment and also with the long-term plan of not 
depending on the listener to do the delete except when the session is 
expiring.  The one thing I'm still wondering is about the deadlock -- 
lack of synchronization can't in itself cause a deadlock.  Is the 
deadlock with the control-bean/BC.gHL?  If so, it still seems like 
there's an ordering issue there.  In general, locks should always be 
acquired in the same order.

Of course, that doesn't change the fact that there's a fundamental 
problem with the destroy behavior in general.  Just seems like both 
should be addressed (I've never understood why BC.gHL was locked first 
during destroy()).  There might be a reason for doing the locking 
out-of-order... I just can't think of one.

About the proposed fix, the only question I have is about this:
> - make the apply changes and ensure failover stages of
> DSSH.acceptChanges atomic by synchronizing on a class local.  This
> would serialize changes on the HttpSession and prevent two threads
> from stepping on each other's attributes.  I believe that this would
> preserve the "last-one-wins" semantics described in DSSH's Javadoc.
Is this a class local in DSSH?  If so, that would create a (short) 
synchronization bottleneck across *all* users, since the DSSH instance 
is used context-wide.  Am I understanding this correctly?

Rich

Eddie O'Neil wrote:
> Rich--
>
>   I think I've got a better idea what's happening now.  Doesn't appear
> that this is caused by the extra
> HttpServletBindingListener.valueUnbound event happening when
> DeferredSessionStorageHandler.acceptChanges (DSSH.aC) is invoked.
>
>   A thread dump I've got seems to confirm that the deadlock is caused
> by an unprotected "destroy" lifecycle method run during
> "ensureFailover".  While I don't have a concrete repro for this yet,
> it appears that it's possible to have two threads running through
> DSSH.aC at once.  When the ServletContainerAdapter is cluster aware
> and implements failover as a set attribute call on the HttpSession,
> this can happen (note, this doesn't appear to be a problem on
> Tomcat!):
>
> Thread1, step 1: DSSH.applyChanges -- add JPF A to the session while
> applying changed attributes
>
> Thread2, step 1: DSSH.applyChanges -- add JPF B to the session while
> applying changed attributes
>
> Thread1, step 2: DSSH.applyChanges -- finds JPF B in the session while
> ensuring failover of JPF A.  Because JPF B is unbound from the
> HttpSession when JPF A is re-set into the attribute map, the
> HttpServletBindingListener.valueUnbound event is called and JPF B is
> destroyed.  This destroy does not synchronize on JPF B which causes
> the locking problem.  The code that runs while ensuring failover also
> isn't protected using the ThreadLocal that protects the "destroy"
> method from running while applying changed attributes
>
> So, a proposed fix would do a few things:
>
> - make the apply changes and ensure failover stages of
> DSSH.acceptChanges atomic by synchronizing on a class local.  This
> would serialize changes on the HttpSession and prevent two threads
> from stepping on each other's attributes.  I believe that this would
> preserve the "last-one-wins" semantics described in DSSH's Javadoc.
>
> - add a check at the beginning of the method that will call "destroy"
> on any PageFlowManagedObjects if these objects are still "alive". 
> "liveness" would be tracked by a local in the PFMO class.  This could
> be replaced with a ServletRequest attribute, I suppose.
>
> - protect both the apply changes and ensure failover steps from
> running "destroy" on any PageFlowManagedObjects using the existing
> ThreadLocal to wrap the "ensureFailover" call.
>
> The latter fix is necessary to ensure that containers don't invoke
> valueUnbound again during ensure failover.  It appears as though
> Tomcat and WebLogic both implement the semantics of session attribute
> binding differently.  Tomcat calls valueUnbound on every setAttribute
> call while WebLogic only calls valueUnbound when the attribute value
> is different.  The result is that depending on the
> ServletContainerAdapter, it is possible to have 3 valueUnbound calls
> per request on a single Page Flow and to destroy a Page Flow on every
> request.  Again, this doesn't appear to be an issue on Tomcat...
>
>   Am curious about your thoughts on the scenario and proposed fix...
>
> In the long run, it might be worth considering changing things to call
> the PageFlowManagedObject lifecycle methods directly rather than
> implicitly via valueUnbound because of the differences in how
> containers manage these objects.  An additional session attribute
> could be used to listen for session timeout via a valueUnbound call
> which would cause the rest of the Page Flow objects to be cleaned up. 
> Does this long-term approach sound reasonable?
>
>   Anyway, that's what I've found out.  It's similar, but slightly
> different from the previous situation I described.  Sorry about that. 
> :)
>
>   Thoughts welcome...
>
> Eddie
>
>
>
>
> On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
>   
>> Rich--
>>
>>   Hey -- one other question that just struck me about this...
>>
>>   In the DeferredSessionStorageHandler, is it possible to have two
>> threads running there at once such that for a single thread, the
>> "current" Page Flow changes between changed attributes and failover
>> attributes?  Seems like this is a possibility.  For example:
>>
>> Thread 1                                                   Thread 2
>>                                                                 1:
>> apply changed attribute for JPF B
>> 1: apply changed attribute for JPF A
>>                                                                 1:
>> apply failover attribute for new JPF B
>> 2: apply failover attribute via
>> session.setAttribute(...) that overwrites
>> B with A, causing B's onDestroy method
>> to be invoked
>>
>>   In this case, it seems like B.onDestroy(...) would need to be
>> invoked, so lifecycle methods *should* run when setting failover
>> changes.
>>
>>   Just making sure I've got this right.  ;)
>>
>> Eddie
>>
>>
>> On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
>>     
>>> Rich--
>>>
>>>   That's close.  :)  My current speculation is that the locking issues
>>> related to Controls (BH.gHL, etc) are happening because of
>>> unsynchronized execution of the Page Flow's "destroy" lifecycle method
>>> from the second valueUnbound event.  Since this is unsynchronized, it
>>> allows two threads (one running in destroy and another running an
>>> action) to interact with the locks inside of the ControlBeanContext
>>> hierarchy.
>>>
>>>   Recapping the last example, this is a Page Flow "FooController" with
>>> an @Control BarControlBean; the same instance is being used in both of
>>> the threads below.
>>>
>>> Thread 1 (run an action a JPF)    Thread 2 (destroying a JPF)
>>> ===============                  =============
>>> 1: lock FooController
>>> 2: lock BarControlBean
>>>                                                 1: lock BC.gHL
>>>                                                 2: wait for lock on
>>> BarControlBean
>>> 3: wait BC.gHL lock
>>>
>>> Thread 2 looks strange because it's destroynig the same JPF instance
>>> as Thread 1.  I agree that this is odd yet seems possible with "quick
>>> clicking".
>>>
>>>   That seems like the most likely explanation so far.  Does that make
>>> more sense?
>>>
>>> Eddie
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 4/7/06, Rich Feit <ri...@gmail.com> wrote:
>>>       
>>>> OK, thanks for the info.  It sounds to me like the deadlock is caused by
>>>> ordering issues when locking the control bean and BC.gHL, and that the
>>>> page flow synchronization is simply masking this root problem.  Is that
>>>> what's going on, or am I misunderstanding?
>>>>
>>>> Rich
>>>>
>>>> Eddie O'Neil wrote:
>>>>         
>>>>>   Yeah -- that's right the deadlock is caused by lack of
>>>>> synchronization during "destroy" on the second valueUnbound event.  I
>>>>> think (but haven't confirmed yet) that the same thing could happen
>>>>> with different implementations of the
>>>>> ServletContainerAdapter.ensureFailover method if it calls
>>>>> "session.setAttribute(...)".  Just a hunch, though, and I need to
>>>>> confirm this.
>>>>>
>>>>>   Back to the deadlock, absence of synchronization on the "destroy"
>>>>> lifecycle call allows a second thread to enter into the
>>>>> ControlBeanContext which can cause the thread to deadlock with a
>>>>> second thread that's already in the ControlBeanContext.
>>>>>
>>>>>
>>>>>           
>>>>>> I don't think that's an option here, since it voids the whole purpose of
>>>>>> DeferredSessionStorageHandler
>>>>>> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
>>>>>> ) -- am I understanding you here?
>>>>>>
>>>>>>             
>>>>> Yeah, that makes sense -- just wanted to rule this out as an option.  :)
>>>>>
>>>>>
>>>>>           
>>>>>> Otherwise, I wonder if there's a way to track this without adding state
>>>>>> to the page flow instance.  Possibly in a request attribute?  Just a
>>>>>> thought...
>>>>>>
>>>>>>             
>>>>> Sure -- this would certainly work.  It could take the form of a set of
>>>>> attribute names that should be persisted / failed over, but that
>>>>> should ignore lifecycle methods.  Will look at this and get back to
>>>>> you.
>>>>>
>>>>> Thanks for the comments...
>>>>>
>>>>> Eddie
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Rich
>>>>>>
>>>>>> Eddie O'Neil wrote:
>>>>>>
>>>>>>             
>>>>>>>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
>>>>>>> left field.  :)
>>>>>>>
>>>>>>>   Given this and putting the issue of the session expiration aside for
>>>>>>> a second, I think that root cause of the problem is that it is
>>>>>>> possible for the DeferredSessionStorageHandler to run the
>>>>>>> "valueUnbound" method code on a PageFlowController twice in these two
>>>>>>> circumstances:
>>>>>>>
>>>>>>>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
>>>>>>>   - DeferredSessionStorageHandler::applyChanges @ line 221
>>>>>>>
>>>>>>> The first runs in "real time" with the removal or replacement of a
>>>>>>> Page Flow in the session and runs inside of the lock, but the second
>>>>>>> is the deferred part that runs at the end of a request to persist
>>>>>>> changed attributes in the session.  The "applyChanges" then makes
>>>>>>> calls to set / remove the Page Flow attribute from the real
>>>>>>> HttpSession, and this triggers the "valueUnbound" event and ultimately
>>>>>>> an "onDestroy".  This latter step is unsynchronized and thus causes
>>>>>>> the deadlock.
>>>>>>>
>>>>>>>   If the Page Flow is stored in the session and is an
>>>>>>> HttpSessionBindingListener, any deferred calls are going to trigger
>>>>>>> this event.  In order to keep consecutive JPF invocations in the same
>>>>>>> order, the onDestroy of the previous JPF needs to run before onCreate
>>>>>>> of the next JPF.  The cheap way to fix this would be to track the
>>>>>>> state of "valueUnbound" as a private boolean inside of the
>>>>>>> PageFlowController.  Then, when the second valueUnbound call is made,
>>>>>>> the event will be a no-op on the JPF.
>>>>>>>
>>>>>>>   Another solution would be to tunnel down into the real HttpSession
>>>>>>> and make JPF changes "live" against the session rather than deferring
>>>>>>> them.
>>>>>>>
>>>>>>>   Have an opinion either way?
>>>>>>>
>>>>>>> Eddie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Funny, I'd drafted a longish email earlier asking about the BeanContext
>>>>>>>> lock, because it sounded like the *fundamental* issue (ordering problems
>>>>>>>> in locking that and the control bean instance itself).  But it's not, or
>>>>>>>> at least it's unrelated.  In a way, that's good.  :)
>>>>>>>>
>>>>>>>> Regarding this:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Thread A is execution an action on JPF X
>>>>>>>>> Thread B is destroying JPF X en route to running JPF Y
>>>>>>>>>
>>>>>>>>> >From the code in PageFlowController.persistInSession (transitively
>>>>>>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
>>>>>>>>> on the "current" JPF that would prevent this.  Right?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> That's correct.  The only time onDestroy() should ever run without a
>>>>>>>> lock on the page flow instance is when the session is expiring.  That's
>>>>>>>> the situation where the session itself is already locked, so it's
>>>>>>>> dangerous to subsequently lock the page flow instance (while another
>>>>>>>> thread may be accessing the session through synchronized get() after
>>>>>>>> already locking the page flow instance).  The user could also simulate
>>>>>>>> this situation by explicitly removing the page flow session attribute,
>>>>>>>> but of course that's not something we should try to support.
>>>>>>>>
>>>>>>>> Let me know if you see/find any holes in that, though...
>>>>>>>>
>>>>>>>> Rich
>>>>>>>>
>>>>>>>> Eddie O'Neil wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Rich--
>>>>>>>>>
>>>>>>>>>   Having looked into this more now, I've got a little more detail
>>>>>>>>> about what's happening:
>>>>>>>>>
>>>>>>>>>   There are two issues intertwined here.  The first issue is related
>>>>>>>>> to the ControlBeanContext implementation class and its usage of a
>>>>>>>>> global, VM-wide lock.  At issue is the fact that all
>>>>>>>>> ControlContainerContext implementations inherit from a
>>>>>>>>> BeanContextServicesSupport base class.  This class (and another in
>>>>>>>>> that hierarchy) use a the VM-wide object
>>>>>>>>> BeanContext.globalHierarchyLock for synchronization in two situations:
>>>>>>>>>
>>>>>>>>>   - adding / removing a control to / from a context
>>>>>>>>>   - adding / removing / getting a "service" from a context
>>>>>>>>>
>>>>>>>>> This acts as a choke point for all Control references and is what
>>>>>>>>> shows up in the locking traces in a previous mail.  This is
>>>>>>>>> problematic, but it's also orthogonal (I believe) to the issue with
>>>>>>>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
>>>>>>>>> JDK assumed that the classes in java.beans.beancontext.* would be used
>>>>>>>>> to run a single hierarchy of controls on a VM.  Beehive's usage
>>>>>>>>> pattern is a little different in that in enterprise applications, many
>>>>>>>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
>>>>>>>>> such, a global lock doesn't fit well.
>>>>>>>>>
>>>>>>>>>   So, hopefully that provides some background on the
>>>>>>>>> BeanContext.globalHierarchyLock.
>>>>>>>>>
>>>>>>>>>   That being said, I think that the deadlock is orthogonal to this
>>>>>>>>> problem as the "destroy" lifecycle method on a JPF can be invoked
>>>>>>>>> without waiting for a lock on the JPF instance itself.  The result is
>>>>>>>>> that two threads can actually run through a JPF simultaneously.
>>>>>>>>>
>>>>>>>>>   With Daryl's changes in February, the ControlContainerContext object
>>>>>>>>> for a Page Flow is a class-level field, and locking is performed on
>>>>>>>>> this object at:
>>>>>>>>>
>>>>>>>>>   - create time
>>>>>>>>>   - action execution time
>>>>>>>>>   - page rendering time (JSP and Faces)
>>>>>>>>>
>>>>>>>>> Guess that the question is whether this situation is protected:
>>>>>>>>>
>>>>>>>>> Thread A is execution an action on JPF X
>>>>>>>>> Thread B is destroying JPF X en route to running JPF Y
>>>>>>>>>
>>>>>>>>> >From the code in PageFlowController.persistInSession (transitively
>>>>>>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
>>>>>>>>> on the "current" JPF that would prevent this.  Right?
>>>>>>>>>
>>>>>>>>>   If that's true, then I've got a little more info to go on in
>>>>>>>>> investigating this problem.
>>>>>>>>>
>>>>>>>>>   Thanks for any info...  :)
>>>>>>>>>
>>>>>>>>> Eddie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>>   Good questions -- let's see if I can explain the threads more
>>>>>>>>>> clearly in ASCII.  :)
>>>>>>>>>>
>>>>>>>>>>   What's happening is this (note, both threads operate on the *same*
>>>>>>>>>> instance of a JPF):
>>>>>>>>>>
>>>>>>>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
>>>>>>>>>> acquire lock on FooPage Flow
>>>>>>>>>>                                                              acquire BC.gHL
>>>>>>>>>> acquire lock on BarControlBean
>>>>>>>>>>                                                              wait for
>>>>>>>>>> lock on BarControlBean
>>>>>>>>>> wait for lock on BC.gHL
>>>>>>>>>>
>>>>>>>>>> Because the Controls infrastructure is based on the BeanContext
>>>>>>>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
>>>>>>>>>> uses the BeanContextServicesSupport class as a base class for the
>>>>>>>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
>>>>>>>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
>>>>>>>>>> access to shared data structures like the list of "service" objects in
>>>>>>>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
>>>>>>>>>> (!).
>>>>>>>>>>
>>>>>>>>>> To answer your questions:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> The base classes for the ControlBeanContext object which delegates up
>>>>>>>>>> to it's "super" in order to serialize changes and service requests to
>>>>>>>>>> the classes that implement event listener support, etc.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> What prevents deadlock in general between locking on
>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> Good question -- this seems like the crux of the issue.  In general,
>>>>>>>>>> this is protected by a Control container serializing access to the
>>>>>>>>>> container's ControlContainerContext object.  In this case, the problem
>>>>>>>>>> occurs because two threads are trying to setup and destroy the same
>>>>>>>>>> context object simultaneously.  If those setup and destroy calls were
>>>>>>>>>> serialized against the ControlContainerContext object stored in the
>>>>>>>>>> HttpSession, presumably, the semantics would be maintained.
>>>>>>>>>>
>>>>>>>>>> My guess is that this deadlock occurred with a double page submit or
>>>>>>>>>> quick refresh that caught the tail end of the previous thread and the
>>>>>>>>>> beginning of the next request such that the threads interleaved in
>>>>>>>>>> this way.
>>>>>>>>>>
>>>>>>>>>> So, hopefully that helps explain what's happening.
>>>>>>>>>>
>>>>>>>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
>>>>>>>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
>>>>>>>>>> instance during onDestroy?  That would serialize the access to the
>>>>>>>>>> session scoped CCC object.
>>>>>>>>>>
>>>>>>>>>> Another way to go (and it's a lot of work) is to build a NetUI /
>>>>>>>>>> web-tier specific control container that doesn't leverage the *Support
>>>>>>>>>> classes in the JDK.  This would allow the container's implementation
>>>>>>>>>> to be tailored to the single-threaded nature of the web tier and would
>>>>>>>>>> remove a lot of the performance implications around locking and
>>>>>>>>>> synchronized data structures.  But, I've not thought through this yet,
>>>>>>>>>> either.  :)
>>>>>>>>>>
>>>>>>>>>> Eddie
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
>>>>>>>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
>>>>>>>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
>>>>>>>>>>> session itself when onDestroy is called.  I don't think it's mandated by
>>>>>>>>>>> the Servlet spec, but I don't think it's forbidden either.
>>>>>>>>>>> Unfortunately, this means that a deadlock can occur if code in another
>>>>>>>>>>> thread synchronizes on the page flow instance first (as in an action
>>>>>>>>>>> method invocation), then calls a method on HttpSession that synchronizes
>>>>>>>>>>> on the session object.  Again, I don't think this is mandated by the
>>>>>>>>>>> spec, but it happens.
>>>>>>>>>>>
>>>>>>>>>>> Nothing strikes me off the bat, but I actually don't understand the
>>>>>>>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
>>>>>>>>>>> to see it).  What grabs the lock on
>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
>>>>>>>>>>> in general between locking on
>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>>>>>>>>>
>>>>>>>>>>> Also, how is the same JPF being *created* by two separate threads?
>>>>>>>>>>>
>>>>>>>>>>> Rich
>>>>>>>>>>>
>>>>>>>>>>> Eddie O'Neil wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> Rich/Daryl--
>>>>>>>>>>>>
>>>>>>>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
>>>>>>>>>>>> between Controls and Page Flow that happens in the following
>>>>>>>>>>>> circumstance:
>>>>>>>>>>>>
>>>>>>>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
>>>>>>>>>>>> thread2: acquire lock on
>>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>>>>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
>>>>>>>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
>>>>>>>>>>>> thread1: wait for lock on
>>>>>>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>>>>>>>>>
>>>>>>>>>>>> So, the problem is that the same JPF is being both created and
>>>>>>>>>>>> destroyed by two different threads.  It appears that the "destroy"
>>>>>>>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
>>>>>>>>>>>> acquire Control locks without having serialized access to the Page
>>>>>>>>>>>> Flow itself.
>>>>>>>>>>>>
>>>>>>>>>>>>   My thought is to add a synchronization point in
>>>>>>>>>>>> JavaControlUtils.uninitJavaControls that locks on the
>>>>>>>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
>>>>>>>>>>>>  But, I seem to recall some threading issues with locking on a JPF
>>>>>>>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
>>>>>>>>>>>> those.
>>>>>>>>>>>>
>>>>>>>>>>>>   Any suggestions about how best to make this fix?
>>>>>>>>>>>>
>>>>>>>>>>>> Eddie
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>           
>
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

Rich--

  I think I've got a better idea what's happening now.  Doesn't appear
that this is caused by the extra
HttpServletBindingListener.valueUnbound event happening when
DeferredSessionStorageHandler.acceptChanges (DSSH.aC) is invoked.

  A thread dump I've got seems to confirm that the deadlock is caused
by an unprotected "destroy" lifecycle method run during
"ensureFailover".  While I don't have a concrete repro for this yet,
it appears that it's possible to have two threads running through
DSSH.aC at once.  When the ServletContainerAdapter is cluster aware
and implements failover as a set attribute call on the HttpSession,
this can happen (note, this doesn't appear to be a problem on
Tomcat!):

Thread1, step 1: DSSH.applyChanges -- add JPF A to the session while
applying changed attributes

Thread2, step 1: DSSH.applyChanges -- add JPF B to the session while
applying changed attributes

Thread1, step 2: DSSH.applyChanges -- finds JPF B in the session while
ensuring failover of JPF A.  Because JPF B is unbound from the
HttpSession when JPF A is re-set into the attribute map, the
HttpServletBindingListener.valueUnbound event is called and JPF B is
destroyed.  This destroy does not synchronize on JPF B which causes
the locking problem.  The code that runs while ensuring failover also
isn't protected using the ThreadLocal that protects the "destroy"
method from running while applying changed attributes

So, a proposed fix would do a few things:

- make the apply changes and ensure failover stages of
DSSH.acceptChanges atomic by synchronizing on a class local.  This
would serialize changes on the HttpSession and prevent two threads
from stepping on each other's attributes.  I believe that this would
preserve the "last-one-wins" semantics described in DSSH's Javadoc.

- add a check at the beginning of the method that will call "destroy"
on any PageFlowManagedObjects if these objects are still "alive". 
"liveness" would be tracked by a local in the PFMO class.  This could
be replaced with a ServletRequest attribute, I suppose.

- protect both the apply changes and ensure failover steps from
running "destroy" on any PageFlowManagedObjects using the existing
ThreadLocal to wrap the "ensureFailover" call.

The latter fix is necessary to ensure that containers don't invoke
valueUnbound again during ensure failover.  It appears as though
Tomcat and WebLogic both implement the semantics of session attribute
binding differently.  Tomcat calls valueUnbound on every setAttribute
call while WebLogic only calls valueUnbound when the attribute value
is different.  The result is that depending on the
ServletContainerAdapter, it is possible to have 3 valueUnbound calls
per request on a single Page Flow and to destroy a Page Flow on every
request.  Again, this doesn't appear to be an issue on Tomcat...

  Am curious about your thoughts on the scenario and proposed fix...

In the long run, it might be worth considering changing things to call
the PageFlowManagedObject lifecycle methods directly rather than
implicitly via valueUnbound because of the differences in how
containers manage these objects.  An additional session attribute
could be used to listen for session timeout via a valueUnbound call
which would cause the rest of the Page Flow objects to be cleaned up. 
Does this long-term approach sound reasonable?

  Anyway, that's what I've found out.  It's similar, but slightly
different from the previous situation I described.  Sorry about that. 
:)

  Thoughts welcome...

Eddie




On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
> Rich--
>
>   Hey -- one other question that just struck me about this...
>
>   In the DeferredSessionStorageHandler, is it possible to have two
> threads running there at once such that for a single thread, the
> "current" Page Flow changes between changed attributes and failover
> attributes?  Seems like this is a possibility.  For example:
>
> Thread 1                                                   Thread 2
>                                                                 1:
> apply changed attribute for JPF B
> 1: apply changed attribute for JPF A
>                                                                 1:
> apply failover attribute for new JPF B
> 2: apply failover attribute via
> session.setAttribute(...) that overwrites
> B with A, causing B's onDestroy method
> to be invoked
>
>   In this case, it seems like B.onDestroy(...) would need to be
> invoked, so lifecycle methods *should* run when setting failover
> changes.
>
>   Just making sure I've got this right.  ;)
>
> Eddie
>
>
> On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
> > Rich--
> >
> >   That's close.  :)  My current speculation is that the locking issues
> > related to Controls (BH.gHL, etc) are happening because of
> > unsynchronized execution of the Page Flow's "destroy" lifecycle method
> > from the second valueUnbound event.  Since this is unsynchronized, it
> > allows two threads (one running in destroy and another running an
> > action) to interact with the locks inside of the ControlBeanContext
> > hierarchy.
> >
> >   Recapping the last example, this is a Page Flow "FooController" with
> > an @Control BarControlBean; the same instance is being used in both of
> > the threads below.
> >
> > Thread 1 (run an action a JPF)    Thread 2 (destroying a JPF)
> > ===============                  =============
> > 1: lock FooController
> > 2: lock BarControlBean
> >                                                 1: lock BC.gHL
> >                                                 2: wait for lock on
> > BarControlBean
> > 3: wait BC.gHL lock
> >
> > Thread 2 looks strange because it's destroynig the same JPF instance
> > as Thread 1.  I agree that this is odd yet seems possible with "quick
> > clicking".
> >
> >   That seems like the most likely explanation so far.  Does that make
> > more sense?
> >
> > Eddie
> >
> >
> >
> >
> >
> >
> > On 4/7/06, Rich Feit <ri...@gmail.com> wrote:
> > > OK, thanks for the info.  It sounds to me like the deadlock is caused by
> > > ordering issues when locking the control bean and BC.gHL, and that the
> > > page flow synchronization is simply masking this root problem.  Is that
> > > what's going on, or am I misunderstanding?
> > >
> > > Rich
> > >
> > > Eddie O'Neil wrote:
> > > >   Yeah -- that's right the deadlock is caused by lack of
> > > > synchronization during "destroy" on the second valueUnbound event.  I
> > > > think (but haven't confirmed yet) that the same thing could happen
> > > > with different implementations of the
> > > > ServletContainerAdapter.ensureFailover method if it calls
> > > > "session.setAttribute(...)".  Just a hunch, though, and I need to
> > > > confirm this.
> > > >
> > > >   Back to the deadlock, absence of synchronization on the "destroy"
> > > > lifecycle call allows a second thread to enter into the
> > > > ControlBeanContext which can cause the thread to deadlock with a
> > > > second thread that's already in the ControlBeanContext.
> > > >
> > > >
> > > >> I don't think that's an option here, since it voids the whole purpose of
> > > >> DeferredSessionStorageHandler
> > > >> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
> > > >> ) -- am I understanding you here?
> > > >>
> > > >
> > > > Yeah, that makes sense -- just wanted to rule this out as an option.  :)
> > > >
> > > >
> > > >> Otherwise, I wonder if there's a way to track this without adding state
> > > >> to the page flow instance.  Possibly in a request attribute?  Just a
> > > >> thought...
> > > >>
> > > >
> > > > Sure -- this would certainly work.  It could take the form of a set of
> > > > attribute names that should be persisted / failed over, but that
> > > > should ignore lifecycle methods.  Will look at this and get back to
> > > > you.
> > > >
> > > > Thanks for the comments...
> > > >
> > > > Eddie
> > > >
> > > >
> > > >
> > > >> Rich
> > > >>
> > > >> Eddie O'Neil wrote:
> > > >>
> > > >>>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
> > > >>> left field.  :)
> > > >>>
> > > >>>   Given this and putting the issue of the session expiration aside for
> > > >>> a second, I think that root cause of the problem is that it is
> > > >>> possible for the DeferredSessionStorageHandler to run the
> > > >>> "valueUnbound" method code on a PageFlowController twice in these two
> > > >>> circumstances:
> > > >>>
> > > >>>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
> > > >>>   - DeferredSessionStorageHandler::applyChanges @ line 221
> > > >>>
> > > >>> The first runs in "real time" with the removal or replacement of a
> > > >>> Page Flow in the session and runs inside of the lock, but the second
> > > >>> is the deferred part that runs at the end of a request to persist
> > > >>> changed attributes in the session.  The "applyChanges" then makes
> > > >>> calls to set / remove the Page Flow attribute from the real
> > > >>> HttpSession, and this triggers the "valueUnbound" event and ultimately
> > > >>> an "onDestroy".  This latter step is unsynchronized and thus causes
> > > >>> the deadlock.
> > > >>>
> > > >>>   If the Page Flow is stored in the session and is an
> > > >>> HttpSessionBindingListener, any deferred calls are going to trigger
> > > >>> this event.  In order to keep consecutive JPF invocations in the same
> > > >>> order, the onDestroy of the previous JPF needs to run before onCreate
> > > >>> of the next JPF.  The cheap way to fix this would be to track the
> > > >>> state of "valueUnbound" as a private boolean inside of the
> > > >>> PageFlowController.  Then, when the second valueUnbound call is made,
> > > >>> the event will be a no-op on the JPF.
> > > >>>
> > > >>>   Another solution would be to tunnel down into the real HttpSession
> > > >>> and make JPF changes "live" against the session rather than deferring
> > > >>> them.
> > > >>>
> > > >>>   Have an opinion either way?
> > > >>>
> > > >>> Eddie
> > > >>>
> > > >>>
> > > >>>
> > > >>> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
> > > >>>
> > > >>>
> > > >>>> Funny, I'd drafted a longish email earlier asking about the BeanContext
> > > >>>> lock, because it sounded like the *fundamental* issue (ordering problems
> > > >>>> in locking that and the control bean instance itself).  But it's not, or
> > > >>>> at least it's unrelated.  In a way, that's good.  :)
> > > >>>>
> > > >>>> Regarding this:
> > > >>>>
> > > >>>>
> > > >>>>> Thread A is execution an action on JPF X
> > > >>>>> Thread B is destroying JPF X en route to running JPF Y
> > > >>>>>
> > > >>>>> From the code in PageFlowController.persistInSession (transitively
> > > >>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> > > >>>>> on the "current" JPF that would prevent this.  Right?
> > > >>>>>
> > > >>>>>
> > > >>>> That's correct.  The only time onDestroy() should ever run without a
> > > >>>> lock on the page flow instance is when the session is expiring.  That's
> > > >>>> the situation where the session itself is already locked, so it's
> > > >>>> dangerous to subsequently lock the page flow instance (while another
> > > >>>> thread may be accessing the session through synchronized get() after
> > > >>>> already locking the page flow instance).  The user could also simulate
> > > >>>> this situation by explicitly removing the page flow session attribute,
> > > >>>> but of course that's not something we should try to support.
> > > >>>>
> > > >>>> Let me know if you see/find any holes in that, though...
> > > >>>>
> > > >>>> Rich
> > > >>>>
> > > >>>> Eddie O'Neil wrote:
> > > >>>>
> > > >>>>
> > > >>>>> Rich--
> > > >>>>>
> > > >>>>>   Having looked into this more now, I've got a little more detail
> > > >>>>> about what's happening:
> > > >>>>>
> > > >>>>>   There are two issues intertwined here.  The first issue is related
> > > >>>>> to the ControlBeanContext implementation class and its usage of a
> > > >>>>> global, VM-wide lock.  At issue is the fact that all
> > > >>>>> ControlContainerContext implementations inherit from a
> > > >>>>> BeanContextServicesSupport base class.  This class (and another in
> > > >>>>> that hierarchy) use a the VM-wide object
> > > >>>>> BeanContext.globalHierarchyLock for synchronization in two situations:
> > > >>>>>
> > > >>>>>   - adding / removing a control to / from a context
> > > >>>>>   - adding / removing / getting a "service" from a context
> > > >>>>>
> > > >>>>> This acts as a choke point for all Control references and is what
> > > >>>>> shows up in the locking traces in a previous mail.  This is
> > > >>>>> problematic, but it's also orthogonal (I believe) to the issue with
> > > >>>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> > > >>>>> JDK assumed that the classes in java.beans.beancontext.* would be used
> > > >>>>> to run a single hierarchy of controls on a VM.  Beehive's usage
> > > >>>>> pattern is a little different in that in enterprise applications, many
> > > >>>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> > > >>>>> such, a global lock doesn't fit well.
> > > >>>>>
> > > >>>>>   So, hopefully that provides some background on the
> > > >>>>> BeanContext.globalHierarchyLock.
> > > >>>>>
> > > >>>>>   That being said, I think that the deadlock is orthogonal to this
> > > >>>>> problem as the "destroy" lifecycle method on a JPF can be invoked
> > > >>>>> without waiting for a lock on the JPF instance itself.  The result is
> > > >>>>> that two threads can actually run through a JPF simultaneously.
> > > >>>>>
> > > >>>>>   With Daryl's changes in February, the ControlContainerContext object
> > > >>>>> for a Page Flow is a class-level field, and locking is performed on
> > > >>>>> this object at:
> > > >>>>>
> > > >>>>>   - create time
> > > >>>>>   - action execution time
> > > >>>>>   - page rendering time (JSP and Faces)
> > > >>>>>
> > > >>>>> Guess that the question is whether this situation is protected:
> > > >>>>>
> > > >>>>> Thread A is execution an action on JPF X
> > > >>>>> Thread B is destroying JPF X en route to running JPF Y
> > > >>>>>
> > > >>>>> >From the code in PageFlowController.persistInSession (transitively
> > > >>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> > > >>>>> on the "current" JPF that would prevent this.  Right?
> > > >>>>>
> > > >>>>>   If that's true, then I've got a little more info to go on in
> > > >>>>> investigating this problem.
> > > >>>>>
> > > >>>>>   Thanks for any info...  :)
> > > >>>>>
> > > >>>>> Eddie
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>>   Good questions -- let's see if I can explain the threads more
> > > >>>>>> clearly in ASCII.  :)
> > > >>>>>>
> > > >>>>>>   What's happening is this (note, both threads operate on the *same*
> > > >>>>>> instance of a JPF):
> > > >>>>>>
> > > >>>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> > > >>>>>> acquire lock on FooPage Flow
> > > >>>>>>                                                              acquire BC.gHL
> > > >>>>>> acquire lock on BarControlBean
> > > >>>>>>                                                              wait for
> > > >>>>>> lock on BarControlBean
> > > >>>>>> wait for lock on BC.gHL
> > > >>>>>>
> > > >>>>>> Because the Controls infrastructure is based on the BeanContext
> > > >>>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
> > > >>>>>> uses the BeanContextServicesSupport class as a base class for the
> > > >>>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> > > >>>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> > > >>>>>> access to shared data structures like the list of "service" objects in
> > > >>>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
> > > >>>>>> (!).
> > > >>>>>>
> > > >>>>>> To answer your questions:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>> The base classes for the ControlBeanContext object which delegates up
> > > >>>>>> to it's "super" in order to serialize changes and service requests to
> > > >>>>>> the classes that implement event listener support, etc.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> What prevents deadlock in general between locking on
> > > >>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>> Good question -- this seems like the crux of the issue.  In general,
> > > >>>>>> this is protected by a Control container serializing access to the
> > > >>>>>> container's ControlContainerContext object.  In this case, the problem
> > > >>>>>> occurs because two threads are trying to setup and destroy the same
> > > >>>>>> context object simultaneously.  If those setup and destroy calls were
> > > >>>>>> serialized against the ControlContainerContext object stored in the
> > > >>>>>> HttpSession, presumably, the semantics would be maintained.
> > > >>>>>>
> > > >>>>>> My guess is that this deadlock occurred with a double page submit or
> > > >>>>>> quick refresh that caught the tail end of the previous thread and the
> > > >>>>>> beginning of the next request such that the threads interleaved in
> > > >>>>>> this way.
> > > >>>>>>
> > > >>>>>> So, hopefully that helps explain what's happening.
> > > >>>>>>
> > > >>>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
> > > >>>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> > > >>>>>> instance during onDestroy?  That would serialize the access to the
> > > >>>>>> session scoped CCC object.
> > > >>>>>>
> > > >>>>>> Another way to go (and it's a lot of work) is to build a NetUI /
> > > >>>>>> web-tier specific control container that doesn't leverage the *Support
> > > >>>>>> classes in the JDK.  This would allow the container's implementation
> > > >>>>>> to be tailored to the single-threaded nature of the web tier and would
> > > >>>>>> remove a lot of the performance implications around locking and
> > > >>>>>> synchronized data structures.  But, I've not thought through this yet,
> > > >>>>>> either.  :)
> > > >>>>>>
> > > >>>>>> Eddie
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
> > > >>>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
> > > >>>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
> > > >>>>>>> session itself when onDestroy is called.  I don't think it's mandated by
> > > >>>>>>> the Servlet spec, but I don't think it's forbidden either.
> > > >>>>>>> Unfortunately, this means that a deadlock can occur if code in another
> > > >>>>>>> thread synchronizes on the page flow instance first (as in an action
> > > >>>>>>> method invocation), then calls a method on HttpSession that synchronizes
> > > >>>>>>> on the session object.  Again, I don't think this is mandated by the
> > > >>>>>>> spec, but it happens.
> > > >>>>>>>
> > > >>>>>>> Nothing strikes me off the bat, but I actually don't understand the
> > > >>>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
> > > >>>>>>> to see it).  What grabs the lock on
> > > >>>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> > > >>>>>>> in general between locking on
> > > >>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> > > >>>>>>>
> > > >>>>>>> Also, how is the same JPF being *created* by two separate threads?
> > > >>>>>>>
> > > >>>>>>> Rich
> > > >>>>>>>
> > > >>>>>>> Eddie O'Neil wrote:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Rich/Daryl--
> > > >>>>>>>>
> > > >>>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
> > > >>>>>>>> between Controls and Page Flow that happens in the following
> > > >>>>>>>> circumstance:
> > > >>>>>>>>
> > > >>>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> > > >>>>>>>> thread2: acquire lock on
> > > >>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > > >>>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> > > >>>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> > > >>>>>>>> thread1: wait for lock on
> > > >>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > > >>>>>>>>
> > > >>>>>>>> So, the problem is that the same JPF is being both created and
> > > >>>>>>>> destroyed by two different threads.  It appears that the "destroy"
> > > >>>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
> > > >>>>>>>> acquire Control locks without having serialized access to the Page
> > > >>>>>>>> Flow itself.
> > > >>>>>>>>
> > > >>>>>>>>   My thought is to add a synchronization point in
> > > >>>>>>>> JavaControlUtils.uninitJavaControls that locks on the
> > > >>>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
> > > >>>>>>>>  But, I seem to recall some threading issues with locking on a JPF
> > > >>>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
> > > >>>>>>>> those.
> > > >>>>>>>>
> > > >>>>>>>>   Any suggestions about how best to make this fix?
> > > >>>>>>>>
> > > >>>>>>>> Eddie
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>
> > > >
> > > >
> > >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

Rich--

  Hey -- one other question that just struck me about this...

  In the DeferredSessionStorageHandler, is it possible to have two
threads running there at once such that for a single thread, the
"current" Page Flow changes between changed attributes and failover
attributes?  Seems like this is a possibility.  For example:

Thread 1                                                   Thread 2
                                                                1:
apply changed attribute for JPF B
1: apply changed attribute for JPF A
                                                                1:
apply failover attribute for new JPF B
2: apply failover attribute via
session.setAttribute(...) that overwrites
B with A, causing B's onDestroy method
to be invoked

  In this case, it seems like B.onDestroy(...) would need to be
invoked, so lifecycle methods *should* run when setting failover
changes.

  Just making sure I've got this right.  ;)

Eddie


On 4/7/06, Eddie O'Neil <ek...@gmail.com> wrote:
> Rich--
>
>   That's close.  :)  My current speculation is that the locking issues
> related to Controls (BH.gHL, etc) are happening because of
> unsynchronized execution of the Page Flow's "destroy" lifecycle method
> from the second valueUnbound event.  Since this is unsynchronized, it
> allows two threads (one running in destroy and another running an
> action) to interact with the locks inside of the ControlBeanContext
> hierarchy.
>
>   Recapping the last example, this is a Page Flow "FooController" with
> an @Control BarControlBean; the same instance is being used in both of
> the threads below.
>
> Thread 1 (run an action a JPF)    Thread 2 (destroying a JPF)
> ===============                  =============
> 1: lock FooController
> 2: lock BarControlBean
>                                                 1: lock BC.gHL
>                                                 2: wait for lock on
> BarControlBean
> 3: wait BC.gHL lock
>
> Thread 2 looks strange because it's destroynig the same JPF instance
> as Thread 1.  I agree that this is odd yet seems possible with "quick
> clicking".
>
>   That seems like the most likely explanation so far.  Does that make
> more sense?
>
> Eddie
>
>
>
>
>
>
> On 4/7/06, Rich Feit <ri...@gmail.com> wrote:
> > OK, thanks for the info.  It sounds to me like the deadlock is caused by
> > ordering issues when locking the control bean and BC.gHL, and that the
> > page flow synchronization is simply masking this root problem.  Is that
> > what's going on, or am I misunderstanding?
> >
> > Rich
> >
> > Eddie O'Neil wrote:
> > >   Yeah -- that's right the deadlock is caused by lack of
> > > synchronization during "destroy" on the second valueUnbound event.  I
> > > think (but haven't confirmed yet) that the same thing could happen
> > > with different implementations of the
> > > ServletContainerAdapter.ensureFailover method if it calls
> > > "session.setAttribute(...)".  Just a hunch, though, and I need to
> > > confirm this.
> > >
> > >   Back to the deadlock, absence of synchronization on the "destroy"
> > > lifecycle call allows a second thread to enter into the
> > > ControlBeanContext which can cause the thread to deadlock with a
> > > second thread that's already in the ControlBeanContext.
> > >
> > >
> > >> I don't think that's an option here, since it voids the whole purpose of
> > >> DeferredSessionStorageHandler
> > >> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
> > >> ) -- am I understanding you here?
> > >>
> > >
> > > Yeah, that makes sense -- just wanted to rule this out as an option.  :)
> > >
> > >
> > >> Otherwise, I wonder if there's a way to track this without adding state
> > >> to the page flow instance.  Possibly in a request attribute?  Just a
> > >> thought...
> > >>
> > >
> > > Sure -- this would certainly work.  It could take the form of a set of
> > > attribute names that should be persisted / failed over, but that
> > > should ignore lifecycle methods.  Will look at this and get back to
> > > you.
> > >
> > > Thanks for the comments...
> > >
> > > Eddie
> > >
> > >
> > >
> > >> Rich
> > >>
> > >> Eddie O'Neil wrote:
> > >>
> > >>>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
> > >>> left field.  :)
> > >>>
> > >>>   Given this and putting the issue of the session expiration aside for
> > >>> a second, I think that root cause of the problem is that it is
> > >>> possible for the DeferredSessionStorageHandler to run the
> > >>> "valueUnbound" method code on a PageFlowController twice in these two
> > >>> circumstances:
> > >>>
> > >>>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
> > >>>   - DeferredSessionStorageHandler::applyChanges @ line 221
> > >>>
> > >>> The first runs in "real time" with the removal or replacement of a
> > >>> Page Flow in the session and runs inside of the lock, but the second
> > >>> is the deferred part that runs at the end of a request to persist
> > >>> changed attributes in the session.  The "applyChanges" then makes
> > >>> calls to set / remove the Page Flow attribute from the real
> > >>> HttpSession, and this triggers the "valueUnbound" event and ultimately
> > >>> an "onDestroy".  This latter step is unsynchronized and thus causes
> > >>> the deadlock.
> > >>>
> > >>>   If the Page Flow is stored in the session and is an
> > >>> HttpSessionBindingListener, any deferred calls are going to trigger
> > >>> this event.  In order to keep consecutive JPF invocations in the same
> > >>> order, the onDestroy of the previous JPF needs to run before onCreate
> > >>> of the next JPF.  The cheap way to fix this would be to track the
> > >>> state of "valueUnbound" as a private boolean inside of the
> > >>> PageFlowController.  Then, when the second valueUnbound call is made,
> > >>> the event will be a no-op on the JPF.
> > >>>
> > >>>   Another solution would be to tunnel down into the real HttpSession
> > >>> and make JPF changes "live" against the session rather than deferring
> > >>> them.
> > >>>
> > >>>   Have an opinion either way?
> > >>>
> > >>> Eddie
> > >>>
> > >>>
> > >>>
> > >>> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
> > >>>
> > >>>
> > >>>> Funny, I'd drafted a longish email earlier asking about the BeanContext
> > >>>> lock, because it sounded like the *fundamental* issue (ordering problems
> > >>>> in locking that and the control bean instance itself).  But it's not, or
> > >>>> at least it's unrelated.  In a way, that's good.  :)
> > >>>>
> > >>>> Regarding this:
> > >>>>
> > >>>>
> > >>>>> Thread A is execution an action on JPF X
> > >>>>> Thread B is destroying JPF X en route to running JPF Y
> > >>>>>
> > >>>>> From the code in PageFlowController.persistInSession (transitively
> > >>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> > >>>>> on the "current" JPF that would prevent this.  Right?
> > >>>>>
> > >>>>>
> > >>>> That's correct.  The only time onDestroy() should ever run without a
> > >>>> lock on the page flow instance is when the session is expiring.  That's
> > >>>> the situation where the session itself is already locked, so it's
> > >>>> dangerous to subsequently lock the page flow instance (while another
> > >>>> thread may be accessing the session through synchronized get() after
> > >>>> already locking the page flow instance).  The user could also simulate
> > >>>> this situation by explicitly removing the page flow session attribute,
> > >>>> but of course that's not something we should try to support.
> > >>>>
> > >>>> Let me know if you see/find any holes in that, though...
> > >>>>
> > >>>> Rich
> > >>>>
> > >>>> Eddie O'Neil wrote:
> > >>>>
> > >>>>
> > >>>>> Rich--
> > >>>>>
> > >>>>>   Having looked into this more now, I've got a little more detail
> > >>>>> about what's happening:
> > >>>>>
> > >>>>>   There are two issues intertwined here.  The first issue is related
> > >>>>> to the ControlBeanContext implementation class and its usage of a
> > >>>>> global, VM-wide lock.  At issue is the fact that all
> > >>>>> ControlContainerContext implementations inherit from a
> > >>>>> BeanContextServicesSupport base class.  This class (and another in
> > >>>>> that hierarchy) use a the VM-wide object
> > >>>>> BeanContext.globalHierarchyLock for synchronization in two situations:
> > >>>>>
> > >>>>>   - adding / removing a control to / from a context
> > >>>>>   - adding / removing / getting a "service" from a context
> > >>>>>
> > >>>>> This acts as a choke point for all Control references and is what
> > >>>>> shows up in the locking traces in a previous mail.  This is
> > >>>>> problematic, but it's also orthogonal (I believe) to the issue with
> > >>>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> > >>>>> JDK assumed that the classes in java.beans.beancontext.* would be used
> > >>>>> to run a single hierarchy of controls on a VM.  Beehive's usage
> > >>>>> pattern is a little different in that in enterprise applications, many
> > >>>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> > >>>>> such, a global lock doesn't fit well.
> > >>>>>
> > >>>>>   So, hopefully that provides some background on the
> > >>>>> BeanContext.globalHierarchyLock.
> > >>>>>
> > >>>>>   That being said, I think that the deadlock is orthogonal to this
> > >>>>> problem as the "destroy" lifecycle method on a JPF can be invoked
> > >>>>> without waiting for a lock on the JPF instance itself.  The result is
> > >>>>> that two threads can actually run through a JPF simultaneously.
> > >>>>>
> > >>>>>   With Daryl's changes in February, the ControlContainerContext object
> > >>>>> for a Page Flow is a class-level field, and locking is performed on
> > >>>>> this object at:
> > >>>>>
> > >>>>>   - create time
> > >>>>>   - action execution time
> > >>>>>   - page rendering time (JSP and Faces)
> > >>>>>
> > >>>>> Guess that the question is whether this situation is protected:
> > >>>>>
> > >>>>> Thread A is execution an action on JPF X
> > >>>>> Thread B is destroying JPF X en route to running JPF Y
> > >>>>>
> > >>>>> >From the code in PageFlowController.persistInSession (transitively
> > >>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> > >>>>> on the "current" JPF that would prevent this.  Right?
> > >>>>>
> > >>>>>   If that's true, then I've got a little more info to go on in
> > >>>>> investigating this problem.
> > >>>>>
> > >>>>>   Thanks for any info...  :)
> > >>>>>
> > >>>>> Eddie
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>>   Good questions -- let's see if I can explain the threads more
> > >>>>>> clearly in ASCII.  :)
> > >>>>>>
> > >>>>>>   What's happening is this (note, both threads operate on the *same*
> > >>>>>> instance of a JPF):
> > >>>>>>
> > >>>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> > >>>>>> acquire lock on FooPage Flow
> > >>>>>>                                                              acquire BC.gHL
> > >>>>>> acquire lock on BarControlBean
> > >>>>>>                                                              wait for
> > >>>>>> lock on BarControlBean
> > >>>>>> wait for lock on BC.gHL
> > >>>>>>
> > >>>>>> Because the Controls infrastructure is based on the BeanContext
> > >>>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
> > >>>>>> uses the BeanContextServicesSupport class as a base class for the
> > >>>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> > >>>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> > >>>>>> access to shared data structures like the list of "service" objects in
> > >>>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
> > >>>>>> (!).
> > >>>>>>
> > >>>>>> To answer your questions:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>> The base classes for the ControlBeanContext object which delegates up
> > >>>>>> to it's "super" in order to serialize changes and service requests to
> > >>>>>> the classes that implement event listener support, etc.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> What prevents deadlock in general between locking on
> > >>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>> Good question -- this seems like the crux of the issue.  In general,
> > >>>>>> this is protected by a Control container serializing access to the
> > >>>>>> container's ControlContainerContext object.  In this case, the problem
> > >>>>>> occurs because two threads are trying to setup and destroy the same
> > >>>>>> context object simultaneously.  If those setup and destroy calls were
> > >>>>>> serialized against the ControlContainerContext object stored in the
> > >>>>>> HttpSession, presumably, the semantics would be maintained.
> > >>>>>>
> > >>>>>> My guess is that this deadlock occurred with a double page submit or
> > >>>>>> quick refresh that caught the tail end of the previous thread and the
> > >>>>>> beginning of the next request such that the threads interleaved in
> > >>>>>> this way.
> > >>>>>>
> > >>>>>> So, hopefully that helps explain what's happening.
> > >>>>>>
> > >>>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
> > >>>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> > >>>>>> instance during onDestroy?  That would serialize the access to the
> > >>>>>> session scoped CCC object.
> > >>>>>>
> > >>>>>> Another way to go (and it's a lot of work) is to build a NetUI /
> > >>>>>> web-tier specific control container that doesn't leverage the *Support
> > >>>>>> classes in the JDK.  This would allow the container's implementation
> > >>>>>> to be tailored to the single-threaded nature of the web tier and would
> > >>>>>> remove a lot of the performance implications around locking and
> > >>>>>> synchronized data structures.  But, I've not thought through this yet,
> > >>>>>> either.  :)
> > >>>>>>
> > >>>>>> Eddie
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
> > >>>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
> > >>>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
> > >>>>>>> session itself when onDestroy is called.  I don't think it's mandated by
> > >>>>>>> the Servlet spec, but I don't think it's forbidden either.
> > >>>>>>> Unfortunately, this means that a deadlock can occur if code in another
> > >>>>>>> thread synchronizes on the page flow instance first (as in an action
> > >>>>>>> method invocation), then calls a method on HttpSession that synchronizes
> > >>>>>>> on the session object.  Again, I don't think this is mandated by the
> > >>>>>>> spec, but it happens.
> > >>>>>>>
> > >>>>>>> Nothing strikes me off the bat, but I actually don't understand the
> > >>>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
> > >>>>>>> to see it).  What grabs the lock on
> > >>>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> > >>>>>>> in general between locking on
> > >>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> > >>>>>>>
> > >>>>>>> Also, how is the same JPF being *created* by two separate threads?
> > >>>>>>>
> > >>>>>>> Rich
> > >>>>>>>
> > >>>>>>> Eddie O'Neil wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Rich/Daryl--
> > >>>>>>>>
> > >>>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
> > >>>>>>>> between Controls and Page Flow that happens in the following
> > >>>>>>>> circumstance:
> > >>>>>>>>
> > >>>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> > >>>>>>>> thread2: acquire lock on
> > >>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > >>>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> > >>>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> > >>>>>>>> thread1: wait for lock on
> > >>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > >>>>>>>>
> > >>>>>>>> So, the problem is that the same JPF is being both created and
> > >>>>>>>> destroyed by two different threads.  It appears that the "destroy"
> > >>>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
> > >>>>>>>> acquire Control locks without having serialized access to the Page
> > >>>>>>>> Flow itself.
> > >>>>>>>>
> > >>>>>>>>   My thought is to add a synchronization point in
> > >>>>>>>> JavaControlUtils.uninitJavaControls that locks on the
> > >>>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
> > >>>>>>>>  But, I seem to recall some threading issues with locking on a JPF
> > >>>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
> > >>>>>>>> those.
> > >>>>>>>>
> > >>>>>>>>   Any suggestions about how best to make this fix?
> > >>>>>>>>
> > >>>>>>>> Eddie
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>
> > >
> > >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

Rich--

  That's close.  :)  My current speculation is that the locking issues
related to Controls (BH.gHL, etc) are happening because of
unsynchronized execution of the Page Flow's "destroy" lifecycle method
from the second valueUnbound event.  Since this is unsynchronized, it
allows two threads (one running in destroy and another running an
action) to interact with the locks inside of the ControlBeanContext
hierarchy.

  Recapping the last example, this is a Page Flow "FooController" with
an @Control BarControlBean; the same instance is being used in both of
the threads below.

Thread 1 (run an action a JPF)    Thread 2 (destroying a JPF)
===============                  =============
1: lock FooController
2: lock BarControlBean
                                                1: lock BC.gHL
                                                2: wait for lock on
BarControlBean
3: wait BC.gHL lock

Thread 2 looks strange because it's destroynig the same JPF instance
as Thread 1.  I agree that this is odd yet seems possible with "quick
clicking".

  That seems like the most likely explanation so far.  Does that make
more sense?

Eddie






On 4/7/06, Rich Feit <ri...@gmail.com> wrote:
> OK, thanks for the info.  It sounds to me like the deadlock is caused by
> ordering issues when locking the control bean and BC.gHL, and that the
> page flow synchronization is simply masking this root problem.  Is that
> what's going on, or am I misunderstanding?
>
> Rich
>
> Eddie O'Neil wrote:
> >   Yeah -- that's right the deadlock is caused by lack of
> > synchronization during "destroy" on the second valueUnbound event.  I
> > think (but haven't confirmed yet) that the same thing could happen
> > with different implementations of the
> > ServletContainerAdapter.ensureFailover method if it calls
> > "session.setAttribute(...)".  Just a hunch, though, and I need to
> > confirm this.
> >
> >   Back to the deadlock, absence of synchronization on the "destroy"
> > lifecycle call allows a second thread to enter into the
> > ControlBeanContext which can cause the thread to deadlock with a
> > second thread that's already in the ControlBeanContext.
> >
> >
> >> I don't think that's an option here, since it voids the whole purpose of
> >> DeferredSessionStorageHandler
> >> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
> >> ) -- am I understanding you here?
> >>
> >
> > Yeah, that makes sense -- just wanted to rule this out as an option.  :)
> >
> >
> >> Otherwise, I wonder if there's a way to track this without adding state
> >> to the page flow instance.  Possibly in a request attribute?  Just a
> >> thought...
> >>
> >
> > Sure -- this would certainly work.  It could take the form of a set of
> > attribute names that should be persisted / failed over, but that
> > should ignore lifecycle methods.  Will look at this and get back to
> > you.
> >
> > Thanks for the comments...
> >
> > Eddie
> >
> >
> >
> >> Rich
> >>
> >> Eddie O'Neil wrote:
> >>
> >>>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
> >>> left field.  :)
> >>>
> >>>   Given this and putting the issue of the session expiration aside for
> >>> a second, I think that root cause of the problem is that it is
> >>> possible for the DeferredSessionStorageHandler to run the
> >>> "valueUnbound" method code on a PageFlowController twice in these two
> >>> circumstances:
> >>>
> >>>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
> >>>   - DeferredSessionStorageHandler::applyChanges @ line 221
> >>>
> >>> The first runs in "real time" with the removal or replacement of a
> >>> Page Flow in the session and runs inside of the lock, but the second
> >>> is the deferred part that runs at the end of a request to persist
> >>> changed attributes in the session.  The "applyChanges" then makes
> >>> calls to set / remove the Page Flow attribute from the real
> >>> HttpSession, and this triggers the "valueUnbound" event and ultimately
> >>> an "onDestroy".  This latter step is unsynchronized and thus causes
> >>> the deadlock.
> >>>
> >>>   If the Page Flow is stored in the session and is an
> >>> HttpSessionBindingListener, any deferred calls are going to trigger
> >>> this event.  In order to keep consecutive JPF invocations in the same
> >>> order, the onDestroy of the previous JPF needs to run before onCreate
> >>> of the next JPF.  The cheap way to fix this would be to track the
> >>> state of "valueUnbound" as a private boolean inside of the
> >>> PageFlowController.  Then, when the second valueUnbound call is made,
> >>> the event will be a no-op on the JPF.
> >>>
> >>>   Another solution would be to tunnel down into the real HttpSession
> >>> and make JPF changes "live" against the session rather than deferring
> >>> them.
> >>>
> >>>   Have an opinion either way?
> >>>
> >>> Eddie
> >>>
> >>>
> >>>
> >>> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
> >>>
> >>>
> >>>> Funny, I'd drafted a longish email earlier asking about the BeanContext
> >>>> lock, because it sounded like the *fundamental* issue (ordering problems
> >>>> in locking that and the control bean instance itself).  But it's not, or
> >>>> at least it's unrelated.  In a way, that's good.  :)
> >>>>
> >>>> Regarding this:
> >>>>
> >>>>
> >>>>> Thread A is execution an action on JPF X
> >>>>> Thread B is destroying JPF X en route to running JPF Y
> >>>>>
> >>>>> From the code in PageFlowController.persistInSession (transitively
> >>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> >>>>> on the "current" JPF that would prevent this.  Right?
> >>>>>
> >>>>>
> >>>> That's correct.  The only time onDestroy() should ever run without a
> >>>> lock on the page flow instance is when the session is expiring.  That's
> >>>> the situation where the session itself is already locked, so it's
> >>>> dangerous to subsequently lock the page flow instance (while another
> >>>> thread may be accessing the session through synchronized get() after
> >>>> already locking the page flow instance).  The user could also simulate
> >>>> this situation by explicitly removing the page flow session attribute,
> >>>> but of course that's not something we should try to support.
> >>>>
> >>>> Let me know if you see/find any holes in that, though...
> >>>>
> >>>> Rich
> >>>>
> >>>> Eddie O'Neil wrote:
> >>>>
> >>>>
> >>>>> Rich--
> >>>>>
> >>>>>   Having looked into this more now, I've got a little more detail
> >>>>> about what's happening:
> >>>>>
> >>>>>   There are two issues intertwined here.  The first issue is related
> >>>>> to the ControlBeanContext implementation class and its usage of a
> >>>>> global, VM-wide lock.  At issue is the fact that all
> >>>>> ControlContainerContext implementations inherit from a
> >>>>> BeanContextServicesSupport base class.  This class (and another in
> >>>>> that hierarchy) use a the VM-wide object
> >>>>> BeanContext.globalHierarchyLock for synchronization in two situations:
> >>>>>
> >>>>>   - adding / removing a control to / from a context
> >>>>>   - adding / removing / getting a "service" from a context
> >>>>>
> >>>>> This acts as a choke point for all Control references and is what
> >>>>> shows up in the locking traces in a previous mail.  This is
> >>>>> problematic, but it's also orthogonal (I believe) to the issue with
> >>>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> >>>>> JDK assumed that the classes in java.beans.beancontext.* would be used
> >>>>> to run a single hierarchy of controls on a VM.  Beehive's usage
> >>>>> pattern is a little different in that in enterprise applications, many
> >>>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> >>>>> such, a global lock doesn't fit well.
> >>>>>
> >>>>>   So, hopefully that provides some background on the
> >>>>> BeanContext.globalHierarchyLock.
> >>>>>
> >>>>>   That being said, I think that the deadlock is orthogonal to this
> >>>>> problem as the "destroy" lifecycle method on a JPF can be invoked
> >>>>> without waiting for a lock on the JPF instance itself.  The result is
> >>>>> that two threads can actually run through a JPF simultaneously.
> >>>>>
> >>>>>   With Daryl's changes in February, the ControlContainerContext object
> >>>>> for a Page Flow is a class-level field, and locking is performed on
> >>>>> this object at:
> >>>>>
> >>>>>   - create time
> >>>>>   - action execution time
> >>>>>   - page rendering time (JSP and Faces)
> >>>>>
> >>>>> Guess that the question is whether this situation is protected:
> >>>>>
> >>>>> Thread A is execution an action on JPF X
> >>>>> Thread B is destroying JPF X en route to running JPF Y
> >>>>>
> >>>>> >From the code in PageFlowController.persistInSession (transitively
> >>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
> >>>>> on the "current" JPF that would prevent this.  Right?
> >>>>>
> >>>>>   If that's true, then I've got a little more info to go on in
> >>>>> investigating this problem.
> >>>>>
> >>>>>   Thanks for any info...  :)
> >>>>>
> >>>>> Eddie
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>   Good questions -- let's see if I can explain the threads more
> >>>>>> clearly in ASCII.  :)
> >>>>>>
> >>>>>>   What's happening is this (note, both threads operate on the *same*
> >>>>>> instance of a JPF):
> >>>>>>
> >>>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> >>>>>> acquire lock on FooPage Flow
> >>>>>>                                                              acquire BC.gHL
> >>>>>> acquire lock on BarControlBean
> >>>>>>                                                              wait for
> >>>>>> lock on BarControlBean
> >>>>>> wait for lock on BC.gHL
> >>>>>>
> >>>>>> Because the Controls infrastructure is based on the BeanContext
> >>>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
> >>>>>> uses the BeanContextServicesSupport class as a base class for the
> >>>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> >>>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> >>>>>> access to shared data structures like the list of "service" objects in
> >>>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
> >>>>>> (!).
> >>>>>>
> >>>>>> To answer your questions:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> The base classes for the ControlBeanContext object which delegates up
> >>>>>> to it's "super" in order to serialize changes and service requests to
> >>>>>> the classes that implement event listener support, etc.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> What prevents deadlock in general between locking on
> >>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> Good question -- this seems like the crux of the issue.  In general,
> >>>>>> this is protected by a Control container serializing access to the
> >>>>>> container's ControlContainerContext object.  In this case, the problem
> >>>>>> occurs because two threads are trying to setup and destroy the same
> >>>>>> context object simultaneously.  If those setup and destroy calls were
> >>>>>> serialized against the ControlContainerContext object stored in the
> >>>>>> HttpSession, presumably, the semantics would be maintained.
> >>>>>>
> >>>>>> My guess is that this deadlock occurred with a double page submit or
> >>>>>> quick refresh that caught the tail end of the previous thread and the
> >>>>>> beginning of the next request such that the threads interleaved in
> >>>>>> this way.
> >>>>>>
> >>>>>> So, hopefully that helps explain what's happening.
> >>>>>>
> >>>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
> >>>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> >>>>>> instance during onDestroy?  That would serialize the access to the
> >>>>>> session scoped CCC object.
> >>>>>>
> >>>>>> Another way to go (and it's a lot of work) is to build a NetUI /
> >>>>>> web-tier specific control container that doesn't leverage the *Support
> >>>>>> classes in the JDK.  This would allow the container's implementation
> >>>>>> to be tailored to the single-threaded nature of the web tier and would
> >>>>>> remove a lot of the performance implications around locking and
> >>>>>> synchronized data structures.  But, I've not thought through this yet,
> >>>>>> either.  :)
> >>>>>>
> >>>>>> Eddie
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
> >>>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
> >>>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
> >>>>>>> session itself when onDestroy is called.  I don't think it's mandated by
> >>>>>>> the Servlet spec, but I don't think it's forbidden either.
> >>>>>>> Unfortunately, this means that a deadlock can occur if code in another
> >>>>>>> thread synchronizes on the page flow instance first (as in an action
> >>>>>>> method invocation), then calls a method on HttpSession that synchronizes
> >>>>>>> on the session object.  Again, I don't think this is mandated by the
> >>>>>>> spec, but it happens.
> >>>>>>>
> >>>>>>> Nothing strikes me off the bat, but I actually don't understand the
> >>>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
> >>>>>>> to see it).  What grabs the lock on
> >>>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> >>>>>>> in general between locking on
> >>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>>>>>
> >>>>>>> Also, how is the same JPF being *created* by two separate threads?
> >>>>>>>
> >>>>>>> Rich
> >>>>>>>
> >>>>>>> Eddie O'Neil wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Rich/Daryl--
> >>>>>>>>
> >>>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
> >>>>>>>> between Controls and Page Flow that happens in the following
> >>>>>>>> circumstance:
> >>>>>>>>
> >>>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> >>>>>>>> thread2: acquire lock on
> >>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> >>>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> >>>>>>>> thread1: wait for lock on
> >>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>>>>>
> >>>>>>>> So, the problem is that the same JPF is being both created and
> >>>>>>>> destroyed by two different threads.  It appears that the "destroy"
> >>>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
> >>>>>>>> acquire Control locks without having serialized access to the Page
> >>>>>>>> Flow itself.
> >>>>>>>>
> >>>>>>>>   My thought is to add a synchronization point in
> >>>>>>>> JavaControlUtils.uninitJavaControls that locks on the
> >>>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
> >>>>>>>>  But, I seem to recall some threading issues with locking on a JPF
> >>>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
> >>>>>>>> those.
> >>>>>>>>
> >>>>>>>>   Any suggestions about how best to make this fix?
> >>>>>>>>
> >>>>>>>> Eddie
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>
> >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Rich Feit <ri...@gmail.com>.

OK, thanks for the info.  It sounds to me like the deadlock is caused by 
ordering issues when locking the control bean and BC.gHL, and that the 
page flow synchronization is simply masking this root problem.  Is that 
what's going on, or am I misunderstanding?

Rich

Eddie O'Neil wrote:
>   Yeah -- that's right the deadlock is caused by lack of
> synchronization during "destroy" on the second valueUnbound event.  I
> think (but haven't confirmed yet) that the same thing could happen
> with different implementations of the
> ServletContainerAdapter.ensureFailover method if it calls
> "session.setAttribute(...)".  Just a hunch, though, and I need to
> confirm this.
>
>   Back to the deadlock, absence of synchronization on the "destroy"
> lifecycle call allows a second thread to enter into the
> ControlBeanContext which can cause the thread to deadlock with a
> second thread that's already in the ControlBeanContext.
>
>   
>> I don't think that's an option here, since it voids the whole purpose of
>> DeferredSessionStorageHandler
>> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
>> ) -- am I understanding you here?
>>     
>
> Yeah, that makes sense -- just wanted to rule this out as an option.  :)
>
>   
>> Otherwise, I wonder if there's a way to track this without adding state
>> to the page flow instance.  Possibly in a request attribute?  Just a
>> thought...
>>     
>
> Sure -- this would certainly work.  It could take the form of a set of
> attribute names that should be persisted / failed over, but that
> should ignore lifecycle methods.  Will look at this and get back to
> you.
>
> Thanks for the comments...
>
> Eddie
>
>
>   
>> Rich
>>
>> Eddie O'Neil wrote:
>>     
>>>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
>>> left field.  :)
>>>
>>>   Given this and putting the issue of the session expiration aside for
>>> a second, I think that root cause of the problem is that it is
>>> possible for the DeferredSessionStorageHandler to run the
>>> "valueUnbound" method code on a PageFlowController twice in these two
>>> circumstances:
>>>
>>>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
>>>   - DeferredSessionStorageHandler::applyChanges @ line 221
>>>
>>> The first runs in "real time" with the removal or replacement of a
>>> Page Flow in the session and runs inside of the lock, but the second
>>> is the deferred part that runs at the end of a request to persist
>>> changed attributes in the session.  The "applyChanges" then makes
>>> calls to set / remove the Page Flow attribute from the real
>>> HttpSession, and this triggers the "valueUnbound" event and ultimately
>>> an "onDestroy".  This latter step is unsynchronized and thus causes
>>> the deadlock.
>>>
>>>   If the Page Flow is stored in the session and is an
>>> HttpSessionBindingListener, any deferred calls are going to trigger
>>> this event.  In order to keep consecutive JPF invocations in the same
>>> order, the onDestroy of the previous JPF needs to run before onCreate
>>> of the next JPF.  The cheap way to fix this would be to track the
>>> state of "valueUnbound" as a private boolean inside of the
>>> PageFlowController.  Then, when the second valueUnbound call is made,
>>> the event will be a no-op on the JPF.
>>>
>>>   Another solution would be to tunnel down into the real HttpSession
>>> and make JPF changes "live" against the session rather than deferring
>>> them.
>>>
>>>   Have an opinion either way?
>>>
>>> Eddie
>>>
>>>
>>>
>>> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
>>>
>>>       
>>>> Funny, I'd drafted a longish email earlier asking about the BeanContext
>>>> lock, because it sounded like the *fundamental* issue (ordering problems
>>>> in locking that and the control bean instance itself).  But it's not, or
>>>> at least it's unrelated.  In a way, that's good.  :)
>>>>
>>>> Regarding this:
>>>>
>>>>         
>>>>> Thread A is execution an action on JPF X
>>>>> Thread B is destroying JPF X en route to running JPF Y
>>>>>
>>>>> From the code in PageFlowController.persistInSession (transitively
>>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
>>>>> on the "current" JPF that would prevent this.  Right?
>>>>>
>>>>>           
>>>> That's correct.  The only time onDestroy() should ever run without a
>>>> lock on the page flow instance is when the session is expiring.  That's
>>>> the situation where the session itself is already locked, so it's
>>>> dangerous to subsequently lock the page flow instance (while another
>>>> thread may be accessing the session through synchronized get() after
>>>> already locking the page flow instance).  The user could also simulate
>>>> this situation by explicitly removing the page flow session attribute,
>>>> but of course that's not something we should try to support.
>>>>
>>>> Let me know if you see/find any holes in that, though...
>>>>
>>>> Rich
>>>>
>>>> Eddie O'Neil wrote:
>>>>
>>>>         
>>>>> Rich--
>>>>>
>>>>>   Having looked into this more now, I've got a little more detail
>>>>> about what's happening:
>>>>>
>>>>>   There are two issues intertwined here.  The first issue is related
>>>>> to the ControlBeanContext implementation class and its usage of a
>>>>> global, VM-wide lock.  At issue is the fact that all
>>>>> ControlContainerContext implementations inherit from a
>>>>> BeanContextServicesSupport base class.  This class (and another in
>>>>> that hierarchy) use a the VM-wide object
>>>>> BeanContext.globalHierarchyLock for synchronization in two situations:
>>>>>
>>>>>   - adding / removing a control to / from a context
>>>>>   - adding / removing / getting a "service" from a context
>>>>>
>>>>> This acts as a choke point for all Control references and is what
>>>>> shows up in the locking traces in a previous mail.  This is
>>>>> problematic, but it's also orthogonal (I believe) to the issue with
>>>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
>>>>> JDK assumed that the classes in java.beans.beancontext.* would be used
>>>>> to run a single hierarchy of controls on a VM.  Beehive's usage
>>>>> pattern is a little different in that in enterprise applications, many
>>>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
>>>>> such, a global lock doesn't fit well.
>>>>>
>>>>>   So, hopefully that provides some background on the
>>>>> BeanContext.globalHierarchyLock.
>>>>>
>>>>>   That being said, I think that the deadlock is orthogonal to this
>>>>> problem as the "destroy" lifecycle method on a JPF can be invoked
>>>>> without waiting for a lock on the JPF instance itself.  The result is
>>>>> that two threads can actually run through a JPF simultaneously.
>>>>>
>>>>>   With Daryl's changes in February, the ControlContainerContext object
>>>>> for a Page Flow is a class-level field, and locking is performed on
>>>>> this object at:
>>>>>
>>>>>   - create time
>>>>>   - action execution time
>>>>>   - page rendering time (JSP and Faces)
>>>>>
>>>>> Guess that the question is whether this situation is protected:
>>>>>
>>>>> Thread A is execution an action on JPF X
>>>>> Thread B is destroying JPF X en route to running JPF Y
>>>>>
>>>>> >From the code in PageFlowController.persistInSession (transitively
>>>>> down to StorageHandler.removeAttribute), it seems like a lock is held
>>>>> on the "current" JPF that would prevent this.  Right?
>>>>>
>>>>>   If that's true, then I've got a little more info to go on in
>>>>> investigating this problem.
>>>>>
>>>>>   Thanks for any info...  :)
>>>>>
>>>>> Eddie
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>>   Good questions -- let's see if I can explain the threads more
>>>>>> clearly in ASCII.  :)
>>>>>>
>>>>>>   What's happening is this (note, both threads operate on the *same*
>>>>>> instance of a JPF):
>>>>>>
>>>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
>>>>>> acquire lock on FooPage Flow
>>>>>>                                                              acquire BC.gHL
>>>>>> acquire lock on BarControlBean
>>>>>>                                                              wait for
>>>>>> lock on BarControlBean
>>>>>> wait for lock on BC.gHL
>>>>>>
>>>>>> Because the Controls infrastructure is based on the BeanContext
>>>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
>>>>>> uses the BeanContextServicesSupport class as a base class for the
>>>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
>>>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
>>>>>> access to shared data structures like the list of "service" objects in
>>>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
>>>>>> (!).
>>>>>>
>>>>>> To answer your questions:
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> The base classes for the ControlBeanContext object which delegates up
>>>>>> to it's "super" in order to serialize changes and service requests to
>>>>>> the classes that implement event listener support, etc.
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> What prevents deadlock in general between locking on
>>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Good question -- this seems like the crux of the issue.  In general,
>>>>>> this is protected by a Control container serializing access to the
>>>>>> container's ControlContainerContext object.  In this case, the problem
>>>>>> occurs because two threads are trying to setup and destroy the same
>>>>>> context object simultaneously.  If those setup and destroy calls were
>>>>>> serialized against the ControlContainerContext object stored in the
>>>>>> HttpSession, presumably, the semantics would be maintained.
>>>>>>
>>>>>> My guess is that this deadlock occurred with a double page submit or
>>>>>> quick refresh that caught the tail end of the previous thread and the
>>>>>> beginning of the next request such that the threads interleaved in
>>>>>> this way.
>>>>>>
>>>>>> So, hopefully that helps explain what's happening.
>>>>>>
>>>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
>>>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
>>>>>> instance during onDestroy?  That would serialize the access to the
>>>>>> session scoped CCC object.
>>>>>>
>>>>>> Another way to go (and it's a lot of work) is to build a NetUI /
>>>>>> web-tier specific control container that doesn't leverage the *Support
>>>>>> classes in the JDK.  This would allow the container's implementation
>>>>>> to be tailored to the single-threaded nature of the web tier and would
>>>>>> remove a lot of the performance implications around locking and
>>>>>> synchronized data structures.  But, I've not thought through this yet,
>>>>>> either.  :)
>>>>>>
>>>>>> Eddie
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
>>>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
>>>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
>>>>>>> session itself when onDestroy is called.  I don't think it's mandated by
>>>>>>> the Servlet spec, but I don't think it's forbidden either.
>>>>>>> Unfortunately, this means that a deadlock can occur if code in another
>>>>>>> thread synchronizes on the page flow instance first (as in an action
>>>>>>> method invocation), then calls a method on HttpSession that synchronizes
>>>>>>> on the session object.  Again, I don't think this is mandated by the
>>>>>>> spec, but it happens.
>>>>>>>
>>>>>>> Nothing strikes me off the bat, but I actually don't understand the
>>>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
>>>>>>> to see it).  What grabs the lock on
>>>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
>>>>>>> in general between locking on
>>>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>>>>>
>>>>>>> Also, how is the same JPF being *created* by two separate threads?
>>>>>>>
>>>>>>> Rich
>>>>>>>
>>>>>>> Eddie O'Neil wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Rich/Daryl--
>>>>>>>>
>>>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
>>>>>>>> between Controls and Page Flow that happens in the following
>>>>>>>> circumstance:
>>>>>>>>
>>>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
>>>>>>>> thread2: acquire lock on
>>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
>>>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
>>>>>>>> thread1: wait for lock on
>>>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>>>>>
>>>>>>>> So, the problem is that the same JPF is being both created and
>>>>>>>> destroyed by two different threads.  It appears that the "destroy"
>>>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
>>>>>>>> acquire Control locks without having serialized access to the Page
>>>>>>>> Flow itself.
>>>>>>>>
>>>>>>>>   My thought is to add a synchronization point in
>>>>>>>> JavaControlUtils.uninitJavaControls that locks on the
>>>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
>>>>>>>>  But, I seem to recall some threading issues with locking on a JPF
>>>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
>>>>>>>> those.
>>>>>>>>
>>>>>>>>   Any suggestions about how best to make this fix?
>>>>>>>>
>>>>>>>> Eddie
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>       
>
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

  Yeah -- that's right the deadlock is caused by lack of
synchronization during "destroy" on the second valueUnbound event.  I
think (but haven't confirmed yet) that the same thing could happen
with different implementations of the
ServletContainerAdapter.ensureFailover method if it calls
"session.setAttribute(...)".  Just a hunch, though, and I need to
confirm this.

  Back to the deadlock, absence of synchronization on the "destroy"
lifecycle call allows a second thread to enter into the
ControlBeanContext which can cause the thread to deadlock with a
second thread that's already in the ControlBeanContext.

> I don't think that's an option here, since it voids the whole purpose of
> DeferredSessionStorageHandler
> (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E
> ) -- am I understanding you here?

Yeah, that makes sense -- just wanted to rule this out as an option.  :)

>
> Otherwise, I wonder if there's a way to track this without adding state
> to the page flow instance.  Possibly in a request attribute?  Just a
> thought...

Sure -- this would certainly work.  It could take the form of a set of
attribute names that should be persisted / failed over, but that
should ignore lifecycle methods.  Will look at this and get back to
you.

Thanks for the comments...

Eddie


>
> Rich
>
> Eddie O'Neil wrote:
> >   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
> > left field.  :)
> >
> >   Given this and putting the issue of the session expiration aside for
> > a second, I think that root cause of the problem is that it is
> > possible for the DeferredSessionStorageHandler to run the
> > "valueUnbound" method code on a PageFlowController twice in these two
> > circumstances:
> >
> >   - DeferredSessionStorageHandler::setAttribute / remove Attribute
> >   - DeferredSessionStorageHandler::applyChanges @ line 221
> >
> > The first runs in "real time" with the removal or replacement of a
> > Page Flow in the session and runs inside of the lock, but the second
> > is the deferred part that runs at the end of a request to persist
> > changed attributes in the session.  The "applyChanges" then makes
> > calls to set / remove the Page Flow attribute from the real
> > HttpSession, and this triggers the "valueUnbound" event and ultimately
> > an "onDestroy".  This latter step is unsynchronized and thus causes
> > the deadlock.
> >
> >   If the Page Flow is stored in the session and is an
> > HttpSessionBindingListener, any deferred calls are going to trigger
> > this event.  In order to keep consecutive JPF invocations in the same
> > order, the onDestroy of the previous JPF needs to run before onCreate
> > of the next JPF.  The cheap way to fix this would be to track the
> > state of "valueUnbound" as a private boolean inside of the
> > PageFlowController.  Then, when the second valueUnbound call is made,
> > the event will be a no-op on the JPF.
> >
> >   Another solution would be to tunnel down into the real HttpSession
> > and make JPF changes "live" against the session rather than deferring
> > them.
> >
> >   Have an opinion either way?
> >
> > Eddie
> >
> >
> >
> > On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
> >
> >> Funny, I'd drafted a longish email earlier asking about the BeanContext
> >> lock, because it sounded like the *fundamental* issue (ordering problems
> >> in locking that and the control bean instance itself).  But it's not, or
> >> at least it's unrelated.  In a way, that's good.  :)
> >>
> >> Regarding this:
> >>
> >>> Thread A is execution an action on JPF X
> >>> Thread B is destroying JPF X en route to running JPF Y
> >>>
> >>> From the code in PageFlowController.persistInSession (transitively
> >>> down to StorageHandler.removeAttribute), it seems like a lock is held
> >>> on the "current" JPF that would prevent this.  Right?
> >>>
> >> That's correct.  The only time onDestroy() should ever run without a
> >> lock on the page flow instance is when the session is expiring.  That's
> >> the situation where the session itself is already locked, so it's
> >> dangerous to subsequently lock the page flow instance (while another
> >> thread may be accessing the session through synchronized get() after
> >> already locking the page flow instance).  The user could also simulate
> >> this situation by explicitly removing the page flow session attribute,
> >> but of course that's not something we should try to support.
> >>
> >> Let me know if you see/find any holes in that, though...
> >>
> >> Rich
> >>
> >> Eddie O'Neil wrote:
> >>
> >>> Rich--
> >>>
> >>>   Having looked into this more now, I've got a little more detail
> >>> about what's happening:
> >>>
> >>>   There are two issues intertwined here.  The first issue is related
> >>> to the ControlBeanContext implementation class and its usage of a
> >>> global, VM-wide lock.  At issue is the fact that all
> >>> ControlContainerContext implementations inherit from a
> >>> BeanContextServicesSupport base class.  This class (and another in
> >>> that hierarchy) use a the VM-wide object
> >>> BeanContext.globalHierarchyLock for synchronization in two situations:
> >>>
> >>>   - adding / removing a control to / from a context
> >>>   - adding / removing / getting a "service" from a context
> >>>
> >>> This acts as a choke point for all Control references and is what
> >>> shows up in the locking traces in a previous mail.  This is
> >>> problematic, but it's also orthogonal (I believe) to the issue with
> >>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> >>> JDK assumed that the classes in java.beans.beancontext.* would be used
> >>> to run a single hierarchy of controls on a VM.  Beehive's usage
> >>> pattern is a little different in that in enterprise applications, many
> >>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> >>> such, a global lock doesn't fit well.
> >>>
> >>>   So, hopefully that provides some background on the
> >>> BeanContext.globalHierarchyLock.
> >>>
> >>>   That being said, I think that the deadlock is orthogonal to this
> >>> problem as the "destroy" lifecycle method on a JPF can be invoked
> >>> without waiting for a lock on the JPF instance itself.  The result is
> >>> that two threads can actually run through a JPF simultaneously.
> >>>
> >>>   With Daryl's changes in February, the ControlContainerContext object
> >>> for a Page Flow is a class-level field, and locking is performed on
> >>> this object at:
> >>>
> >>>   - create time
> >>>   - action execution time
> >>>   - page rendering time (JSP and Faces)
> >>>
> >>> Guess that the question is whether this situation is protected:
> >>>
> >>> Thread A is execution an action on JPF X
> >>> Thread B is destroying JPF X en route to running JPF Y
> >>>
> >>> >From the code in PageFlowController.persistInSession (transitively
> >>> down to StorageHandler.removeAttribute), it seems like a lock is held
> >>> on the "current" JPF that would prevent this.  Right?
> >>>
> >>>   If that's true, then I've got a little more info to go on in
> >>> investigating this problem.
> >>>
> >>>   Thanks for any info...  :)
> >>>
> >>> Eddie
> >>>
> >>>
> >>>
> >>>
> >>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
> >>>
> >>>
> >>>>   Good questions -- let's see if I can explain the threads more
> >>>> clearly in ASCII.  :)
> >>>>
> >>>>   What's happening is this (note, both threads operate on the *same*
> >>>> instance of a JPF):
> >>>>
> >>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> >>>> acquire lock on FooPage Flow
> >>>>                                                              acquire BC.gHL
> >>>> acquire lock on BarControlBean
> >>>>                                                              wait for
> >>>> lock on BarControlBean
> >>>> wait for lock on BC.gHL
> >>>>
> >>>> Because the Controls infrastructure is based on the BeanContext
> >>>> (Services|Support|etc) code in the java.beans package of the JDK, it
> >>>> uses the BeanContextServicesSupport class as a base class for the
> >>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> >>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> >>>> access to shared data structures like the list of "service" objects in
> >>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
> >>>> (!).
> >>>>
> >>>> To answer your questions:
> >>>>
> >>>>
> >>>>
> >>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> >>>>>
> >>>>>
> >>>> The base classes for the ControlBeanContext object which delegates up
> >>>> to it's "super" in order to serialize changes and service requests to
> >>>> the classes that implement event listener support, etc.
> >>>>
> >>>>
> >>>>
> >>>>> What prevents deadlock in general between locking on
> >>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>>>
> >>>>>
> >>>> Good question -- this seems like the crux of the issue.  In general,
> >>>> this is protected by a Control container serializing access to the
> >>>> container's ControlContainerContext object.  In this case, the problem
> >>>> occurs because two threads are trying to setup and destroy the same
> >>>> context object simultaneously.  If those setup and destroy calls were
> >>>> serialized against the ControlContainerContext object stored in the
> >>>> HttpSession, presumably, the semantics would be maintained.
> >>>>
> >>>> My guess is that this deadlock occurred with a double page submit or
> >>>> quick refresh that caught the tail end of the previous thread and the
> >>>> beginning of the next request such that the threads interleaved in
> >>>> this way.
> >>>>
> >>>> So, hopefully that helps explain what's happening.
> >>>>
> >>>> Ultimately, I'm not sure how to fix this yet and would appreciate
> >>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> >>>> instance during onDestroy?  That would serialize the access to the
> >>>> session scoped CCC object.
> >>>>
> >>>> Another way to go (and it's a lot of work) is to build a NetUI /
> >>>> web-tier specific control container that doesn't leverage the *Support
> >>>> classes in the JDK.  This would allow the container's implementation
> >>>> to be tailored to the single-threaded nature of the web tier and would
> >>>> remove a lot of the performance implications around locking and
> >>>> synchronized data structures.  But, I've not thought through this yet,
> >>>> either.  :)
> >>>>
> >>>> Eddie
> >>>>
> >>>>
> >>>>
> >>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> >>>>
> >>>>
> >>>>> Looks like a tough one.  First of all, the problem with synchronizing in
> >>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
> >>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
> >>>>> session itself when onDestroy is called.  I don't think it's mandated by
> >>>>> the Servlet spec, but I don't think it's forbidden either.
> >>>>> Unfortunately, this means that a deadlock can occur if code in another
> >>>>> thread synchronizes on the page flow instance first (as in an action
> >>>>> method invocation), then calls a method on HttpSession that synchronizes
> >>>>> on the session object.  Again, I don't think this is mandated by the
> >>>>> spec, but it happens.
> >>>>>
> >>>>> Nothing strikes me off the bat, but I actually don't understand the
> >>>>> deadlock sequence below (I don't disagree -- just don't have enough info
> >>>>> to see it).  What grabs the lock on
> >>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> >>>>> in general between locking on
> >>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>>>
> >>>>> Also, how is the same JPF being *created* by two separate threads?
> >>>>>
> >>>>> Rich
> >>>>>
> >>>>> Eddie O'Neil wrote:
> >>>>>
> >>>>>
> >>>>>> Rich/Daryl--
> >>>>>>
> >>>>>>   Hey, I've run into a thread deadlock problem in the interaction
> >>>>>> between Controls and Page Flow that happens in the following
> >>>>>> circumstance:
> >>>>>>
> >>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> >>>>>> thread2: acquire lock on
> >>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> >>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> >>>>>> thread1: wait for lock on
> >>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>>>
> >>>>>> So, the problem is that the same JPF is being both created and
> >>>>>> destroyed by two different threads.  It appears that the "destroy"
> >>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
> >>>>>> acquire Control locks without having serialized access to the Page
> >>>>>> Flow itself.
> >>>>>>
> >>>>>>   My thought is to add a synchronization point in
> >>>>>> JavaControlUtils.uninitJavaControls that locks on the
> >>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
> >>>>>>  But, I seem to recall some threading issues with locking on a JPF
> >>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
> >>>>>> those.
> >>>>>>
> >>>>>>   Any suggestions about how best to make this fix?
> >>>>>>
> >>>>>> Eddie
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>
> >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Rich Feit <ri...@gmail.com>.

Saw the double-valueUnbound bug (nice bug!).  But what's the actual 
deadlock that's happening in this situation?  I mean, what deadlock is 
being caused by *lack* of synchronization?  Just trying to understand. 

About this:
>   Another solution would be to tunnel down into the real HttpSession
> and make JPF changes "live" against the session rather than deferring
> them.
I don't think that's an option here, since it voids the whole purpose of 
DeferredSessionStorageHandler 
(http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/%3C43D7AC11.8060207@gmail.com%3E 
) -- am I understanding you here?

Otherwise, I wonder if there's a way to track this without adding state 
to the page flow instance.  Possibly in a request attribute?  Just a 
thought...

Rich

Eddie O'Neil wrote:
>   Gotcha -- thanks for the info; glad to know I wasn't entirely out in
> left field.  :)
>
>   Given this and putting the issue of the session expiration aside for
> a second, I think that root cause of the problem is that it is
> possible for the DeferredSessionStorageHandler to run the
> "valueUnbound" method code on a PageFlowController twice in these two
> circumstances:
>
>   - DeferredSessionStorageHandler::setAttribute / remove Attribute
>   - DeferredSessionStorageHandler::applyChanges @ line 221
>
> The first runs in "real time" with the removal or replacement of a
> Page Flow in the session and runs inside of the lock, but the second
> is the deferred part that runs at the end of a request to persist
> changed attributes in the session.  The "applyChanges" then makes
> calls to set / remove the Page Flow attribute from the real
> HttpSession, and this triggers the "valueUnbound" event and ultimately
> an "onDestroy".  This latter step is unsynchronized and thus causes
> the deadlock.
>
>   If the Page Flow is stored in the session and is an
> HttpSessionBindingListener, any deferred calls are going to trigger
> this event.  In order to keep consecutive JPF invocations in the same
> order, the onDestroy of the previous JPF needs to run before onCreate
> of the next JPF.  The cheap way to fix this would be to track the
> state of "valueUnbound" as a private boolean inside of the
> PageFlowController.  Then, when the second valueUnbound call is made,
> the event will be a no-op on the JPF.
>
>   Another solution would be to tunnel down into the real HttpSession
> and make JPF changes "live" against the session rather than deferring
> them.
>
>   Have an opinion either way?
>
> Eddie
>
>
>
> On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
>   
>> Funny, I'd drafted a longish email earlier asking about the BeanContext
>> lock, because it sounded like the *fundamental* issue (ordering problems
>> in locking that and the control bean instance itself).  But it's not, or
>> at least it's unrelated.  In a way, that's good.  :)
>>
>> Regarding this:
>>     
>>> Thread A is execution an action on JPF X
>>> Thread B is destroying JPF X en route to running JPF Y
>>>
>>> From the code in PageFlowController.persistInSession (transitively
>>> down to StorageHandler.removeAttribute), it seems like a lock is held
>>> on the "current" JPF that would prevent this.  Right?
>>>       
>> That's correct.  The only time onDestroy() should ever run without a
>> lock on the page flow instance is when the session is expiring.  That's
>> the situation where the session itself is already locked, so it's
>> dangerous to subsequently lock the page flow instance (while another
>> thread may be accessing the session through synchronized get() after
>> already locking the page flow instance).  The user could also simulate
>> this situation by explicitly removing the page flow session attribute,
>> but of course that's not something we should try to support.
>>
>> Let me know if you see/find any holes in that, though...
>>
>> Rich
>>
>> Eddie O'Neil wrote:
>>     
>>> Rich--
>>>
>>>   Having looked into this more now, I've got a little more detail
>>> about what's happening:
>>>
>>>   There are two issues intertwined here.  The first issue is related
>>> to the ControlBeanContext implementation class and its usage of a
>>> global, VM-wide lock.  At issue is the fact that all
>>> ControlContainerContext implementations inherit from a
>>> BeanContextServicesSupport base class.  This class (and another in
>>> that hierarchy) use a the VM-wide object
>>> BeanContext.globalHierarchyLock for synchronization in two situations:
>>>
>>>   - adding / removing a control to / from a context
>>>   - adding / removing / getting a "service" from a context
>>>
>>> This acts as a choke point for all Control references and is what
>>> shows up in the locking traces in a previous mail.  This is
>>> problematic, but it's also orthogonal (I believe) to the issue with
>>> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
>>> JDK assumed that the classes in java.beans.beancontext.* would be used
>>> to run a single hierarchy of controls on a VM.  Beehive's usage
>>> pattern is a little different in that in enterprise applications, many
>>> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
>>> such, a global lock doesn't fit well.
>>>
>>>   So, hopefully that provides some background on the
>>> BeanContext.globalHierarchyLock.
>>>
>>>   That being said, I think that the deadlock is orthogonal to this
>>> problem as the "destroy" lifecycle method on a JPF can be invoked
>>> without waiting for a lock on the JPF instance itself.  The result is
>>> that two threads can actually run through a JPF simultaneously.
>>>
>>>   With Daryl's changes in February, the ControlContainerContext object
>>> for a Page Flow is a class-level field, and locking is performed on
>>> this object at:
>>>
>>>   - create time
>>>   - action execution time
>>>   - page rendering time (JSP and Faces)
>>>
>>> Guess that the question is whether this situation is protected:
>>>
>>> Thread A is execution an action on JPF X
>>> Thread B is destroying JPF X en route to running JPF Y
>>>
>>> >From the code in PageFlowController.persistInSession (transitively
>>> down to StorageHandler.removeAttribute), it seems like a lock is held
>>> on the "current" JPF that would prevent this.  Right?
>>>
>>>   If that's true, then I've got a little more info to go on in
>>> investigating this problem.
>>>
>>>   Thanks for any info...  :)
>>>
>>> Eddie
>>>
>>>
>>>
>>>
>>> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
>>>
>>>       
>>>>   Good questions -- let's see if I can explain the threads more
>>>> clearly in ASCII.  :)
>>>>
>>>>   What's happening is this (note, both threads operate on the *same*
>>>> instance of a JPF):
>>>>
>>>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
>>>> acquire lock on FooPage Flow
>>>>                                                              acquire BC.gHL
>>>> acquire lock on BarControlBean
>>>>                                                              wait for
>>>> lock on BarControlBean
>>>> wait for lock on BC.gHL
>>>>
>>>> Because the Controls infrastructure is based on the BeanContext
>>>> (Services|Support|etc) code in the java.beans package of the JDK, it
>>>> uses the BeanContextServicesSupport class as a base class for the
>>>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
>>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
>>>> access to shared data structures like the list of "service" objects in
>>>> a BeanContext, etc.  This is a lock that is static throughout the JDK
>>>> (!).
>>>>
>>>> To answer your questions:
>>>>
>>>>
>>>>         
>>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
>>>>>
>>>>>           
>>>> The base classes for the ControlBeanContext object which delegates up
>>>> to it's "super" in order to serialize changes and service requests to
>>>> the classes that implement event listener support, etc.
>>>>
>>>>
>>>>         
>>>>> What prevents deadlock in general between locking on
>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>>>
>>>>>           
>>>> Good question -- this seems like the crux of the issue.  In general,
>>>> this is protected by a Control container serializing access to the
>>>> container's ControlContainerContext object.  In this case, the problem
>>>> occurs because two threads are trying to setup and destroy the same
>>>> context object simultaneously.  If those setup and destroy calls were
>>>> serialized against the ControlContainerContext object stored in the
>>>> HttpSession, presumably, the semantics would be maintained.
>>>>
>>>> My guess is that this deadlock occurred with a double page submit or
>>>> quick refresh that caught the tail end of the previous thread and the
>>>> beginning of the next request such that the threads interleaved in
>>>> this way.
>>>>
>>>> So, hopefully that helps explain what's happening.
>>>>
>>>> Ultimately, I'm not sure how to fix this yet and would appreciate
>>>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
>>>> instance during onDestroy?  That would serialize the access to the
>>>> session scoped CCC object.
>>>>
>>>> Another way to go (and it's a lot of work) is to build a NetUI /
>>>> web-tier specific control container that doesn't leverage the *Support
>>>> classes in the JDK.  This would allow the container's implementation
>>>> to be tailored to the single-threaded nature of the web tier and would
>>>> remove a lot of the performance implications around locking and
>>>> synchronized data structures.  But, I've not thought through this yet,
>>>> either.  :)
>>>>
>>>> Eddie
>>>>
>>>>
>>>>
>>>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
>>>>
>>>>         
>>>>> Looks like a tough one.  First of all, the problem with synchronizing in
>>>>> onDestroy is also a deadlock issue: since onDestroy is called from an
>>>>> HttpSessionBindingListener, the Servlet container may have a lock on the
>>>>> session itself when onDestroy is called.  I don't think it's mandated by
>>>>> the Servlet spec, but I don't think it's forbidden either.
>>>>> Unfortunately, this means that a deadlock can occur if code in another
>>>>> thread synchronizes on the page flow instance first (as in an action
>>>>> method invocation), then calls a method on HttpSession that synchronizes
>>>>> on the session object.  Again, I don't think this is mandated by the
>>>>> spec, but it happens.
>>>>>
>>>>> Nothing strikes me off the bat, but I actually don't understand the
>>>>> deadlock sequence below (I don't disagree -- just don't have enough info
>>>>> to see it).  What grabs the lock on
>>>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
>>>>> in general between locking on
>>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>>>
>>>>> Also, how is the same JPF being *created* by two separate threads?
>>>>>
>>>>> Rich
>>>>>
>>>>> Eddie O'Neil wrote:
>>>>>
>>>>>           
>>>>>> Rich/Daryl--
>>>>>>
>>>>>>   Hey, I've run into a thread deadlock problem in the interaction
>>>>>> between Controls and Page Flow that happens in the following
>>>>>> circumstance:
>>>>>>
>>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
>>>>>> thread2: acquire lock on
>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
>>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
>>>>>> thread1: wait for lock on
>>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>>>
>>>>>> So, the problem is that the same JPF is being both created and
>>>>>> destroyed by two different threads.  It appears that the "destroy"
>>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
>>>>>> acquire Control locks without having serialized access to the Page
>>>>>> Flow itself.
>>>>>>
>>>>>>   My thought is to add a synchronization point in
>>>>>> JavaControlUtils.uninitJavaControls that locks on the
>>>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
>>>>>>  But, I seem to recall some threading issues with locking on a JPF
>>>>>> during the "destroy" part of the lifecycle and don't want to resurrect
>>>>>> those.
>>>>>>
>>>>>>   Any suggestions about how best to make this fix?
>>>>>>
>>>>>> Eddie
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>       
>
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

  Gotcha -- thanks for the info; glad to know I wasn't entirely out in
left field.  :)

  Given this and putting the issue of the session expiration aside for
a second, I think that root cause of the problem is that it is
possible for the DeferredSessionStorageHandler to run the
"valueUnbound" method code on a PageFlowController twice in these two
circumstances:

  - DeferredSessionStorageHandler::setAttribute / remove Attribute
  - DeferredSessionStorageHandler::applyChanges @ line 221

The first runs in "real time" with the removal or replacement of a
Page Flow in the session and runs inside of the lock, but the second
is the deferred part that runs at the end of a request to persist
changed attributes in the session.  The "applyChanges" then makes
calls to set / remove the Page Flow attribute from the real
HttpSession, and this triggers the "valueUnbound" event and ultimately
an "onDestroy".  This latter step is unsynchronized and thus causes
the deadlock.

  If the Page Flow is stored in the session and is an
HttpSessionBindingListener, any deferred calls are going to trigger
this event.  In order to keep consecutive JPF invocations in the same
order, the onDestroy of the previous JPF needs to run before onCreate
of the next JPF.  The cheap way to fix this would be to track the
state of "valueUnbound" as a private boolean inside of the
PageFlowController.  Then, when the second valueUnbound call is made,
the event will be a no-op on the JPF.

  Another solution would be to tunnel down into the real HttpSession
and make JPF changes "live" against the session rather than deferring
them.

  Have an opinion either way?

Eddie



On 4/6/06, Rich Feit <ri...@gmail.com> wrote:
> Funny, I'd drafted a longish email earlier asking about the BeanContext
> lock, because it sounded like the *fundamental* issue (ordering problems
> in locking that and the control bean instance itself).  But it's not, or
> at least it's unrelated.  In a way, that's good.  :)
>
> Regarding this:
> > Thread A is execution an action on JPF X
> > Thread B is destroying JPF X en route to running JPF Y
> >
> > From the code in PageFlowController.persistInSession (transitively
> > down to StorageHandler.removeAttribute), it seems like a lock is held
> > on the "current" JPF that would prevent this.  Right?
> That's correct.  The only time onDestroy() should ever run without a
> lock on the page flow instance is when the session is expiring.  That's
> the situation where the session itself is already locked, so it's
> dangerous to subsequently lock the page flow instance (while another
> thread may be accessing the session through synchronized get() after
> already locking the page flow instance).  The user could also simulate
> this situation by explicitly removing the page flow session attribute,
> but of course that's not something we should try to support.
>
> Let me know if you see/find any holes in that, though...
>
> Rich
>
> Eddie O'Neil wrote:
> > Rich--
> >
> >   Having looked into this more now, I've got a little more detail
> > about what's happening:
> >
> >   There are two issues intertwined here.  The first issue is related
> > to the ControlBeanContext implementation class and its usage of a
> > global, VM-wide lock.  At issue is the fact that all
> > ControlContainerContext implementations inherit from a
> > BeanContextServicesSupport base class.  This class (and another in
> > that hierarchy) use a the VM-wide object
> > BeanContext.globalHierarchyLock for synchronization in two situations:
> >
> >   - adding / removing a control to / from a context
> >   - adding / removing / getting a "service" from a context
> >
> > This acts as a choke point for all Control references and is what
> > shows up in the locking traces in a previous mail.  This is
> > problematic, but it's also orthogonal (I believe) to the issue with
> > the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> > JDK assumed that the classes in java.beans.beancontext.* would be used
> > to run a single hierarchy of controls on a VM.  Beehive's usage
> > pattern is a little different in that in enterprise applications, many
> > hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> > such, a global lock doesn't fit well.
> >
> >   So, hopefully that provides some background on the
> > BeanContext.globalHierarchyLock.
> >
> >   That being said, I think that the deadlock is orthogonal to this
> > problem as the "destroy" lifecycle method on a JPF can be invoked
> > without waiting for a lock on the JPF instance itself.  The result is
> > that two threads can actually run through a JPF simultaneously.
> >
> >   With Daryl's changes in February, the ControlContainerContext object
> > for a Page Flow is a class-level field, and locking is performed on
> > this object at:
> >
> >   - create time
> >   - action execution time
> >   - page rendering time (JSP and Faces)
> >
> > Guess that the question is whether this situation is protected:
> >
> > Thread A is execution an action on JPF X
> > Thread B is destroying JPF X en route to running JPF Y
> >
> > >From the code in PageFlowController.persistInSession (transitively
> > down to StorageHandler.removeAttribute), it seems like a lock is held
> > on the "current" JPF that would prevent this.  Right?
> >
> >   If that's true, then I've got a little more info to go on in
> > investigating this problem.
> >
> >   Thanks for any info...  :)
> >
> > Eddie
> >
> >
> >
> >
> > On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
> >
> >>   Good questions -- let's see if I can explain the threads more
> >> clearly in ASCII.  :)
> >>
> >>   What's happening is this (note, both threads operate on the *same*
> >> instance of a JPF):
> >>
> >> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> >> acquire lock on FooPage Flow
> >>                                                              acquire BC.gHL
> >> acquire lock on BarControlBean
> >>                                                              wait for
> >> lock on BarControlBean
> >> wait for lock on BC.gHL
> >>
> >> Because the Controls infrastructure is based on the BeanContext
> >> (Services|Support|etc) code in the java.beans package of the JDK, it
> >> uses the BeanContextServicesSupport class as a base class for the
> >> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> >> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> >> access to shared data structures like the list of "service" objects in
> >> a BeanContext, etc.  This is a lock that is static throughout the JDK
> >> (!).
> >>
> >> To answer your questions:
> >>
> >>
> >>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> >>>
> >> The base classes for the ControlBeanContext object which delegates up
> >> to it's "super" in order to serialize changes and service requests to
> >> the classes that implement event listener support, etc.
> >>
> >>
> >>> What prevents deadlock in general between locking on
> >>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>
> >> Good question -- this seems like the crux of the issue.  In general,
> >> this is protected by a Control container serializing access to the
> >> container's ControlContainerContext object.  In this case, the problem
> >> occurs because two threads are trying to setup and destroy the same
> >> context object simultaneously.  If those setup and destroy calls were
> >> serialized against the ControlContainerContext object stored in the
> >> HttpSession, presumably, the semantics would be maintained.
> >>
> >> My guess is that this deadlock occurred with a double page submit or
> >> quick refresh that caught the tail end of the previous thread and the
> >> beginning of the next request such that the threads interleaved in
> >> this way.
> >>
> >> So, hopefully that helps explain what's happening.
> >>
> >> Ultimately, I'm not sure how to fix this yet and would appreciate
> >> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> >> instance during onDestroy?  That would serialize the access to the
> >> session scoped CCC object.
> >>
> >> Another way to go (and it's a lot of work) is to build a NetUI /
> >> web-tier specific control container that doesn't leverage the *Support
> >> classes in the JDK.  This would allow the container's implementation
> >> to be tailored to the single-threaded nature of the web tier and would
> >> remove a lot of the performance implications around locking and
> >> synchronized data structures.  But, I've not thought through this yet,
> >> either.  :)
> >>
> >> Eddie
> >>
> >>
> >>
> >> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> >>
> >>> Looks like a tough one.  First of all, the problem with synchronizing in
> >>> onDestroy is also a deadlock issue: since onDestroy is called from an
> >>> HttpSessionBindingListener, the Servlet container may have a lock on the
> >>> session itself when onDestroy is called.  I don't think it's mandated by
> >>> the Servlet spec, but I don't think it's forbidden either.
> >>> Unfortunately, this means that a deadlock can occur if code in another
> >>> thread synchronizes on the page flow instance first (as in an action
> >>> method invocation), then calls a method on HttpSession that synchronizes
> >>> on the session object.  Again, I don't think this is mandated by the
> >>> spec, but it happens.
> >>>
> >>> Nothing strikes me off the bat, but I actually don't understand the
> >>> deadlock sequence below (I don't disagree -- just don't have enough info
> >>> to see it).  What grabs the lock on
> >>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> >>> in general between locking on
> >>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >>>
> >>> Also, how is the same JPF being *created* by two separate threads?
> >>>
> >>> Rich
> >>>
> >>> Eddie O'Neil wrote:
> >>>
> >>>> Rich/Daryl--
> >>>>
> >>>>   Hey, I've run into a thread deadlock problem in the interaction
> >>>> between Controls and Page Flow that happens in the following
> >>>> circumstance:
> >>>>
> >>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> >>>> thread2: acquire lock on
> >>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> >>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> >>>> thread1: wait for lock on
> >>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >>>>
> >>>> So, the problem is that the same JPF is being both created and
> >>>> destroyed by two different threads.  It appears that the "destroy"
> >>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
> >>>> acquire Control locks without having serialized access to the Page
> >>>> Flow itself.
> >>>>
> >>>>   My thought is to add a synchronization point in
> >>>> JavaControlUtils.uninitJavaControls that locks on the
> >>>> PageFlowManagedObject as this would serialize access to the Page Flow.
> >>>>  But, I seem to recall some threading issues with locking on a JPF
> >>>> during the "destroy" part of the lifecycle and don't want to resurrect
> >>>> those.
> >>>>
> >>>>   Any suggestions about how best to make this fix?
> >>>>
> >>>> Eddie
> >>>>
> >>>>
> >>>>
> >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Rich Feit <ri...@gmail.com>.

Funny, I'd drafted a longish email earlier asking about the BeanContext 
lock, because it sounded like the *fundamental* issue (ordering problems 
in locking that and the control bean instance itself).  But it's not, or 
at least it's unrelated.  In a way, that's good.  :)

Regarding this:
> Thread A is execution an action on JPF X
> Thread B is destroying JPF X en route to running JPF Y
>
> From the code in PageFlowController.persistInSession (transitively
> down to StorageHandler.removeAttribute), it seems like a lock is held
> on the "current" JPF that would prevent this.  Right?
That's correct.  The only time onDestroy() should ever run without a 
lock on the page flow instance is when the session is expiring.  That's 
the situation where the session itself is already locked, so it's 
dangerous to subsequently lock the page flow instance (while another 
thread may be accessing the session through synchronized get() after 
already locking the page flow instance).  The user could also simulate 
this situation by explicitly removing the page flow session attribute, 
but of course that's not something we should try to support.

Let me know if you see/find any holes in that, though...

Rich

Eddie O'Neil wrote:
> Rich--
>
>   Having looked into this more now, I've got a little more detail
> about what's happening:
>
>   There are two issues intertwined here.  The first issue is related
> to the ControlBeanContext implementation class and its usage of a
> global, VM-wide lock.  At issue is the fact that all
> ControlContainerContext implementations inherit from a
> BeanContextServicesSupport base class.  This class (and another in
> that hierarchy) use a the VM-wide object
> BeanContext.globalHierarchyLock for synchronization in two situations:
>
>   - adding / removing a control to / from a context
>   - adding / removing / getting a "service" from a context
>
> This acts as a choke point for all Control references and is what
> shows up in the locking traces in a previous mail.  This is
> problematic, but it's also orthogonal (I believe) to the issue with
> the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
> JDK assumed that the classes in java.beans.beancontext.* would be used
> to run a single hierarchy of controls on a VM.  Beehive's usage
> pattern is a little different in that in enterprise applications, many
> hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
> such, a global lock doesn't fit well.
>
>   So, hopefully that provides some background on the
> BeanContext.globalHierarchyLock.
>
>   That being said, I think that the deadlock is orthogonal to this
> problem as the "destroy" lifecycle method on a JPF can be invoked
> without waiting for a lock on the JPF instance itself.  The result is
> that two threads can actually run through a JPF simultaneously.
>
>   With Daryl's changes in February, the ControlContainerContext object
> for a Page Flow is a class-level field, and locking is performed on
> this object at:
>
>   - create time
>   - action execution time
>   - page rendering time (JSP and Faces)
>
> Guess that the question is whether this situation is protected:
>
> Thread A is execution an action on JPF X
> Thread B is destroying JPF X en route to running JPF Y
>
> >From the code in PageFlowController.persistInSession (transitively
> down to StorageHandler.removeAttribute), it seems like a lock is held
> on the "current" JPF that would prevent this.  Right?
>
>   If that's true, then I've got a little more info to go on in
> investigating this problem.
>
>   Thanks for any info...  :)
>
> Eddie
>
>
>
>
> On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
>   
>>   Good questions -- let's see if I can explain the threads more
>> clearly in ASCII.  :)
>>
>>   What's happening is this (note, both threads operate on the *same*
>> instance of a JPF):
>>
>> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
>> acquire lock on FooPage Flow
>>                                                              acquire BC.gHL
>> acquire lock on BarControlBean
>>                                                              wait for
>> lock on BarControlBean
>> wait for lock on BC.gHL
>>
>> Because the Controls infrastructure is based on the BeanContext
>> (Services|Support|etc) code in the java.beans package of the JDK, it
>> uses the BeanContextServicesSupport class as a base class for the
>> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
>> access to shared data structures like the list of "service" objects in
>> a BeanContext, etc.  This is a lock that is static throughout the JDK
>> (!).
>>
>> To answer your questions:
>>
>>     
>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
>>>       
>> The base classes for the ControlBeanContext object which delegates up
>> to it's "super" in order to serialize changes and service requests to
>> the classes that implement event listener support, etc.
>>
>>     
>>> What prevents deadlock in general between locking on
>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>       
>> Good question -- this seems like the crux of the issue.  In general,
>> this is protected by a Control container serializing access to the
>> container's ControlContainerContext object.  In this case, the problem
>> occurs because two threads are trying to setup and destroy the same
>> context object simultaneously.  If those setup and destroy calls were
>> serialized against the ControlContainerContext object stored in the
>> HttpSession, presumably, the semantics would be maintained.
>>
>> My guess is that this deadlock occurred with a double page submit or
>> quick refresh that caught the tail end of the previous thread and the
>> beginning of the next request such that the threads interleaved in
>> this way.
>>
>> So, hopefully that helps explain what's happening.
>>
>> Ultimately, I'm not sure how to fix this yet and would appreciate
>> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
>> instance during onDestroy?  That would serialize the access to the
>> session scoped CCC object.
>>
>> Another way to go (and it's a lot of work) is to build a NetUI /
>> web-tier specific control container that doesn't leverage the *Support
>> classes in the JDK.  This would allow the container's implementation
>> to be tailored to the single-threaded nature of the web tier and would
>> remove a lot of the performance implications around locking and
>> synchronized data structures.  But, I've not thought through this yet,
>> either.  :)
>>
>> Eddie
>>
>>
>>
>> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
>>     
>>> Looks like a tough one.  First of all, the problem with synchronizing in
>>> onDestroy is also a deadlock issue: since onDestroy is called from an
>>> HttpSessionBindingListener, the Servlet container may have a lock on the
>>> session itself when onDestroy is called.  I don't think it's mandated by
>>> the Servlet spec, but I don't think it's forbidden either.
>>> Unfortunately, this means that a deadlock can occur if code in another
>>> thread synchronizes on the page flow instance first (as in an action
>>> method invocation), then calls a method on HttpSession that synchronizes
>>> on the session object.  Again, I don't think this is mandated by the
>>> spec, but it happens.
>>>
>>> Nothing strikes me off the bat, but I actually don't understand the
>>> deadlock sequence below (I don't disagree -- just don't have enough info
>>> to see it).  What grabs the lock on
>>> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
>>> in general between locking on
>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>>>
>>> Also, how is the same JPF being *created* by two separate threads?
>>>
>>> Rich
>>>
>>> Eddie O'Neil wrote:
>>>       
>>>> Rich/Daryl--
>>>>
>>>>   Hey, I've run into a thread deadlock problem in the interaction
>>>> between Controls and Page Flow that happens in the following
>>>> circumstance:
>>>>
>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute)
>>>> thread2: acquire lock on
>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
>>>> thread1: wait for lock on
>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>>>>
>>>> So, the problem is that the same JPF is being both created and
>>>> destroyed by two different threads.  It appears that the "destroy"
>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely
>>>> acquire Control locks without having serialized access to the Page
>>>> Flow itself.
>>>>
>>>>   My thought is to add a synchronization point in
>>>> JavaControlUtils.uninitJavaControls that locks on the
>>>> PageFlowManagedObject as this would serialize access to the Page Flow.
>>>>  But, I seem to recall some threading issues with locking on a JPF
>>>> during the "destroy" part of the lifecycle and don't want to resurrect
>>>> those.
>>>>
>>>>   Any suggestions about how best to make this fix?
>>>>
>>>> Eddie
>>>>
>>>>
>>>>         
>
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

Rich--

  Having looked into this more now, I've got a little more detail
about what's happening:

  There are two issues intertwined here.  The first issue is related
to the ControlBeanContext implementation class and its usage of a
global, VM-wide lock.  At issue is the fact that all
ControlContainerContext implementations inherit from a
BeanContextServicesSupport base class.  This class (and another in
that hierarchy) use a the VM-wide object
BeanContext.globalHierarchyLock for synchronization in two situations:

  - adding / removing a control to / from a context
  - adding / removing / getting a "service" from a context

This acts as a choke point for all Control references and is what
shows up in the locking traces in a previous mail.  This is
problematic, but it's also orthogonal (I believe) to the issue with
the "destroy" lifecycle method on a JPF.  Ultimately, I think that the
JDK assumed that the classes in java.beans.beancontext.* would be used
to run a single hierarchy of controls on a VM.  Beehive's usage
pattern is a little different in that in enterprise applications, many
hierarchies of beans run on a VM (one per JPF, in NetUI's case).  As
such, a global lock doesn't fit well.

  So, hopefully that provides some background on the
BeanContext.globalHierarchyLock.

  That being said, I think that the deadlock is orthogonal to this
problem as the "destroy" lifecycle method on a JPF can be invoked
without waiting for a lock on the JPF instance itself.  The result is
that two threads can actually run through a JPF simultaneously.

  With Daryl's changes in February, the ControlContainerContext object
for a Page Flow is a class-level field, and locking is performed on
this object at:

  - create time
  - action execution time
  - page rendering time (JSP and Faces)

Guess that the question is whether this situation is protected:

Thread A is execution an action on JPF X
Thread B is destroying JPF X en route to running JPF Y

>From the code in PageFlowController.persistInSession (transitively
down to StorageHandler.removeAttribute), it seems like a lock is held
on the "current" JPF that would prevent this.  Right?

  If that's true, then I've got a little more info to go on in
investigating this problem.

  Thanks for any info...  :)

Eddie




On 3/31/06, Eddie O'Neil <ek...@gmail.com> wrote:
>   Good questions -- let's see if I can explain the threads more
> clearly in ASCII.  :)
>
>   What's happening is this (note, both threads operate on the *same*
> instance of a JPF):
>
> Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
> acquire lock on FooPage Flow
>                                                              acquire BC.gHL
> acquire lock on BarControlBean
>                                                              wait for
> lock on BarControlBean
> wait for lock on BC.gHL
>
> Because the Controls infrastructure is based on the BeanContext
> (Services|Support|etc) code in the java.beans package of the JDK, it
> uses the BeanContextServicesSupport class as a base class for the
> ControlBeanContext object in a Control.  This JDK class (BCSS) uses
> the BeanContext.globalHierarchyLock (also in the JDK) to serialize
> access to shared data structures like the list of "service" objects in
> a BeanContext, etc.  This is a lock that is static throughout the JDK
> (!).
>
> To answer your questions:
>
> > What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
> The base classes for the ControlBeanContext object which delegates up
> to it's "super" in order to serialize changes and service requests to
> the classes that implement event listener support, etc.
>
> > What prevents deadlock in general between locking on
> > BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>
> Good question -- this seems like the crux of the issue.  In general,
> this is protected by a Control container serializing access to the
> container's ControlContainerContext object.  In this case, the problem
> occurs because two threads are trying to setup and destroy the same
> context object simultaneously.  If those setup and destroy calls were
> serialized against the ControlContainerContext object stored in the
> HttpSession, presumably, the semantics would be maintained.
>
> My guess is that this deadlock occurred with a double page submit or
> quick refresh that caught the tail end of the previous thread and the
> beginning of the next request such that the threads interleaved in
> this way.
>
> So, hopefully that helps explain what's happening.
>
> Ultimately, I'm not sure how to fix this yet and would appreciate
> thoughts on the topic.  Would it be safe to acquire a lock on the JPF
> instance during onDestroy?  That would serialize the access to the
> session scoped CCC object.
>
> Another way to go (and it's a lot of work) is to build a NetUI /
> web-tier specific control container that doesn't leverage the *Support
> classes in the JDK.  This would allow the container's implementation
> to be tailored to the single-threaded nature of the web tier and would
> remove a lot of the performance implications around locking and
> synchronized data structures.  But, I've not thought through this yet,
> either.  :)
>
> Eddie
>
>
>
> On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> > Looks like a tough one.  First of all, the problem with synchronizing in
> > onDestroy is also a deadlock issue: since onDestroy is called from an
> > HttpSessionBindingListener, the Servlet container may have a lock on the
> > session itself when onDestroy is called.  I don't think it's mandated by
> > the Servlet spec, but I don't think it's forbidden either.
> > Unfortunately, this means that a deadlock can occur if code in another
> > thread synchronizes on the page flow instance first (as in an action
> > method invocation), then calls a method on HttpSession that synchronizes
> > on the session object.  Again, I don't think this is mandated by the
> > spec, but it happens.
> >
> > Nothing strikes me off the bat, but I actually don't understand the
> > deadlock sequence below (I don't disagree -- just don't have enough info
> > to see it).  What grabs the lock on
> > BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> > in general between locking on
> > BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
> >
> > Also, how is the same JPF being *created* by two separate threads?
> >
> > Rich
> >
> > Eddie O'Neil wrote:
> > > Rich/Daryl--
> > >
> > >   Hey, I've run into a thread deadlock problem in the interaction
> > > between Controls and Page Flow that happens in the following
> > > circumstance:
> > >
> > > thread1: acquire lock on FooPageFlow (during FlowController.execute)
> > > thread2: acquire lock on
> > > BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > > thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> > > thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> > > thread1: wait for lock on
> > > BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > >
> > > So, the problem is that the same JPF is being both created and
> > > destroyed by two different threads.  It appears that the "destroy"
> > > phase of the JPF lifecycle is entirely unsynchronized and can freely
> > > acquire Control locks without having serialized access to the Page
> > > Flow itself.
> > >
> > >   My thought is to add a synchronization point in
> > > JavaControlUtils.uninitJavaControls that locks on the
> > > PageFlowManagedObject as this would serialize access to the Page Flow.
> > >  But, I seem to recall some threading issues with locking on a JPF
> > > during the "destroy" part of the lifecycle and don't want to resurrect
> > > those.
> > >
> > >   Any suggestions about how best to make this fix?
> > >
> > > Eddie
> > >
> > >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Eddie O'Neil <ek...@gmail.com>.

  Good questions -- let's see if I can explain the threads more
clearly in ASCII.  :)

  What's happening is this (note, both threads operate on the *same*
instance of a JPF):

Thread 1 (executing a page flow)              Thread2 (destroying a page flow)
acquire lock on FooPage Flow
                                                             acquire BC.gHL
acquire lock on BarControlBean
                                                             wait for
lock on BarControlBean
wait for lock on BC.gHL

Because the Controls infrastructure is based on the BeanContext
(Services|Support|etc) code in the java.beans package of the JDK, it
uses the BeanContextServicesSupport class as a base class for the
ControlBeanContext object in a Control.  This JDK class (BCSS) uses
the BeanContext.globalHierarchyLock (also in the JDK) to serialize
access to shared data structures like the list of "service" objects in
a BeanContext, etc.  This is a lock that is static throughout the JDK
(!).

To answer your questions:

> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock?
The base classes for the ControlBeanContext object which delegates up
to it's "super" in order to serialize changes and service requests to
the classes that implement event listener support, etc.

> What prevents deadlock in general between locking on
> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?

Good question -- this seems like the crux of the issue.  In general,
this is protected by a Control container serializing access to the
container's ControlContainerContext object.  In this case, the problem
occurs because two threads are trying to setup and destroy the same
context object simultaneously.  If those setup and destroy calls were
serialized against the ControlContainerContext object stored in the
HttpSession, presumably, the semantics would be maintained.

My guess is that this deadlock occurred with a double page submit or
quick refresh that caught the tail end of the previous thread and the
beginning of the next request such that the threads interleaved in
this way.

So, hopefully that helps explain what's happening.

Ultimately, I'm not sure how to fix this yet and would appreciate
thoughts on the topic.  Would it be safe to acquire a lock on the JPF
instance during onDestroy?  That would serialize the access to the
session scoped CCC object.

Another way to go (and it's a lot of work) is to build a NetUI /
web-tier specific control container that doesn't leverage the *Support
classes in the JDK.  This would allow the container's implementation
to be tailored to the single-threaded nature of the web tier and would
remove a lot of the performance implications around locking and
synchronized data structures.  But, I've not thought through this yet,
either.  :)

Eddie

On 3/30/06, Rich Feit <ri...@gmail.com> wrote:
> Looks like a tough one.  First of all, the problem with synchronizing in
> onDestroy is also a deadlock issue: since onDestroy is called from an
> HttpSessionBindingListener, the Servlet container may have a lock on the
> session itself when onDestroy is called.  I don't think it's mandated by
> the Servlet spec, but I don't think it's forbidden either.
> Unfortunately, this means that a deadlock can occur if code in another
> thread synchronizes on the page flow instance first (as in an action
> method invocation), then calls a method on HttpSession that synchronizes
> on the session object.  Again, I don't think this is mandated by the
> spec, but it happens.
>
> Nothing strikes me off the bat, but I actually don't understand the
> deadlock sequence below (I don't disagree -- just don't have enough info
> to see it).  What grabs the lock on
> BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock
> in general between locking on
> BeanContextServicesSupport.globalHierarchyLock and BarControlBean?
>
> Also, how is the same JPF being *created* by two separate threads?
>
> Rich
>
> Eddie O'Neil wrote:
> > Rich/Daryl--
> >
> >   Hey, I've run into a thread deadlock problem in the interaction
> > between Controls and Page Flow that happens in the following
> > circumstance:
> >
> > thread1: acquire lock on FooPageFlow (during FlowController.execute)
> > thread2: acquire lock on
> > BeanContextServicesSupport.globalHierarchyLock (JDK class)
> > thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> > thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> > thread1: wait for lock on
> > BeanContextServicesSupport.globalHierarchyLock (JDK class)
> >
> > So, the problem is that the same JPF is being both created and
> > destroyed by two different threads.  It appears that the "destroy"
> > phase of the JPF lifecycle is entirely unsynchronized and can freely
> > acquire Control locks without having serialized access to the Page
> > Flow itself.
> >
> >   My thought is to add a synchronization point in
> > JavaControlUtils.uninitJavaControls that locks on the
> > PageFlowManagedObject as this would serialize access to the Page Flow.
> >  But, I seem to recall some threading issues with locking on a JPF
> > during the "destroy" part of the lifecycle and don't want to resurrect
> > those.
> >
> >   Any suggestions about how best to make this fix?
> >
> > Eddie
> >
> >
>

Re: page flow destroy, controls, and deadlock

Posted by Rich Feit <ri...@gmail.com>.

Looks like a tough one.  First of all, the problem with synchronizing in 
onDestroy is also a deadlock issue: since onDestroy is called from an 
HttpSessionBindingListener, the Servlet container may have a lock on the 
session itself when onDestroy is called.  I don't think it's mandated by 
the Servlet spec, but I don't think it's forbidden either.  
Unfortunately, this means that a deadlock can occur if code in another 
thread synchronizes on the page flow instance first (as in an action 
method invocation), then calls a method on HttpSession that synchronizes 
on the session object.  Again, I don't think this is mandated by the 
spec, but it happens.

Nothing strikes me off the bat, but I actually don't understand the 
deadlock sequence below (I don't disagree -- just don't have enough info 
to see it).  What grabs the lock on 
BeanContextServicesSupport.globalHierarchyLock?  What prevents deadlock 
in general between locking on 
BeanContextServicesSupport.globalHierarchyLock and BarControlBean?

Also, how is the same JPF being *created* by two separate threads?

Rich

Eddie O'Neil wrote:
> Rich/Daryl--
>
>   Hey, I've run into a thread deadlock problem in the interaction
> between Controls and Page Flow that happens in the following
> circumstance:
>
> thread1: acquire lock on FooPageFlow (during FlowController.execute)
> thread2: acquire lock on
> BeanContextServicesSupport.globalHierarchyLock (JDK class)
> thread1: acquire lock on BarControlBean (ControlBean.ensureControl())
> thread2: wait for lock on BarControlBean (BeanContextSupport.remove())
> thread1: wait for lock on
> BeanContextServicesSupport.globalHierarchyLock (JDK class)
>
> So, the problem is that the same JPF is being both created and
> destroyed by two different threads.  It appears that the "destroy"
> phase of the JPF lifecycle is entirely unsynchronized and can freely
> acquire Control locks without having serialized access to the Page
> Flow itself.
>
>   My thought is to add a synchronization point in
> JavaControlUtils.uninitJavaControls that locks on the
> PageFlowManagedObject as this would serialize access to the Page Flow.
>  But, I seem to recall some threading issues with locking on a JPF
> during the "destroy" part of the lifecycle and don't want to resurrect
> those.
>
>   Any suggestions about how best to make this fix?
>
> Eddie
>
>