You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@felix.apache.org by David Jencks <da...@yahoo.com> on 2012/06/03 20:21:13 UTC

More DS concurrency problems

We found another DS deadlock with the current locking concurrency solution.

When we start, B is registered (its dependencies are optional) but not yet created (it's delayed)

Thread 1 creates and registers D so service registration and bind attempts propagate right to left <<<<<< starting with D. Thread 1 obtains the lock on C and tries to get the lock on B

A ---- 1..1 -----> B -------0..n -------> C --------- 1..1 ------->D
immediate delayed delayed not a ds service

Thread 2 going left to right >>>>>>> creates A which calls getService on B (obtaining the B lock) which calls getService on C which is locked by thread 1.

I'm considering two possible approaches:

1. dont lock or don't lock as much for events that don't change the state (the C to B service event: B can't change state as a result)

2. a lock-free implementation using compare-and-set. The idea is that an event such as a service event would get processed using a holder. If at the completion of the processing the previous holder is the same then we'd compare-and-set the holder, otherwise undo whatever happened during processing.

(2) could result in multiple service registrations for a given component while the events are getting processed. It's not completely obvious to me how to assure that all events will in fact get processed and that e.g the "winning" result will have all the highest-priority service references and that there won't be infinite update cycles.

On the other hand I have no confidence that (1) is possible to implement. So I'm going to try for (2). This is going to reopen the "how does it run on java < 5" debate since compare-and-set is not available there with the same characteristics of the java 5 implementation.

In the interests of making more code paths more consistent and similar I think I will change the immediate component handling to be as the spec describes, where the service is registered before the implementation object is created. This means all the services will be registered as service factories as the delayed components currently are.

In other DS issues:

I'd like to separate the felix specific non-spec ComponentFactory behavior (where it acts like a ManagedServiceFactory) into a separate class so I can understand the spec compliant behavior more easily.

I think that the behavior of ServiceFactory components that are configured by config admin is wrong. AFAICT for service factory components we will never call a modify method but always destroy and recreate the instances.

I also have a few cleanup changes such as better javadoc and removing stray references to no-longer-present *ing transient states that I don't think will be controversial so I plan to just commit them.

thanks
david jencks

Re: More DS concurrency problems

Posted by Felix Meschberger <fm...@adobe.com>.

Thanks for tackling and reporting.

Regards
Felix

Am 05.06.2012 um 03:48 schrieb David Jencks:

> 
> On Jun 4, 2012, at 10:45 AM, Felix Meschberger wrote:
> 
>> Hi,
>> 
>> Smaller things first ...
> 
> good idea :-)
> 
>> Will you create issues for them ?
>> 
>> Am 03.06.2012 um 20:21 schrieb David Jencks:
>> 
>>> In other DS issues:
>>> 
>>> I'd like to separate the felix specific non-spec ComponentFactory behavior (where it acts like a ManagedServiceFactory) into a separate class so I can understand the spec compliant behavior more easily.
>> 
>> Basically this non-compliant behaviour is AFAICT only apparent in two methods. Not sure whether factoring this out is worth it.
> 
> See https://issues.apache.org/jira/browse/FELIX-3536 Unfortunately the patch also contains detemining the satisfied state from the component manager class rather than a switch statement.
>> 
>>> 
>>> I think that the behavior of ServiceFactory components that are configured by config admin is wrong.  AFAICT for service factory components we will never call a modify method but always destroy and recreate the instances.
>> 
>> Right.
>> 
>> Unlike other component managers updating configuration, ServiceFactory components may internally have more than one actual instance to update. The getInstance() method is not appropriate for this situation.
>> 
>> We'd probably need new methods for this situation (or some more abstract mechanism) in the ImmediateComponentManager.modify():
>> 
>> (1) instead of the getInstance()==null check a method return true or false indicating whether modify makes sense. The ServiceFactoryComponentManager implementation returns true if at least one instance has been created
>> 
>> (2) instead of directly calling the modify method (step 4) a method is implemented which is called by the ServiceFactoryComponentManager.
> 
> https://issues.apache.org/jira/browse/FELIX-3534  I'll look at how to fix this once I have a better idea about the concurrency problems.... or sooner.
>> 
>>> 
>>> I also have a few cleanup changes such as better javadoc and removing stray references to no-longer-present *ing transient states that I don't think will be controversial so I plan to just commit them.
>> 
>> +1
> 
> https://issues.apache.org/jira/browse/FELIX-3535  I've applied the ones I have so far.... I'll probably have more soon.
> 
> thanks!
> david jencks
> 
>> 
>> Regards
>> Felix
>

Re: More DS concurrency problems

Posted by David Jencks <da...@yahoo.com>.

On Jun 4, 2012, at 10:45 AM, Felix Meschberger wrote:

> Hi,
> 
> Smaller things first ...

good idea :-)

> Will you create issues for them ?
> 
> Am 03.06.2012 um 20:21 schrieb David Jencks:
> 
>> In other DS issues:
>> 
>> I'd like to separate the felix specific non-spec ComponentFactory behavior (where it acts like a ManagedServiceFactory) into a separate class so I can understand the spec compliant behavior more easily.
> 
> Basically this non-compliant behaviour is AFAICT only apparent in two methods. Not sure whether factoring this out is worth it.

See https://issues.apache.org/jira/browse/FELIX-3536 Unfortunately the patch also contains detemining the satisfied state from the component manager class rather than a switch statement.
> 
>> 
>> I think that the behavior of ServiceFactory components that are configured by config admin is wrong.  AFAICT for service factory components we will never call a modify method but always destroy and recreate the instances.
> 
> Right.
> 
> Unlike other component managers updating configuration, ServiceFactory components may internally have more than one actual instance to update. The getInstance() method is not appropriate for this situation.
> 
> We'd probably need new methods for this situation (or some more abstract mechanism) in the ImmediateComponentManager.modify():
> 
> (1) instead of the getInstance()==null check a method return true or false indicating whether modify makes sense. The ServiceFactoryComponentManager implementation returns true if at least one instance has been created
> 
> (2) instead of directly calling the modify method (step 4) a method is implemented which is called by the ServiceFactoryComponentManager.

https://issues.apache.org/jira/browse/FELIX-3534  I'll look at how to fix this once I have a better idea about the concurrency problems.... or sooner.
> 
>> 
>> I also have a few cleanup changes such as better javadoc and removing stray references to no-longer-present *ing transient states that I don't think will be controversial so I plan to just commit them.
> 
> +1

https://issues.apache.org/jira/browse/FELIX-3535  I've applied the ones I have so far.... I'll probably have more soon.

thanks!
david jencks

> 
> Regards
> Felix

Re: More DS concurrency problems

Posted by Felix Meschberger <fm...@adobe.com>.

Hi,

Smaller things first ... Will you create issues for them ?

Am 03.06.2012 um 20:21 schrieb David Jencks:

> In other DS issues:
> 
> I'd like to separate the felix specific non-spec ComponentFactory behavior (where it acts like a ManagedServiceFactory) into a separate class so I can understand the spec compliant behavior more easily.

Basically this non-compliant behaviour is AFAICT only apparent in two methods. Not sure whether factoring this out is worth it.

> 
> I think that the behavior of ServiceFactory components that are configured by config admin is wrong.  AFAICT for service factory components we will never call a modify method but always destroy and recreate the instances.

Right.

Unlike other component managers updating configuration, ServiceFactory components may internally have more than one actual instance to update. The getInstance() method is not appropriate for this situation.

We'd probably need new methods for this situation (or some more abstract mechanism) in the ImmediateComponentManager.modify():

(1) instead of the getInstance()==null check a method return true or false indicating whether modify makes sense. The ServiceFactoryComponentManager implementation returns true if at least one instance has been created

(2) instead of directly calling the modify method (step 4) a method is implemented which is called by the ServiceFactoryComponentManager.

> 
> I also have a few cleanup changes such as better javadoc and removing stray references to no-longer-present *ing transient states that I don't think will be controversial so I plan to just commit them.

+1

Regards
Felix

Re: More DS concurrency problems

Posted by Felix Meschberger <fm...@adobe.com>.

Hi,

Am 05.06.2012 um 16:57 schrieb Michael Fraenkel:

> On 6/5/12 8:11 AM, Felix Meschberger wrote:
>> Hi,
>> 
>> I fear we are getting on the "ever-and-ever-more-complex" track ...
>> 
>> I understand the concurrency issues (how the come into live) but I would love if we could solve them with simple code without inventing more and more locking and and signaling etc. behaviour. This really was the "beauty of some sorts" of the previous lock-less approach.
>> 
>> I think under certain circumstances there will always be a situation where one event (service or configuration) is happening while at the same time another event is being handled. In some situations -- e.g. a required dependency being unregistered while requiring the component is being activated -- the ongoing task should probably be aborted and reverted. Which in turn may be simpler (before service registration) or harder (during service registration).
>> 
>> This can maybe be solved by using checkpoints: Long running tasks (mostly activation) after each step check the checkpoint while intermittent events (service, configuration) may set the checkpoint and optionally wait for the checkpoint to be confirmed by the long-running task (eventually timing out to prevent deadlocks). Such a thing would be "easier" to implement with an Activating state where just the Activating.deactivate method might be called which would handle the checkpoint thing.
>> 
>> Do I sound getting complicated ? ;-)
>> 
>> Regards
>> Felix
>> 
> 
> It all comes down to a question of guarantees and what you are willing 
> to live with.
> 
> The most basic locking could be accomplished just when touch the 
> "state".  You hold the lock if you are going to change the state.
> You use a lock to retrieve the current state.  Anything after retrieving 
> is a best effort.  You make the assumption that you exist in this state 
> until a failure occurs.  This would provide guarantees across switching 
> state and some minimal deadlock prevention.
> 
> The ReadWrite lock idea was to just separate the two concepts above. 
> Transitioning requires the Write lock and the Read lock is used if you 
> are performing some activity within the given state.  It accomplishes 
> the same goal just with stronger guarantees.

Ok. The question is: Who is acquiring the locks.

For simple operations like "bind, "unbind", "activate", "deactivate" this is ok and atomic enough.

The problem we have is with "reactivation" which currently is down by calling the sequence of deactivate and then activate which both indepentently acquire the lock. Maybe we should introduce new "reactivate" method (we had it once some time ago) which acquires the lock, calls deactivate then activate and then unlocks again.

And keep in mind: We still want to be Java 1.4 compliant ...

Regards
Felix


> 
> - Michael
>

Re: More DS concurrency problems

Posted by Michael Fraenkel <mi...@gmail.com>.

On 6/5/12 8:11 AM, Felix Meschberger wrote:
> Hi,
>
> I fear we are getting on the "ever-and-ever-more-complex" track ...
>
> I understand the concurrency issues (how the come into live) but I would love if we could solve them with simple code without inventing more and more locking and and signaling etc. behaviour. This really was the "beauty of some sorts" of the previous lock-less approach.
>
> I think under certain circumstances there will always be a situation where one event (service or configuration) is happening while at the same time another event is being handled. In some situations -- e.g. a required dependency being unregistered while requiring the component is being activated -- the ongoing task should probably be aborted and reverted. Which in turn may be simpler (before service registration) or harder (during service registration).
>
> This can maybe be solved by using checkpoints: Long running tasks (mostly activation) after each step check the checkpoint while intermittent events (service, configuration) may set the checkpoint and optionally wait for the checkpoint to be confirmed by the long-running task (eventually timing out to prevent deadlocks). Such a thing would be "easier" to implement with an Activating state where just the Activating.deactivate method might be called which would handle the checkpoint thing.
>
> Do I sound getting complicated ? ;-)
>
> Regards
> Felix
>

It all comes down to a question of guarantees and what you are willing 
to live with.

The most basic locking could be accomplished just when touch the 
"state".  You hold the lock if you are going to change the state.
You use a lock to retrieve the current state.  Anything after retrieving 
is a best effort.  You make the assumption that you exist in this state 
until a failure occurs.  This would provide guarantees across switching 
state and some minimal deadlock prevention.

The ReadWrite lock idea was to just separate the two concepts above. 
Transitioning requires the Write lock and the Read lock is used if you 
are performing some activity within the given state.  It accomplishes 
the same goal just with stronger guarantees.

- Michael

Re: More DS concurrency problems

Posted by Felix Meschberger <fm...@adobe.com>.

Hi,

I fear we are getting on the "ever-and-ever-more-complex" track ...

I understand the concurrency issues (how the come into live) but I would love if we could solve them with simple code without inventing more and more locking and and signaling etc. behaviour. This really was the "beauty of some sorts" of the previous lock-less approach.

I think under certain circumstances there will always be a situation where one event (service or configuration) is happening while at the same time another event is being handled. In some situations -- e.g. a required dependency being unregistered while requiring the component is being activated -- the ongoing task should probably be aborted and reverted. Which in turn may be simpler (before service registration) or harder (during service registration).

This can maybe be solved by using checkpoints: Long running tasks (mostly activation) after each step check the checkpoint while intermittent events (service, configuration) may set the checkpoint and optionally wait for the checkpoint to be confirmed by the long-running task (eventually timing out to prevent deadlocks). Such a thing would be "easier" to implement with an Activating state where just the Activating.deactivate method might be called which would handle the checkpoint thing.

Do I sound getting complicated ? ;-)

Regards
Felix

Am 04.06.2012 um 02:40 schrieb David Jencks:

> A colleague suggested using a read-write lock might work.  I may not understand his proposal: I think he's suggesting a write lock around code that change the state and read lock around code that doesn't.
> 
> I'm not sure how this would help.  I think we'd need the write lock around the code that determines that a bind satisfies a component, downgraded to a read lock while the service is registered.  Then we'd need a write lock on the code that creates the component instance so we only get one instance.
> 
> In the picture below, thread one will publish the service registration inside a read lock on C.  Thread 2 will have a write lock on B since it's trying to create the component instance (B is delayed), blocking thread 1.  In thread 2, B will try to get a write lock on C to create the instance of C (C is also delayed).  (Note that B knows of C because felix ds doesn't use the service events to track which services are available, it uses them to trigger querying for services for binding).
> 
> So here are a couple more lock ideas.
> 
> 1. read-write lock for binding events and state change and a separate plain lock for creating the component instance object.  All the locks in thread 1 are read-write, all the locks in thread 2 are plain.  While there's nothing stopping the B service from being unregistered in a third thread while thread 2 is creating the implementation object, at least A will be notified in thread 3 that B is no longer available.  Keeping B consistent could be trickier.
> 
> 2. tracking dependent services using events and never querying for services.  This would mean in thread 2 B would not know about C since the event in thread 1 could not be delivered until after the lock on B was released.  I'm not sure how to get an accurate initial list of services with no possibility that we're about to get an event for one of those services.  I guess this is similar to the problem ServiceTracker is supposed to solve.
> 
> I think one of the effects of the non-blocking model I'm thinking of is that for a given component, several service registrations and several service instances may briefly be present.  However the extra ones should disappear quickly as the thread involved discovers it didn't "win".  This is similar to the situation in the 2-locks scenario in (1) above.
> 
> thanks
> david jencks
> 
> 
> 
> 
> On Jun 3, 2012, at 11:21 AM, David Jencks wrote:
> 
>> We found another DS deadlock with the current locking concurrency solution.
>> 
>> When we start, B is registered (its dependencies are optional) but not yet created (it's delayed)
>> 
>> Thread 1 creates and registers D so service registration and bind attempts propagate right to left <<<<<< starting with D.  Thread 1 obtains the lock on C and tries to get the lock on B
>> 
>>    A  ---- 1..1 -----> B -------0..n -------> C  --------- 1..1 ------->D
>> immediate         delayed                 delayed                       not a ds service
>> 
>> Thread 2 going left to right >>>>>>> creates A which calls getService on B (obtaining the B lock) which calls getService on C which is locked by thread 1.
>> 
>> I'm considering two possible approaches:
>> 
>> 1. dont lock or don't lock as much for events that don't change the state (the C to B service event: B can't change state as a result)
>> 
>> 2. a lock-free implementation using compare-and-set.  The idea is that an event such as a service event would get processed using a holder.  If at the completion of the processing the previous holder is the same then we'd compare-and-set the holder, otherwise undo whatever happened during processing.
>> 
>> (2) could result in multiple service registrations for a given component while the events are getting processed.  It's not completely obvious to me how to assure that all events will in fact get processed and that e.g the "winning" result will have all the highest-priority service references and that there won't be infinite update cycles.
>> 
>> On the other hand I have no confidence that (1) is possible to implement.  So I'm going to try for (2).  This is going to reopen the "how does it run on java < 5" debate since compare-and-set is not available there with the same characteristics of the java 5 implementation.
>> 
>> In the interests of making more code paths more consistent and similar I think I will change the immediate component handling to be as the spec describes, where the service is registered before the implementation object is created.  This means all the services will be registered as service factories as the delayed components currently are.
>> 
>> In other DS issues:
>> 
>> I'd like to separate the felix specific non-spec ComponentFactory behavior (where it acts like a ManagedServiceFactory) into a separate class so I can understand the spec compliant behavior more easily.
>> 
>> I think that the behavior of ServiceFactory components that are configured by config admin is wrong.  AFAICT for service factory components we will never call a modify method but always destroy and recreate the instances.
>> 
>> I also have a few cleanup changes such as better javadoc and removing stray references to no-longer-present *ing transient states that I don't think will be controversial so I plan to just commit them.
>> 
>> thanks
>> david jencks
>> 
>

Re: More DS concurrency problems

Posted by David Jencks <da...@yahoo.com>.

A colleague suggested using a read-write lock might work.  I may not understand his proposal: I think he's suggesting a write lock around code that change the state and read lock around code that doesn't.

I'm not sure how this would help.  I think we'd need the write lock around the code that determines that a bind satisfies a component, downgraded to a read lock while the service is registered.  Then we'd need a write lock on the code that creates the component instance so we only get one instance.

In the picture below, thread one will publish the service registration inside a read lock on C.  Thread 2 will have a write lock on B since it's trying to create the component instance (B is delayed), blocking thread 1.  In thread 2, B will try to get a write lock on C to create the instance of C (C is also delayed).  (Note that B knows of C because felix ds doesn't use the service events to track which services are available, it uses them to trigger querying for services for binding).

So here are a couple more lock ideas.

1. read-write lock for binding events and state change and a separate plain lock for creating the component instance object.  All the locks in thread 1 are read-write, all the locks in thread 2 are plain.  While there's nothing stopping the B service from being unregistered in a third thread while thread 2 is creating the implementation object, at least A will be notified in thread 3 that B is no longer available.  Keeping B consistent could be trickier.

2. tracking dependent services using events and never querying for services.  This would mean in thread 2 B would not know about C since the event in thread 1 could not be delivered until after the lock on B was released.  I'm not sure how to get an accurate initial list of services with no possibility that we're about to get an event for one of those services.  I guess this is similar to the problem ServiceTracker is supposed to solve.

I think one of the effects of the non-blocking model I'm thinking of is that for a given component, several service registrations and several service instances may briefly be present.  However the extra ones should disappear quickly as the thread involved discovers it didn't "win".  This is similar to the situation in the 2-locks scenario in (1) above.

thanks
david jencks

On Jun 3, 2012, at 11:21 AM, David Jencks wrote:

> We found another DS deadlock with the current locking concurrency solution.
> 
> When we start, B is registered (its dependencies are optional) but not yet created (it's delayed)
> 
> Thread 1 creates and registers D so service registration and bind attempts propagate right to left <<<<<< starting with D.  Thread 1 obtains the lock on C and tries to get the lock on B
> 
>     A  ---- 1..1 -----> B -------0..n -------> C  --------- 1..1 ------->D
> immediate         delayed                 delayed                       not a ds service
> 
> Thread 2 going left to right >>>>>>> creates A which calls getService on B (obtaining the B lock) which calls getService on C which is locked by thread 1.
> 
> I'm considering two possible approaches:
> 
> 1. dont lock or don't lock as much for events that don't change the state (the C to B service event: B can't change state as a result)
> 
> 2. a lock-free implementation using compare-and-set.  The idea is that an event such as a service event would get processed using a holder.  If at the completion of the processing the previous holder is the same then we'd compare-and-set the holder, otherwise undo whatever happened during processing.
> 
> (2) could result in multiple service registrations for a given component while the events are getting processed.  It's not completely obvious to me how to assure that all events will in fact get processed and that e.g the "winning" result will have all the highest-priority service references and that there won't be infinite update cycles.
> 
> On the other hand I have no confidence that (1) is possible to implement.  So I'm going to try for (2).  This is going to reopen the "how does it run on java < 5" debate since compare-and-set is not available there with the same characteristics of the java 5 implementation.
> 
> In the interests of making more code paths more consistent and similar I think I will change the immediate component handling to be as the spec describes, where the service is registered before the implementation object is created.  This means all the services will be registered as service factories as the delayed components currently are.
> 
> In other DS issues:
> 
> I'd like to separate the felix specific non-spec ComponentFactory behavior (where it acts like a ManagedServiceFactory) into a separate class so I can understand the spec compliant behavior more easily.
> 
> I think that the behavior of ServiceFactory components that are configured by config admin is wrong.  AFAICT for service factory components we will never call a modify method but always destroy and recreate the instances.
> 
> I also have a few cleanup changes such as better javadoc and removing stray references to no-longer-present *ing transient states that I don't think will be controversial so I plan to just commit them.
> 
> thanks
> david jencks
>