You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@felix.apache.org by "Pierre De Rop (JIRA)" <ji...@apache.org> on 2018/09/23 22:39:00 UTC

[jira] [Updated] (FELIX-5716) Dead Lock in DM

     [ https://issues.apache.org/jira/browse/FELIX-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre De Rop updated FELIX-5716:
---------------------------------
    Fix Version/s: org.apache.felix.dependencymanager-r10

> Dead Lock in DM
> ---------------
>
>                 Key: FELIX-5716
>                 URL: https://issues.apache.org/jira/browse/FELIX-5716
>             Project: Felix
>          Issue Type: Bug
>    Affects Versions: org.apache.felix.dependencymanager-r1
>            Reporter: Pierre De Rop
>            Assignee: Pierre De Rop
>            Priority: Critical
>             Fix For: org.apache.felix.dependencymanager-r12
>
>         Attachments: deadlock.proposed.patch.txt
>
>
> I just found an unfortunate deadlock when using latest DM r11 and latest SCR 2.0.12 in the same JVM.
> I believe that the same issue applies to both DM and SCR (I may have to open a seperate issue for SCR).
> First, here is the deadlock:
> {code}
> Found one Java-level deadlock:
> =============================
> "CM Event Dispatcher (Fire ConfigurationEvent: pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
>   waiting to lock monitor 0x00007f8188004538 (object 0x00000000eb38d420, a org.apache.felix.dm.tracker.ServiceTracker$Tracked),
>   which is held by "FelixDispatchQueue"
> "FelixDispatchQueue":
>   waiting for ownable synchronizer 0x00000000c0699f30, (a java.util.concurrent.locks.ReentrantLock$FairSync),
>   which is held by "CM Event Dispatcher (Fire ConfigurationEvent: pid=com.alcatel.as.http.ioh.impl.HttpIOH)"
> Java stack information for the threads listed above:
> ===================================================
> "CM Event Dispatcher (Fire ConfigurationEvent: pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1140)
>         - waiting to lock <0x00000000eb38d420> (a org.apache.felix.dm.tracker.ServiceTracker$Tracked)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1054)
>         at org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
>         at org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
>         at org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4579)
>         at org.apache.felix.framework.Felix.access$000(Felix.java:105)
>         at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:419)
>         at org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:588)
>         at org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:131)
>         at org.apache.felix.scr.impl.manager.SingleComponentManager.updateServiceRegistration(SingleComponentManager.java:558)
>         at org.apache.felix.scr.impl.manager.SingleComponentManager.modify(SingleComponentManager.java:755)
>         at org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:645)
>         at org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:609)
>         at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.configurationUpdated(ConfigurableComponentHolder.java:426)
>         at org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent(RegionConfigurationSupport.java:284)
>         at org.apache.felix.scr.impl.manager.RegionConfigurationSupport$1.configurationEvent(RegionConfigurationSupport.java:89)
>         at org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2090)
>         at org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2058)
>         at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:141)
>         at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:109)
>         at java.lang.Thread.run(Thread.java:748)
> "FelixDispatchQueue":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000c0699f30> (a java.util.concurrent.locks.ReentrantLock$FairSync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>         at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
>         at org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainLock(AbstractComponentManager.java:228)
>         at org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainStateLock(AbstractComponentManager.java:279)
>         at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:806)
>         at org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:347)
>         at org.apache.felix.framework.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:247)
>         at org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:350)
>         at org.apache.felix.framework.Felix.getService(Felix.java:3721)
>         at org.apache.felix.framework.BundleContextImpl.getService(BundleContextImpl.java:470)
>         at org.apache.felix.dm.impl.ServiceEventImpl.getEvent(ServiceEventImpl.java:86)
>         at org.apache.felix.dm.impl.ServiceDependencyImpl.addingService(ServiceDependencyImpl.java:251)
>         at org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:1276)
>         at org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:279)
>         at org.apache.felix.dm.tracker.AbstractTracked.trackInitial(AbstractTracked.java:188)
>         - locked <0x00000000eb38d420> (a org.apache.felix.dm.tracker.ServiceTracker$Tracked)
>         at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:387)
>         - locked <0x00000000eb38d420> (a org.apache.felix.dm.tracker.ServiceTracker$Tracked)
>         - locked <0x00000000eb38cd00> (a org.apache.felix.dm.tracker.ServiceTracker)
>         at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:320)
>         at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:296)
>         at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:235)
>         at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:1360)
>         at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:1059)
>         at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:1010)
>         at org.apache.felix.dm.impl.ComponentImpl.lambda$start$2(ComponentImpl.java:426)
>         at org.apache.felix.dm.impl.ComponentImpl$$Lambda$7/201251057.run(Unknown Source)
>         at org.apache.felix.dm.impl.SerialExecutor.runTask(SerialExecutor.java:138)
>         at org.apache.felix.dm.impl.SerialExecutor.runTasks(SerialExecutor.java:120)
>         at org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:86)
>         at org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:105)
>         at org.apache.felix.dm.impl.ComponentImpl.start(ComponentImpl.java:424)
>         at org.apache.felix.dm.impl.ComponentScheduler.add(ComponentScheduler.java:69)
>         at org.apache.felix.dm.DependencyManager.add(DependencyManager.java:136)
>         at com.alcatel.as.service.coordinator.impl.Activator.init(Activator.java:16)
>         at org.apache.felix.dm.DependencyActivatorBase.start(DependencyActivatorBase.java:75)
>         at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
>         at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
>         at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
>         at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl$DeployedBundle.start(BundleInstallerImpl.java:1056)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.startBundles(BundleInstallerImpl.java:728)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.finishInitialisation(BundleInstallerImpl.java:230)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:213)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.start(BundleInstallerImpl.java:174)
>         at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
>         at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
>         at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
>         at org.apache.felix.framework.Felix.updateBundle(Felix.java:2506)
>         at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1018)
>         at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1004)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.checkSelfUpdate(BundleInstallerImpl.java:273)
>         at com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:192)
>         at org.apache.felix.framework.EventDispatcher.invokeFrameworkListenerCallback(EventDispatcher.java:881)
>         at org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:830)
>         at org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1147)
>         at org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
>         at org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
>         at java.lang.Thread.run(Thread.java:748)
> Found 1 deadlock.
> {code}
> now, here is my first interpretation of what is going on:
> in SCR:
> =====
> Configuration Admin is delivering a configuration update event to SCR. So, the org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent() method is then called. This method then indirectly triggers a call to the SingleComponentManager.reconfigure method, which then calls SingleComponentManager.modify method. Here is the modify() method:
> {code}
>     private boolean modify(boolean configurationDeleted)
>     {
>         //0 SCR 112.7.1 If configuration is deleted, and version is < 1.3 and no flag set, then deactivate unconditionally.
>         // For version 1.3 and later, or with a flag, more sensible behavior is allowed.
>         if ( configurationDeleted && !getComponentMetadata().isDeleteCallsModify()){
>             return false;
>         }
>         // 1. no live update if there is no declared method
>         if ( getComponentMetadata().getModified() == null )
>         {
>             log( LogService.LOG_DEBUG, "No modified method, cannot update dynamically", null );
>             return false;
>         }
>         // invariant: we have a modified method name
>         // 2. get and check configured method
>         // invariant: modify method is configured and found
>         // 3. check whether we can dynamically apply the configuration if
>         // any target filters influence the bound services
>         final Map<String, Object> props = getProperties();
>         for ( DependencyManager<S, ?> dm: getDependencyManagers() )
>         {
>             if ( !dm.canUpdateDynamically( props ) )
>             {
>                 log( LogService.LOG_DEBUG,
>                         "Cannot dynamically update the configuration due to dependency changes induced on dependency {0}",
>                         new Object[] {dm.getName()}, null );
>                 return false;
>             }
>         }
>         // invariant: modify method existing and no static bound service changes
>         // 4. call method (nothing to do when failed, since it has already been logged)
>         //   (call with non-null default result to continue even if the
>         //    modify method call failed)
>         obtainStateLock(  );
>         try
>         {
>             //cf 112.5.12 where invoking modified method before updating target services is specified.
>             final MethodResult result = invokeModifiedMethod();
>             updateTargets( props );
>             if ( result == null )
>             {
>                 // log an error if the declared method cannot be found
>                 log( LogService.LOG_ERROR, "Declared modify method ''{0}'' cannot be found, configuring by reactivation",
>                         new Object[] {getComponentMetadata().getModified()}, null );
>                 return false;
>             }
>             // 5. update the target filter on the services now, this may still
>             // result in unsatisfied dependencies, in which case we abort
>             // this dynamic update and have the component be deactivated
>             if ( !verifyDependencyManagers() )
>             {
>                 log( LogService.LOG_DEBUG,
>                         "Updating the service references caused at least one reference to become unsatisfied, deactivating component",
>                         null );
>                 return false;
>             }
>             // 6. update service registration properties if we didn't just do it
>             if ( result.hasResult() )
>             {
>                 setServiceProperties( result, null );
>             }
>             else
>             {
>                 updateServiceRegistration();
>             }
>      ...
> {code}
> This is problematic because we see that the modify method is invoking the updateServiceRegistration while holding the StateLock. (the obtainStateLock() method is called and then the service registration is updaded). 
> Here, the listener is a DM component, but this component is internally synchronized on it's service tracker, and is currently blocked, trying to dereference the DS service, but it can't because the DS service state lock is currently held by the modify method.
> in the next section, I'm now analyzing what is doing DM and will propose a patch in order to make sure DM does not hold any locks while calling bundleContext.getService(ref) ... this will fix the issue.
> Dependency Manager
> =================
> In dependency manager, now let's analyze the the FelixDispatchQueue stacktrace: we open the DM service tracker, but the open method (in the DM service tracker) calls the AbstractTracked.trackInitial while holding the lock on ServiceTracker$Tracked. Then this triggers the customizer "addingService" method call, which then perform a getService. But the getService then tries to dereference a declarative service component, and the stacktrace is then blocked because the DS component state lock is currently locked by the other "CM Event Dispatcher"
> So, in the end, I think that there is something wrong in both SCR and DM :
> 1) SCR obtain the state lock while update the service registration.
> This is , I think, unfortunate because updating the service registration while holding a lock is like hodling a lock while calling foreign code (you don't know what will do the service listener when you update the service registration).
> 2) and DM calls the customizer "addingService" method while holding the lock on the service tracker. This is also not satisfying.
> So, at minimul, in DM I do believe that we should manage to not invoke the customizer.addingService while holding the lock on the DM service tracker.
> now, I have given a look into the DM ServiceTracker.open method and it is currently implemented like this:
> {code}
> 	public void open(boolean trackAllServices, boolean trackAllAspects) {
> 		if (debug) {
> 			System.out.println("[ServiceTracker] " + debugKey + " T" + Thread.currentThread().getId() + " open");
> 		}
> 		final Tracked t;
> 		synchronized (this) {
> 			if (tracked != null) {
> 				return;
> 			}
> 			if (DEBUG) {
> 				System.out.println("ServiceTracker.open: " + filter); 
> 			}
> 			m_trackAllAspects = trackAllAspects;
> 			t = trackAllServices ? new AllTracked() : new Tracked();
> 			synchronized (t) {
> 				try {
> 					context.addServiceListener(t, listenerFilter);
> 					ServiceReference[] references = null;
> 					if (trackClass != null) {
> 						references = getInitialReferences(trackAllServices,
> 								trackClass, null);
> 					}
> 					else {
> 						if (trackReference != null) {
> 							if (trackReference.getBundle() != null) {
> 								references = new ServiceReference[] {trackReference};
> 							}
> 						}
> 						else { /* user supplied filter */
> 							references = getInitialReferences(trackAllServices,
> 									null,
> 									(listenerFilter != null) ? listenerFilter
> 											: filter.toString());
> 						}
> 					}
> 					/* set tracked with the initial references */
> 					t.setInitial(references);
> 					
> 					// only actually schedules the actions for execution within this synchronized block,
> 					// but do the actual execution afterwards.
> 					t.trackInitial(); 
> 				}
> 				catch (InvalidSyntaxException e) {
> 					throw new RuntimeException(
> 							"unexpected InvalidSyntaxException: "
> 									+ e.getMessage(), e); 
> 				}
> 			}
> 			tracked = t;
> 		}
> 		/* Call tracked outside of synchronized region */
> 		// just trigger the executor
> 		t.getExecutor().execute();
> 	}
> {code}
> As you can see the "t.trackInitial()" is invoked from within the synchronized block. This method actually schedules customizer "added" callbacks, which will be invoked properly outside of the synchronized block  (when t.getExecutor().executor() will be called).
> But please notice that the the issue is that the t.trackInitial() will also invoke the "addingService" customizer callback from the synchronized block. Doing so will end up dereferencing the DS service reference while we are holding the ServiceTracker lock.
> Interestingly, notice that the BundleTracker is not calling the t.trackInitial() method from the synchronized block.
> so, I have attempted to make a patch, which seems to resolve the issue. All tests are passing OK, and we don't experience the deadlock anymore, and I would appreciate if another DM developer or user could take a look at my proposed patch, to make sure I don't have messed up :-)
> thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)