Posted to dev@felix.apache.org by "Rob Walker (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/10/20 10:48:10 UTC

[jira] [Issue Comment Edited] (FELIX-3174) Startup freeze caused in acquireBundleLock

    [ https://issues.apache.org/jira/browse/FELIX-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131453#comment-13131453 ] 

Rob Walker edited comment on FELIX-3174 at 10/20/11 8:47 AM:
-------------------------------------------------------------

I can't put my finger on it, but I'm pretty sure there's a sizable timing window in the code below:

{code:title=Felix.java (around line 4832)|borderStyle=solid}
    void acquireBundleLock(BundleImpl bundle, int desiredStates)
        throws IllegalStateException
    {
        synchronized (m_bundleLock)
        {
            // Wait if the desired bundle is already locked by someone else
            // or if any thread has the global lock, unless the current thread
            // holds the global lock or the bundle lock already.
            while (!bundle.isLockable() ||
                ((m_globalLockThread != null)
                    && (m_globalLockThread != Thread.currentThread())))
            {
                // Check to make sure the bundle is in a desired state.
                // If so, keep waiting. If not, throw an exception.
                if ((desiredStates & bundle.getState()) == 0)
                {
                    throw new IllegalStateException("Bundle in unexpected state.");
                }
                // If the calling thread already owns the global lock, then make
                // sure no other thread is trying to promote a bundle lock to a
                // global lock. If so, interrupt the other thread to avoid deadlock.
                else if (m_globalLockThread == Thread.currentThread()
                    && (bundle.getLockingThread() != null)
                    && m_globalLockWaitersList.contains(bundle.getLockingThread()))
                {
                    bundle.getLockingThread().interrupt();
                }

                try
                {
                    m_bundleLock.wait();
                }
                catch (InterruptedException ex)
                {
                    throw new IllegalStateException("Unable to acquire bundle lock, thread interrupted.");
                }
            }
{code}


By the time we go into m_bundleLock.wait(), any of the earlier conditions could have changed, e.g. the bundle may not be lockable.

I suspect that is what is actually happening, and I will add some trace code to try to nail it down. What is certain is that the code goes into a wait on a lock that never gets notified, which implies that the state that caused us to wait has changed by the time we enter the wait.
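As a rough idea of what such trace code might look like, the sketch below swaps the untimed wait for a timed one and logs every wake-up. It is not Felix code: TimedWaitTrace, awaitWithTrace and the blocked condition are hypothetical names made up for illustration, and the condition stands in for the test in the while loop above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical diagnostic helper for illustration; not part of Felix. */
public final class TimedWaitTrace {
    private TimedWaitTrace() {}

    /**
     * Waits on the monitor while the condition reports "blocked", waking at
     * least every timeoutMillis to re-check it and log the wake-up.
     * Returns the number of wait() calls that were made.
     */
    public static int awaitWithTrace(Object monitor, BooleanSupplier blocked, long timeoutMillis)
            throws InterruptedException {
        if (timeoutMillis <= 0) {
            // wait(0) would block indefinitely, which defeats the purpose here.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        int waits = 0;
        synchronized (monitor) {
            while (blocked.getAsBoolean()) {
                long start = System.nanoTime();
                monitor.wait(timeoutMillis);
                waits++;
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
                System.err.println(Thread.currentThread().getName()
                    + " woke after " + elapsedMillis + " ms (wait #" + waits + ")");
            }
        }
        return waits;
    }
}
```

If a thread only ever leaves this loop after a wake-up that took roughly the full timeout, that would suggest the condition changed without a notify reaching it, which is the case where an untimed wait would hang.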

My issue here is that however many tests we put in ahead of the wait, if they aren't within a sync on an appropriate lock, all we can do is narrow the timing window. bundle.lock() and isLockable() are protected by a method-level sync lock, which has been released by the time they return, and hence the condition may have changed.
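To illustrate the point about syncing on an appropriate lock, here is a minimal sketch (not the Felix implementation, and not a proposed patch) in which the lock state is read and written only while holding the same monitor the waiters wait on. GuardedLockSketch and its members are hypothetical names. Because the check, the wait, the state change and the notifyAll all happen under one monitor, there is no window between testing the condition and entering the wait.

```java
/** Hypothetical stand-in for illustration; not part of Felix. */
public class GuardedLockSketch {
    private final Object monitor = new Object();
    private Thread owner;   // guarded by monitor
    private int holdCount;  // guarded by monitor

    public void acquire() throws InterruptedException {
        synchronized (monitor) {
            // The condition is tested while holding the monitor we wait on,
            // so it cannot change between the test and the wait.
            while (owner != null && owner != Thread.currentThread()) {
                monitor.wait();
            }
            owner = Thread.currentThread();
            holdCount++;
        }
    }

    public void release() {
        synchronized (monitor) {
            if (owner != Thread.currentThread()) {
                throw new IllegalStateException("Lock not held by current thread.");
            }
            if (--holdCount == 0) {
                owner = null;
                // State change and notification happen under the same monitor.
                monitor.notifyAll();
            }
        }
    }

    public boolean isHeld() {
        synchronized (monitor) {
            return owner != null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedLockSketch lock = new GuardedLockSketch();
        lock.acquire();
        Thread waiter = new Thread(() -> {
            try {
                lock.acquire();
                lock.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "waiter");
        waiter.start();
        Thread.sleep(100);   // give the waiter time to block in acquire()
        lock.release();      // wakes the waiter
        waiter.join(2000);
        System.out.println("waiter finished: " + !waiter.isAlive());
    }
}
```

The design choice here is simply that no lock state lives behind a second, separate monitor; whether that is practical for BundleImpl is a separate question.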
                
> Startup freeze caused in acquireBundleLock
> ------------------------------------------
>
>                 Key: FELIX-3174
>                 URL: https://issues.apache.org/jira/browse/FELIX-3174
>             Project: Felix
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: framework-4.2.0
>            Reporter: Rob Walker
>
> This may be a sub or related case of a few others which I've linked below.
> In the latest trunk we are now seeing a startup scenario where our HTTP bundle acquires a lock in the process of registering a service, but the later wait for this lock (Felix.java:4862) never seems to get notified.
> It doesn't seem to be a traditional deadlock per se - no other thread is holding this lock. It just seems that the lock never gets notified, hence the HTTP bundle never completes its startup and service registration, causing all our other bundles that depend on the HTTP service never to start up.
> Stack trace of locked thread below:
> ====
> "Jetty HTTP Service" daemon prio=6 tid=0x0586ac00 nid=0x19dc in Object.wait() [0x05a8f000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x1f84df50> (a [Ljava.lang.Object;)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4862)
>         - locked <0x1f84df50> (a [Ljava.lang.Object;)
>         at org.apache.felix.framework.Felix.registerService(Felix.java:3205)
>         at org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
>         at org.apache.felix.servicebinder.InstanceManager.requestRegistration(InstanceManager.java:508)
>         at org.apache.felix.servicebinder.InstanceManager.validate(InstanceManager.java:294)
>         - locked <0x1fa2ef78> (a org.apache.felix.servicebinder.InstanceManager)
>         at org.apache.felix.servicebinder.InstanceManager$DependencyManager.serviceChanged(InstanceManager.java:948)
>         - locked <0x1fa2ef78> (a org.apache.felix.servicebinder.InstanceManager)
>         at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>         at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>         at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4252)
>         at org.apache.felix.framework.Felix.registerService(Felix.java:3275)
>         at org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
>         at org.apache.felix.http.base.internal.HttpServiceController.register(HttpServiceController.java:135)
>         at org.apache.felix.http.base.internal.DispatcherServlet.init(DispatcherServlet.java:48)
>         at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:440)
>         at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:263)
>         at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>         - locked <0x1fa2f0b0> (a java.lang.Object)
>         at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:736)
>         at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
>         at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
>         at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>         - locked <0x1fa2f1c0> (a java.lang.Object)
>         at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>         at org.mortbay.jetty.Server.doStart(Server.java:224)
>         at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>         - locked <0x1fa03e50> (a java.lang.Object)
>         at org.apache.felix.http.jetty.internal.JettyService.initializeJetty(JettyService.java:181)
>         at org.apache.felix.http.jetty.internal.JettyService.startJetty(JettyService.java:116)
>         at org.apache.felix.http.jetty.internal.JettyService.run(JettyService.java:307)
>         at java.lang.Thread.run(Thread.java:619)
> ====
