Posted to dev@sling.apache.org by "Felix Meschberger (JIRA)" <ji...@apache.org> on 2012/07/17 17:03:34 UTC

[jira] [Created] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Felix Meschberger created SLING-2535:
----------------------------------------

             Summary: QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle
                 Key: SLING-2535
                 URL: https://issues.apache.org/jira/browse/SLING-2535
             Project: Sling
          Issue Type: Bug
          Components: Commons
    Affects Versions: Commons Scheduler 2.3.4
            Reporter: Felix Meschberger


When the Scheduler bundle is stopped, the threads (probably the thread pool) are cleaned up, but the thread group "QuartzScheduler:ApacheSling" remains. For a complete cleanup, I think the thread group should also be destroyed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487598#comment-13487598 ] 

Ian Boston commented on SLING-2535:
-----------------------------------

Testing in trunk at 1403475 shows that the thread pool does shut down if Sling has just been started and no jobs have been run.

31.10.2012 17:48:56.013 *INFO* [401258966@qtp-1399560387-11] org.apache.sling.commons.threads.impl.DefaultThreadPool Shutting down thread pool [ThreadPool-0a751a8b-e907-4993-806f-64923e3d85cd (Apache Sling Eventing Thread Pool)] ...
31.10.2012 17:48:56.013 *INFO* [401258966@qtp-1399560387-11] org.apache.sling.commons.threads.impl.DefaultThreadPool Thread pool [ThreadPool-0a751a8b-e907-4993-806f-64923e3d85cd (Apache Sling Eventing Thread Pool)] is shut down.

JMX shows that the thread group disappears when the bundle is unloaded.
(My JDK is 1.6 Java HotSpot(TM) 64-Bit Server VM version 20.12-b01-434)

The default configuration on thread groups is to shut down non-gracefully, so it's not an issue with threads in the thread group being slow to terminate.
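
For reference, the difference between a graceful and a non-graceful shutdown at the JDK level can be sketched as below. This is only an illustration of the java.util.concurrent calls involved; the class ShutdownSketch and its parameters are made up for the example and are not the Sling configuration or code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the two shutdown styles of a JDK executor.
final class ShutdownSketch {

    // Returns true if the executor has fully terminated when the method returns.
    static boolean shutdown(ExecutorService executor, boolean graceful, long waitMs) {
        if (graceful) {
            executor.shutdown();     // stop accepting work, let queued tasks finish
        } else {
            executor.shutdownNow();  // interrupt running workers, drop queued tasks
        }
        try {
            return executor.awaitTermination(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return executor.isTerminated();
        }
    }
}
```

With the non-graceful variant, a task that responds to interruption ends promptly, which is why slow threads are unlikely to be what keeps the group alive.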

Looking at the code:

org.apache.sling.commons.scheduler.impl.QuartzScheduler.QuartzThreadPool.shutdown(boolean) does nothing.

QuartzThreadPool.shutdown is called by org.quartz.core.QuartzScheduler.shutdown(boolean) line 677.

        resources.getThreadPool().shutdown(waitForJobsToComplete);


The comment on org.apache.sling.commons.scheduler.impl.QuartzScheduler.QuartzThreadPool.shutdown(boolean) indicates that the pool is managed by the thread pool manager.

In org.apache.sling.commons.scheduler.impl.QuartzScheduler.dispose(Scheduler), line 222, tpm.release(this.threadPool); is called.
This calls org.apache.sling.commons.threads.impl.DefaultThreadPoolManager.Entry.decUsage(), which, when the reference counter reaches zero, shuts down the thread pool by calling org.apache.sling.commons.threads.impl.DefaultThreadPool.shutdown(), which calls down to the JDK.
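
The release path described above amounts to reference counting around a JDK executor. A minimal sketch follows, assuming a plain ExecutorService; RefCountedPool is a hypothetical stand-in, and only the incUsage/decUsage names are taken from the code being discussed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the manager's Entry class; not the Sling source.
final class RefCountedPool {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private int count; // assumed to be guarded by a lock held by the caller

    // Called when a client obtains the pool.
    RefCountedPool incUsage() {
        count++;
        return this;
    }

    // Called when a client releases the pool; the last release shuts it down.
    void decUsage() {
        count--;
        if (count == 0) {
            executor.shutdownNow(); // non-graceful shutdown via the JDK
        }
    }

    boolean isShutdown() {
        return executor.isShutdown();
    }
}
```

If an increment is ever lost or double-counted, the count never returns to zero at the right moment and the shutdown call is skipped or mistimed.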

That last method should emit some messages:
        this.logger.info("Shutting down thread pool [{}] ...", name);
followed by
        this.logger.info("Thread pool [{}] is shut down.", this.name);
 
The incUsage and decUsage methods do their reference counting with ints protected by synchronized(this.pools), where this.pools is the collection the entries were added to... except for one location.

org.apache.sling.commons.threads.impl.DefaultThreadPoolManager.create(ThreadPoolConfig) does this:

    final Entry entry = new Entry(null, config, name);
    synchronized ( this.pools ) {
        this.pools.put(name, entry);
    }
    return entry.incUsage();

Here incUsage() is called outside the synchronized block, which could result in an invalid reference count, causing decUsage to never call the ThreadExecutor shutdown.
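
One way to close that gap would be to take the first incUsage() under the same lock as the put. The following is a simplified sketch of that idea only: PoolManagerSketch and its nested Entry are hypothetical, the Entry constructor arguments are omitted, and the usage() accessors exist purely so the count can be inspected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified manager; names mirror the discussion but this is not the Sling code.
final class PoolManagerSketch {

    static final class Entry {
        private int count; // only touched while holding the manager's lock

        Entry incUsage() {
            count++;
            return this;
        }

        int usage() {
            return count;
        }
    }

    private final Map<String, Entry> pools = new HashMap<>();

    // Registration and the first incUsage() happen under the same lock,
    // so no other thread can touch the entry between the two steps.
    Entry create(String name) {
        synchronized (this.pools) {
            Entry entry = new Entry();
            this.pools.put(name, entry);
            return entry.incUsage();
        }
    }

    // Lookup increments under the same lock as create().
    Entry get(String name) {
        synchronized (this.pools) {
            Entry entry = this.pools.get(name);
            return entry == null ? null : entry.incUsage();
        }
    }

    // Inspection helper for the example; returns 0 for unknown names.
    int usage(String name) {
        synchronized (this.pools) {
            Entry entry = this.pools.get(name);
            return entry == null ? 0 : entry.usage();
        }
    }
}
```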


To recap:
If there is no
Shutting down thread pool [ThreadPool-0a751a8b-e907-4993-806f-64923e3d85cd (Apache Sling Eventing Thread Pool)] 

in the logs, there is a race condition in decUsage/incUsage.

If there is, but there is no
Thread pool [ThreadPool-0a751a8b-e907-4993-806f-64923e3d85cd (Apache Sling Eventing Thread Pool)] is shut down.

then it's an issue with running jobs in the Quartz scheduler.
Since the default configuration of the thread pools (unless configured otherwise) seems to be a non-graceful shutdown, this points to a race condition.

In addition FindBugs reports 
VO_VOLATILE_INCREMENT
This code increments a volatile field. Increments of volatile fields aren't atomic. If more than one thread is incrementing the field at the same time, increments could be lost.

(It doesn't detect the potential synchronization issue, nor that the volatile field is protected.)
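
To illustrate what the FindBugs warning is about in general: an unsynchronized ++ on a volatile field is a read, an add and a write, so concurrent increments can overwrite each other. The sketch below is a standalone demo, not Sling code; CounterDemo and its fields are invented for the example. It contrasts the volatile counter with an AtomicInteger, which never loses updates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Sling.
final class CounterDemo {
    static volatile int volatileCount; // ++ is read, add, write: not atomic
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Runs `threads` workers that each increment both counters `perThread` times.
    static void run(int threads, final int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    volatileCount++;               // may lose updates under contention
                    atomicCount.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }
}
```

After CounterDemo.run(4, 10000) the atomic counter is always 40000, while the volatile counter may be lower. When every increment happens inside a synchronized block on a common lock, as intended here, the warning is a false positive.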


To fix this for certain, I would need to be able to reproduce it. Is there a reliable way?

(Sorry for the long comment)







                

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493750#comment-13493750 ] 

Ian Boston commented on SLING-2535:
-----------------------------------

I think this is fixed, but I am leaving the issue open to see if it reappears in the wild.
                

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Carsten Ziegeler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426675#comment-13426675 ] 

Carsten Ziegeler commented on SLING-2535:
-----------------------------------------

Actually, calling shutdown on the Quartz scheduler should remove the thread group. Maybe we need to do something additional.
                

        

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487675#comment-13487675 ] 

Ian Boston commented on SLING-2535:
-----------------------------------

In r1404087 I have put sync blocks inside incUsage and decUsage to see if that addresses the problem.
After looking at it for some time, I felt this was safer since it doesn't require the caller to know that the calls have to be synchronized.
                

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Carsten Ziegeler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487677#comment-13487677 ] 

Carsten Ziegeler commented on SLING-2535:
-----------------------------------------

Hmm, I'm not 100% sure and need to check the code again, but right now all other syncing is done from the outside. I think there is a decUsage()/isInUse() combination which could cause trouble.
                

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488220#comment-13488220 ] 

Ian Boston commented on SLING-2535:
-----------------------------------

Hmm, I see what you mean. 
The shutdown in decUsage has the potential to cause a deadlock.

    public void decUsage() {
        synchronized (usagelock) {
            this.count--;
            if ( this.count == 0 ) {
                this.shutdown();
            }
        }
    }

could be

    public void decUsage() {
        boolean shutdown = false;
        synchronized (usagelock) {
            this.count--;
            if ( this.count == 0 ) {
                shutdown = true;
            }
        }
        if ( shutdown ) {
            this.shutdown();
        }
    }
                

[jira] [Updated] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Boston updated SLING-2535:
------------------------------

    Fix Version/s: Commons Scheduler 2.3.6
    

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Carsten Ziegeler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487609#comment-13487609 ] 

Carsten Ziegeler commented on SLING-2535:
-----------------------------------------

I think you're right: create() and also get(String) do not sync access to the Entry class (incUsage in this case), so the call should be moved into the sync block in both cases.
                

[jira] [Commented] (SLING-2535) QuartzScheduler:ApacheSling thread group remaining after stopping the scheduler bundle

Posted by "Ian Boston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488375#comment-13488375 ] 

Ian Boston commented on SLING-2535:
-----------------------------------

Reverted the commit and put the sync one layer further out as suggested.
Avoiding a deadlock or weird behaviour was going to be hard.
                