You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2010/11/20 00:09:13 UTC

DO NOT REPLY [Bug 50306] New: Detect stuck threads

https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

           Summary: Detect stuck threads
           Product: Tomcat 7
           Version: 7.0.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Catalina
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: sylvain.laurent@gmail.com


Feature request : 
regularly scan worker threads and if one has been processing the same request
for longer than a configurable delay, log a warning with the stack trace of
that thread.
This would allow to detect very long running threads, usually the ones that are
stuck in a network call or in a deadlock.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #6 from Sylvain Laurent <sy...@gmail.com> 2010-11-22 14:22:05 EST ---
My initial idea was : if a request takes too long, log a WARNING with the stack
trace of the thread processing the request.
I think that it is important to have this in log file (push information) than
to just allow access by JMX (pull information) because it allows to discover
problems pro-actively, for instance a network call that is stuck waiting for
data from an unhealthy server and no socket timeout has been set for the call.

As for the actual implementation, it could indeed be a LifeCycleListener, but
it will probably need some changes elsewhere to provide instrumentation.

I'll try to propose a patch in a couple of days.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

TomLu <di...@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26451|0                           |1
        is obsolete|                            |

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #2 from Sylvain Laurent <sy...@gmail.com> 2010-11-21 15:26:18 EST ---
What is this diagnostics package ? something in preparation ? Should I wait for
it before working on a patch for the stuck thread detection feature ?

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #5 from Konstantin Kolinko <kn...@gmail.com> 2010-11-22 09:52:30 EST ---
1) The manager webapp already has a list of requests being processed on the
"Server Status" page, as Mark mentioned.

One can make them as clickable, with more info displayed on demand.

2) If some "scan worker thread" self-diagnostic activity is needed inside
Tomcat, one can wrap it into o.a.c.LifecycleListener and add to server.xml. I
doubt that there is one-fits-all solution here, but I won't stop anyone
contributing.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #8 from Pid <bu...@pidster.com> 2010-12-29 16:47:42 EST ---
If I understand the patch correctly, I disagree with the idea of an arbitrary
(even if configurable) time threshold per thread.

A long running thread is not necessarily a broken one; there are existing
methods in JMX for identifying deadlocked threads.

The same could be achieved with a Valve in the request processing pipeline,
without modifying the Executor.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

Sylvain Laurent <sl...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #9 from TomLu <di...@hotmail.com> 2011-03-14 10:30:23 EDT ---
Created an attachment (id=26769)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26769)
New Valve: StuckThreadDetectionValve

Looking at the source, a Valve seems like a much better place for such code as
you wrote. However, I still think we need some kind of a threshold
(configurable or not). Deadlocked threads are not the only problem we can face.
We are often experiencing problems with erronous data in our DB, which causes
loops to occur. Http client request without any configured read or connect
timeout can also cause a thread to appear "stuck". The threads this Valve
should monitor are HTTP/AJP processor threads only (not user threads), and
these threads should _normally_ not be processing long running requests. Such
requests should preferably be allocated to a separate thread if possible. Users
who do need to execute long running request in the HTTP/AJP threads can choose
not to use this Valve. This seems to be the spirit of Tomcat 7.x: create tools
to help users improve their apps (Ref memory leakage detection logic). In many
servers, the main HTTP request threads are considered a limited resource, and
thus if such threads are running for too long, the pool will eventually be
depleted. This is the case for WebSphere, which has a default max pool size of
50, and a configurable threshold for it's "hung thread detection" (Default 600
seconds). A valve for detecting "long running threads" can be quite helpful not
only in development, but also in production environments in many cases. This
valve should probably be disabled by default.

I have now moved the code i previously posted to a Valve. I have attached a
patch for Tomcat 7.0.11 if anyone wants to add stuck thread detection logic to
their Tomcat instance.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #4 from Rainer Jung <ra...@kippdata.de> 2010-11-22 09:13:44 EST ---
I see basically two aspects that code inside Tomcat could add:

1) Extraction of relevant information

Since we know, which thread works on which request and we know a lot about the
request meta data (e.g. when it started) and about the threads (e.g. CPU
usage), we can very efficiently inspect the stacks for exactly those threads
that work on the same request for longer than some configured limit. No need to
dump stacks for all threads (a lot of them idle in the pools or connected to
the web server but not working on a request).

2) Reduction of polling interval

Since we focus on the relevant info we can do more frequent checks than one
would do with a plain "dump all stacks". E.g. every 10 seconds instead of every
one minute. If there is no long running request, no stack to inspect.

Yes, all of that can be done by a JMX client. But the client needs to be
powerful enough to add logic like retrieving the in-flight requests, extracting
those that already take to long, then retrieving the stacks for those requests
etc.

What I don't have a nice idea about is how to actually use that information. Of
course you could simply log the stacks and the request meta data for the long
running requests, but for an average admin this might be to technical (of no
real use). Though it would be helpful for his application provider when he
starts analyzing reasons for performance problems.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #3 from Mark Thomas <ma...@apache.org> 2010-11-21 15:45:21 EST ---
All of this info (except the stack trace) is already available via JMX and
displayed via the Manager app. There are also plenty of tools to generate stack
traces. I'm struggling to see what else could be usefully done here without
adding bloat.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

Konstantin Kolinko <kn...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #28961|0                           |1
        is obsolete|                            |

--- Comment #12 from Konstantin Kolinko <kn...@gmail.com> ---
Created attachment 28966
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28966&action=edit
2012-06-20_tc6_50306_StuckThreadDetectionValve.patch

Updated documentation and MBean description,
aligning proposed patch with Tomcat 7.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #11 from Konstantin Kolinko <kn...@gmail.com> ---
Created attachment 28961
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28961&action=edit
2012-06-19_tc6_50306_StuckThreadDetectionValve.patch

Port of StuckThreadDetectionValve to Tomcat 6

I've also implemented the following additional features
(will apply to Tomcat 7 today, will be in 7.0.29):
- getStuckThreadNames() property, inspectable through JMX
- added thread ids to the log messages

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #1 from Pid <bu...@pidster.com> 2010-11-20 15:53:38 EST ---
(In reply to comment #0)
> Feature request : 
> regularly scan worker threads and if one has been processing the same request
> for longer than a configurable delay, log a warning with the stack trace of
> that thread.
> This would allow to detect very long running threads, usually the ones that are
> stuck in a network call or in a deadlock.

The diagnostics package Rainer & I discussed included something similar to
this. It's possible to use JMX to identify if a thread is associated with a
request and report the properties of the request.

It's also possible to enumerate deadlocked threads using JMX.

I've got some sample code which does various diagnostic tasks, but it's very
experimental...

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #10 from Sylvain Laurent <sl...@apache.org> 2011-04-07 17:33:10 EDT ---
Tom, thanks for your contribution. I committed the valve to trunk
http://svn.apache.org/viewvc?view=revision&revision=1090003 with the following
changes :

- fixed a race condition
- removed the minimum check for the threshold
- allowed any Container
- added other information in the logged warning (request, threshold)
- exposed as JMX, including the IDS of the stuck thread
- documentation

it will be available in tomcat 7.0.13

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #13 from Konstantin Kolinko <kn...@gmail.com> ---
StuckThreadDetectionValve added to Tomcat 6 in r1356246 and will be in 6.0.36.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 50306] Detect stuck threads

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50306

--- Comment #7 from TomLu <th...@tihlde.org> 2010-12-29 05:02:04 EST ---
Created an attachment (id=26451)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26451)
Patch for Tomcat's ThreadPoolExecutor

WebSphere has hung/stuck thread detection logic that has been very useful to
us, both in development and production. But since we are moving away from
WebSphere, it would be nice to have this kind of feature in Tomcat as well.
Some might consider this "stuck thread detection" unnessesary bloat, but we
have found it useful on numerous occasions. 

Applications are never perfect, and often threads will run for too long
(resulting in thread leakage and pool depletion):
-External connections might hang (like URLConnection) [you can configure
read/connect timeout, but not all applications has configured this]
-Loops (can be detected by excessive CPU usage)
-Loops which does not have excessive CPU usage (for example because it uses
external slow IO in each loop iteration, or has Thread.sleep!! (bad coding, but
it can happen))
-Deadlocks.
-Threads waiting on other hung threads.

I have created an experimental patch for Tomcat 7.0.5 that we have been using
in development for a short time now.

The hung thread monitoring does the following:
Monitors active (managed) threads, and issues a warning for each thread that
has been active for more than the configured threshold (seconds).
The message (log.warn) contains the thread name, stack trace, total time
active, and total number of suspected hung threads. 
(In our environment, we use Cron to scan system.out, and send email if a hung
thread is detected.)

A timer thread runs at a configurable interval and checks and reports hung
threads, and threads previously reported to be hung, but have now finished.

It detects hung threads in the HTTP/AJP thread pools. It does not monitor all
JVM threads, but since many user threads are spawned from managed "webrequest"
threads, the HTTP/AJP threads often waits for user threads to finish, so this
patch should give an indication that something is wrong in those cases as well.

If anyone is interested, I've attached the patch. 
This patch is just an example of how such monitoring can be implemented. This
is not production ready code. Performance impacts have not been considered!

The patch is for the following class:

org.apache.tomcat.util.threads.ThreadPoolExecutor

It uses 2 system properties for configuration:

org.apache.catalina.threadmonitor.interval (in seconds. Default 120)
org.apache.catalina.threadmonitor.threshold (in seconds. Default 600)

The monitoring is enabled by default, but can be disabled by setting the
"interval" to a negative value or 0.

As stated, it only monitors managed threads created by Tomcat's HTTP/AJP thread
pools.

Monitoring all JVM threads is also an option, but may require more code and
might lead to many false positives..? The most important thing is to detect
problems with stuck HTTP/AJP threads.

TODOs:

-Should the thread monitor properties be moved to server.xml instead of being
System properties?
-Add JMX notification when hung thread detected/hung thread finished


I have no problem understanding that such a feature might not be seen as useful
for everybody. We will continue to patch our own version of Tomcat anyway :)

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org