Posted to dev@tomcat.apache.org by bu...@apache.org on 2021/07/16 20:59:48 UTC

[Bug 65454] New: Race condition with idle thread removal results in request being improperly enqueued by default ThreadPoolExecutor

https://bz.apache.org/bugzilla/show_bug.cgi?id=65454

            Bug ID: 65454
           Summary: Race condition with idle thread removal results in
                    request being improperly enqueued by default
                    ThreadPoolExecutor
           Product: Tomcat 9
           Version: 9.0.x
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Util
          Assignee: dev@tomcat.apache.org
          Reporter: aogburn@redhat.com
  Target Milestone: -----

Created attachment 37959
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37959&action=edit
Reproducer byteman rule and test app

I've found a race condition that can play out as below any time a non-core
thread above minSpareThreads is idling out.  It can result in a request being
improperly enqueued, as follows:

1. The thread pool has 11 total threads, 10 threads are busy and 1 idle.  After
the idle timeout (default 60 seconds), the idle thread pool member ends its
ThreadPoolExecutor.getTask()/workQueue.poll() wait on the task queue.  It
hasn't yet called
ThreadPoolExecutor.compareAndDecrementWorkerCount()/ThreadPoolExecutor.processWorkerExit()
to decrement the worker count and remove itself from the executor worker map.

2. A new connection comes in and is handed off to the executor, which calls
TaskQueue.offer.  Since the idle thread hasn't yet removed itself from the map,
parent.getPoolSize() still returns 11, so the request passes this if check and
is enqueued:

https://github.com/apache/tomcat/blob/9.0.x/java/org/apache/tomcat/util/threads/TaskQueue.java#L87

3. The idle thread then finishes exiting and removes itself from the executor.
The executor does not inherently replace that thread under this condition.  So
there are now 10 busy threads in the pool and no idle thread available to
process the newly enqueued request; it sits in the queue until one of the other
threads becomes idle.  Typically that is a small, imperceptible delay before
another thread is created or returns to idle.  But in the worst case, a very
large unexpected delay is imposed on the new request, depending upon the run
time of the 10 currently busy tasks.
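The enqueue decision in steps 1-2 can be sketched as follows. This is a simplified approximation, not Tomcat's actual TaskQueue source: `SimplifiedTaskQueue` and its `inFlight` field are hypothetical stand-ins for the real queue and the executor's submitted-task count. The key point is that a worker that has timed out of poll() but not yet deregistered still counts toward getPoolSize(), which is exactly the window the race exploits.

```java
import java.util.concurrent.*;

// Simplified approximation (NOT Tomcat's actual TaskQueue) of the enqueue
// decision described in steps 1-2. "inFlight" is a hypothetical stand-in
// for the executor's submitted-task count.
class SimplifiedTaskQueue extends LinkedBlockingQueue<Runnable> {
    private transient ThreadPoolExecutor parent;
    private volatile int inFlight;

    void setParent(ThreadPoolExecutor tp) { this.parent = tp; }
    void setInFlight(int n) { this.inFlight = n; }

    @Override
    public boolean offer(Runnable r) {
        if (parent == null) {
            return super.offer(r);
        }
        // "There is an idle thread" heuristic: enqueue if the pool holds at
        // least as many threads as there are in-flight tasks. A worker that
        // has timed out but not yet deregistered still inflates getPoolSize().
        if (inFlight <= parent.getPoolSize()) {
            return super.offer(r);
        }
        // Otherwise report failure so the executor creates a new worker.
        return false;
    }
}
```

With 11 threads reported and 11 in-flight tasks (10 busy plus the new request), offer() enqueues, even though the 11th thread is about to disappear.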


In typical operation this is hard to trigger, and its effects are easy to miss
if it is triggered, but I was able to reproduce it deterministically with some
help from Byteman.  To reproduce:

1. Download Byteman 4.0.13 from
https://downloads.jboss.org/byteman/4.0.13/byteman-download-4.0.13-bin.zip
2. Unzip Byteman into an install directory <mypath>.
3. Use the Byteman rules file DecrementWorkerSleep.btm from
WorkerQueueRaceCondition.zip
4. Enable the Byteman instrumentation by adding the following to your JVM
options:

-javaagent:<mypath>/lib/byteman.jar=script:<mypath>/DecrementWorkerSleep.btm,boot:<mypath>/lib/byteman.jar
-Dorg.jboss.byteman.transform.all=true

5. Add the helloworld app from WorkerQueueRaceCondition.zip to webapps; it
provides a simple slow JSP that can easily be used to trigger long requests
6. Start Tomcat
7. Make 10 slow requests (/helloworld/hi.jsp?sleep=true), occupying the 10
core threads
8. Make an 11th normal request (/helloworld/hi.jsp), resulting in the creation
of the 11th, non-core thread
9. Wait ~60 seconds for the idle timeout of that 11th thread.  With the extra
delay Byteman induces in that operation, you can watch for a message like this:

--------------------------->ThreadPoolExecutor.compareAndDecrementWorkerCount
http-nio-8080-exec-11 sleeping...

10. Upon seeing that message, issue another normal quick request like the one
in step 8 within the next 30 seconds and note that it gets no response.  Without
any other activity on the server, the request is never processed until one of
the 10 core threads becomes available.  Taking a heap dump at that point and
inspecting the executor/taskQueue confirms only the 10 busy threads in the pool,
serving slow requests, while the expected quick request sits in the queue
doing nothing.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


--- Comment #8 from Peter Schuebl <pe...@gmail.com> ---
I believe this causes an issue in
https://github.com/apache/tomcat/blob/9.0.x/java/org/apache/tomcat/util/net/AbstractEndpoint.java#L753
where the executor is checked to see whether it is an instance of
java.util.concurrent.ThreadPoolExecutor.

Calling setMaxThreads on the Endpoint no longer changes the max pool size in
the executor.
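The mismatch can be illustrated with a small sketch. The class names below are hypothetical stand-ins, not Tomcat's real ones: once the executor no longer extends java.util.concurrent.ThreadPoolExecutor, an instanceof check against the JDK class silently stops matching, so the resize path is skipped.

```java
import java.util.concurrent.*;

// Hypothetical stand-in for an executor that offers the same resize API
// but does not extend java.util.concurrent.ThreadPoolExecutor.
class StandalonePool {
    private int max = 200;
    public void setMaximumPoolSize(int n) { max = n; }
    public int getMaximumPoolSize() { return max; }
}

class EndpointResizeCheck {
    // Mirrors the pattern at the AbstractEndpoint line above: only
    // executors recognized as the JDK ThreadPoolExecutor get resized.
    static boolean wouldResize(Object executor) {
        return executor instanceof ThreadPoolExecutor;
    }
}
```

The check returns false for the standalone class, so a setMaxThreads call would no longer propagate to the pool.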


--- Comment #4 from Mark Thomas <ma...@apache.org> ---
Agreed. I found a whole bunch of ways to do something similar but they all just
narrowed the timing window.

Before I do too much work on LinkedBlockingQueue, there is some handling around
RejectedExecutionException and the additional force methods I want to look at.
I'm not currently convinced that handling is necessary. The more of that we can
clean up, the simpler the changes to the copy of LinkedBlockingQueue will be.


Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #7 from Mark Thomas <ma...@apache.org> ---
While importing ThreadPoolExecutor added 2k+ lines of code, the actual fix was
only a few lines. We'll need to check for updates to ThreadPoolExecutor
from time to time but I'd be surprised if there are still bugs in the parts we
are using.

Fixed in:
- 10.1.x for 10.1.0-M3 onwards
- 10.0.x for 10.0.9 onwards
- 9.0.x for 9.0.51 onwards
- 8.5.x for 8.5.70 onwards


--- Comment #3 from Aaron Ogburn <ao...@redhat.com> ---
Thanks, Mark.  I also wasn't coming up with a proper fix within the current
implementation.  The best option I saw was to adjust
https://github.com/apache/tomcat/blob/9.0.x/java/org/apache/tomcat/util/threads/TaskQueue.java#L87
to the following, to ensure at least one extra idle thread when enqueuing:

        if (parent.getSubmittedCount() < (parent.getPoolSize())) {


But that would only reduce the chance and wouldn't be foolproof, as we could
still potentially end up with both idle threads timing out in the same
problematic window.  Otherwise, it looked like re-implementing was required to
fix this :(


--- Comment #2 from Mark Thomas <ma...@apache.org> ---
Thanks for reporting this.

I'm currently looking at options that don't involve completely re-implementing
LinkedBlockingQueue and/or ThreadPoolExecutor.

I'm not hopeful of finding such an option at this point. We may need to do
something similar to Commons Pool and take a copy of LinkedBlockingQueue and
adapt it to our needs.


--- Comment #5 from Mark Thomas <ma...@apache.org> ---
I see why the force handling is needed. It addresses the case where there are
maxThreads-1 current threads and two new connections are received: both try to
create a new thread, one gets created, and the other gets rejected. The
rejected one needs to be put on the queue.
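The force handling can be sketched as below. This is a hypothetical shape, not Tomcat's actual source: `ForcingExecutor` is an illustration of "catch the rejection and put the task straight onto the queue instead of failing the request".

```java
import java.util.concurrent.*;

// Sketch (hypothetical, not Tomcat's actual code) of the force handling:
// when a submission loses the race for the last thread slot and is
// rejected, catch the rejection and enqueue the task directly.
class ForcingExecutor {
    final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    final ThreadPoolExecutor pool =
            new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS, queue);

    void execute(Runnable task) {
        try {
            pool.execute(task);
        } catch (RejectedExecutionException rex) {
            // The rejection may be transient (two submissions raced for one
            // slot), so force the task onto the queue; a worker picks it up
            // once one is free.
            if (!queue.offer(task)) {
                throw new RejectedExecutionException("queue full", rex);
            }
        }
    }
}
```

A real implementation would also have to distinguish a transient rejection from a genuine shutdown; the sketch above does not.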

There is a balance to strike here between reducing complexity (and
opportunities for bugs) by re-using existing classes and the additional
complexity that re-use creates when we have to work around not having access
to all of the internals we really need.

I'm still trying to figure out the best way forward.


--- Comment #1 from Aaron Ogburn <ao...@redhat.com> ---
Fortunately, there is an easy workaround: set minSpareThreads equal to
maxThreads to avoid the idle thread removal that allows this condition.
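The workaround could look like the following server.xml excerpt. The values and the Executor/Connector names are illustrative placeholders, not a recommendation for any particular deployment:

```xml
<!-- Illustrative only: with minSpareThreads equal to maxThreads, no
     worker ever idles out, so the timed-out-worker window cannot open. -->
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="200" minSpareThreads="200"/>
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000" executor="tomcatThreadPool"/>
```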


--- Comment #6 from Mark Thomas <ma...@apache.org> ---
Just a quick update. I am currently looking at a solution along the following
lines:
- Import the latest ThreadPoolExecutor from the JSR 166 repo
- Fix any CheckStyle etc. issues
- Merge in the changes from o.a.t.util.ThreadPoolExecutor
- Fix this issue by checking the queue size after stopping a worker thread and
adding a new thread to the pool if the queue size is not zero.
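The last bullet's idea can be sketched as below. `WorkerExitCheck` is a hypothetical helper, not the committed Tomcat patch: after a worker thread stops, check whether tasks are still queued and, if so, start a replacement thread so a request enqueued during the worker's shutdown window is not stranded.

```java
import java.util.concurrent.*;

// Sketch of the proposed fix (hypothetical helper, not the committed
// patch): after a worker exits, if tasks remain queued, start a
// replacement thread so the queued request is not stranded.
class WorkerExitCheck {
    static void afterWorkerExit(ThreadPoolExecutor pool) {
        if (!pool.isShutdown() && !pool.getQueue().isEmpty()) {
            // A task slipped in while the exiting worker still counted
            // toward getPoolSize(); prestart a thread to serve it.
            pool.prestartCoreThread();
        }
    }
}
```

In the committed fix this check lives inside the imported ThreadPoolExecutor itself, where the worker-exit path has access to the pool's internals.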
