You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2020/10/26 11:32:32 UTC

[Bug 64848] New: WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

            Bug ID: 64848
           Summary: WsSession objects in OUTPUT_CLOSED state are
                    implicitly held by waitingProcessors and GC cannot
                    purge them from the JVM heap
           Product: Tomcat 9
           Version: 9.0.36
          Hardware: PC
            Status: NEW
          Severity: major
          Priority: P4
         Component: WebSocket
          Assignee: dev@tomcat.apache.org
          Reporter: laszlo.peter.karolyi@gmail.com
  Target Milestone: -----

Created attachment 37534
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37534&action=edit
Attachment with the 3 services that help reproduce the issue +
SocketTimeoutException stack trace and WsSession GC root snapshot from a heap
dump

Overview:
---------

WebSocket session objects (represented as WsSession) "get stuck" on the heap
under heavy load in case Tomcat acts as a WebSocket API gateway between a
client and a server application.
By assuming a system configuration where the Tomcat WebSocket API gateway
service is deployed along with a server application on the same machine but the
client is launched elsewhere then the network latency can impose a considerable
overhead on the entire client-side data processing compared to the server side
where the data can be generated and transferred to the gateway service much
faster than the client can consume it and this can lead to the classic fast
producer-slow consumer phenomenon.

If the network latency goes beyond a certain threshold and the client cannot
keep up in a timely manner with the data flow coming from the gateway service
then Tomcat starts throwing SocketTimeoutException (by default after 20
seconds) at the WebSocket sessions for which the data weren't transmitted in
time. Such a timeout may end up in an abnormally closed WebSocket connection
(usually represented with 1006 status code at the client side) and even though
the corresponding sessions are moved into the OUTPUT_CLOSED state at Tomcat
level, they are still kept on the JVM heap endlessly by preventing the GC to
purge them out consequently producing a slow memory leak.

Steps to Reproduce:
-------------------

Reproducing such a situation is a bit cumbersome in terms of the required
hardware configuration as it needs two machines/VMs: one for the client and
another for the Tomcat WebSocket API gateway + server application. On the other
hand the client app should be hosted "far" from the Tomcat WebSocket app in
terms of network distance, i.e. if it connects to the Tomcat WebSocket app via
VPN then the network latency can be enough to reproduce the issue.
Alternatively in the Tomcat WebSocket app the
org.apache.tomcat.websocket.BLOCKING_SEND_TIMEOUT property can also be adjusted
as part of a customized RequestUpgradeStrategy to simulate a slow network.
The overall system demonstrates a simple distributed client-server application
inserting the Tomcat WebSocket API GW as an intermediary. The client can send a
number to the server that denotes a length (given in KBs) so the server will
respond with a random alphanumeric string having a length specified by the
given number. The Tomcat WebSocket API GW just routes the WS traffic back and
forth between the other two services.

1. Run the attached
random-string-ws-provider-undertow-1.0.0-SNAPSHOT-jar-with-dependencies
application (the server app) in a form of

java -jar
random-string-ws-provider-undertow-1.0.0-SNAPSHOT-jar-with-dependencies <host>
<port>

By default it configures the underlying Undertow webserver to launch on
localhost and listen on port #8193.

2. Run the attached ws-api-gateway-tomcat-1.0.0-SNAPSHOT.jar application (the
Tomcat WebSocket API GW app) in a form of

java -jar ws-api-gateway-tomcat-1.0.0-SNAPSHOT.jar

By default it listens on port #8444 and it can be overridden by setting the
server.port property.
If the Undertow server app runs with non-default host and port configurations
then this needs to be reflected here by specifying the
zuul.routes.random-string-websocket-provider.url property accordingly, e.g.:

java -jar
-Dzuul.routes.random-string-websocket-provider.url=http://<another-host>:<another-port>
ws-api-gateway-tomcat-1.0.0-SNAPSHOT.jar

3. Run the attached ws-random-string-gatling-load-test application (the client
app wrapped into gatling to generate artifical load) in a form of

mvn clean -B compile exec:java -Dexec.mainClass=RandomStringWebSocketRequestApp
-DrampUpUsers=<number-of-concurrent-users> -DrampUpTime=1
-DserverUrl=<host-where-other-two-services-run>
-DserverPort=<port-where-Tomcat-API-GW-listens-on>
-Dgatling.simulationClass=com.acme.wsrequest.simulation.RandomStringWebSocketRequestSimulation
-DrandomStringLengthInKb=1000

Actual Results:
---------------

Running the client app with 400 users will start producing the
SocketTimeoutException confidently in the Tomcat WebSocket API gateway service.
At the client side the gatling report starts showing unexpectedly closed WS
connections (with status code 1006) and the number of such connections seems to
have a strong correspondence to the number of "got stuck" WsSession objects on
the Tomcat WebSocket app's heap. That WsSession objects are preserved
indefinitely and hence cannot be garbage-collected.

Expected Results:
-----------------

WsSession objects representing abnormally closed WebSocket connections shall
eventually be the subject of garbage collection on the JVM heap.

Build Date & Hardware:
----------------------

Build 2020-10-26 on Windows Server 2016 Standard (Version 1607 - OS Build
14393.3930)

Additional Builds and Platforms:
--------------------------------

N/A

Additional Information:
-----------------------

There is an attachment
(tomcat-ws-api-gw-sockettimeoutexception-stack-trace.txt) to show the stack
trace produced when SocketTimeoutException is encountered.

Another attachment (tomcat-wssession-gc-root.png) contains the relevant prt of
the heap dump created after a 400-user gatling load execution. Searching for
"websocketsession" objects will bring up the preserved WebSocket session
objects and checking the GC root of such a session object can also show the
object reference chain up to the "waitingProcessors" map present in
Http11NioProtocol.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #19 from Matt M <in...@gmail.com> ---
This is looking like a case where the remote endpoint is not able to respond
with its own close message.  

We have seen 2 types of exception recorded in the Nio2SocketWrapper present in
the heap:

AsynchronousCloseException
IOException - connection reset by peer

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #14 from Christopher Schultz <ch...@christopherschultz.net> ---
Tomcat 9.0.40 started on 12 November[1]. Feel free to test and vote!

[1]
https://lists.apache.org/thread.html/rce32f696e12fecfd4b4383a5e8a389f4dac7d357053fd03ac7280868%40%3Cdev.tomcat.apache.org%3E

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #15 from Vishwanath Karra <vi...@servicenow.com> ---
We are seeing this issue with 8.5.72 - is there some other patch needed for the
8.5 codeline ? The fix is confirmed for 9.0.40.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEEDINFO                    |RESOLVED

--- Comment #5 from Mark Thomas <ma...@apache.org> ---
This bug has already been fixed in the 9.0.38 onwards.

https://github.com/apache/tomcat/commit/5707895bafa38c1443460356a268932a967b79e9

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #17 from zhougang <zh...@gmail.com> ---
(In reply to Christopher Schultz from comment #16)
> (In reply to Vishwanath Karra from comment #15)
> > We are seeing this issue with 8.5.72 - is there some other patch needed for
> > the 8.5 codeline ? The fix is confirmed for 9.0.40.
> 
> Are you sure it's the same issue? Please post all your relevant information
> (i.e. actual evidence that you have a Websocket session memory leak) to the
> user's mailing list to verify this is the same problem.

 We can see the problem on 8.5.84, the problem is when we configure server.xml
connector of the protocol is org. Apache. Coyote. Http11. Http11Nio2Protocol.
This is not an issue when changing to the default configuration (Nio).
  In tests, when Nio mode was set, the problem was fixed after 8.5.60, but when
Nio2 mode was set, the problem existed on versions 8.5.60, 8.5.75, 8.5.84, etc.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #13 from Laszlo Karolyi <la...@gmail.com> ---
Hi Thomas,

According to the last information I got the Tomcat 9.0.40 release was estimated
to be ready this week but I haven't seen any update on this.

Since our internal release is still in progress this week, I think we may have
the chance to integrate the new Tomcat early next week in case it'll be ready
by then.
Would it be feasible to have the official Tomcat 9.0.40 build in the beginning
of next week?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #11 from Mark Thomas <ma...@apache.org> ---
Soon. Need to resolve the remaining open issues then we can tag. Hopefully
early next week.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #8 from Mark Thomas <ma...@apache.org> ---
Fixed in:
- 10.0.x for 10.0.0-M10 onwards
- 9.0.x for 9.0.40 onwards
- 8.5.x for 8.5.60 onwards

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #3 from Mark Thomas <ma...@apache.org> ---
Actually, scratch that last comment. Tomcat detects that and will throw an
Exception. I'm still looking into this but there are changes between 9.0.36 and
9./0.39 that might help so that feedback would still be very helpful.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #12 from Laszlo Karolyi <la...@gmail.com> ---
Ok, it sounds promising. Thanks for the quick feedback.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #9 from Laszlo Karolyi <la...@gmail.com> ---
I can confirm that the issue I reported is indeed fixed in Tomcat 9.0.40 so
thanks for the quick remediation.
Do you happen to have an estimation as to when the 9.0.40 version of Tomcat is
expected to be released?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #7 from Mark Thomas <ma...@apache.org> ---
The issue as reproduced by the provided test case is fixed in 9.0.39.

Entries are removed from the waitingProcessor list when an event is processed
for the socket (and returned once processing is complete if the connection is
still open).

If the writes from the server are made from an application thread (rather then
in response to a message from the client and on the same thread) then it looks
like it is possible that an unclean close will result in problem described
here.

Re-opening while I investigate a potential unit test for this.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #6 from Laszlo Karolyi <la...@gmail.com> ---
After re-testing the reported scenario with 9.0.39 Tomcat version (core +
websocket) the WsSession objects still seem to get stuck on the heap.

Upon investigating the Tomcat source code a bit deeper it's not entirely clear
how a particular Processor can be removed from the waitingProcessors set
located in Http11NioProtocol class (more accurately in its heir,
AbstractProtocol). According to the heap dump snapshot enclosed with the
original report, a WsSession object is held by a WsHttpUpgradeHandler object
that is used within UpgradeProcessorInternal class. UpgradeProcessorInternal
inherits from Processor, the type whose objects seem stored in
waitingProcessors.

If sending a message block fails in WsRemoteEndpointImplBase during the
invocation of writeMessagePart() method (as the referred
https://github.com/apache/tomcat/commit/5707895bafa38c1443460356a268932a967b79e9
pull request also contains) then even though the WebSocket session gets closed
followed by re-throwing of the caught exception, does it mean that the
WsHttpUpgradeHandler object associated to the closed WebSocket session should
eventually be purged from the above mentioned waitingProcessors set too?
In other words, under which conditions should a Processor be removed from
waitingProcessors?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #18 from Matt M <in...@gmail.com> ---
After a recent upgrade from 9.0.63 to 9.0.75 (with the 2 websocket fixes from
9.0.77 patched in) I am seeing some occurrences of the WsSession Object stuck
on the heap with state OUTPUT_CLOSED in WsHttpUpgradeHandler under
Http11Nio2Protocol.waitingProcessors.

We are in the process of identifying the steps to reproduce but these
occurrences have been rare so far.

Looking at the more recent reworking of WsSession isOpen / isClosed and
surrounding logic from BZ66574 I am wondering if it makes sense for
OUTPUT_CLOSED to also be checked in the isClosed call?   Without a reproducible
test case in place it is difficult to determine via static code analysis, but I
wanted to mention that we are seeing this after our recent upgrade and did not
see it in 9.0.63.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #10 from Laszlo Karolyi <la...@gmail.com> ---
I couldn't find any information about the publication date of 9.0.40 but it
would help us if we knew when it is expected as we would include it in our next
release. Please let me know if you happen to have any update on the next Tomcat
version's release date.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #4 from Mark Thomas <ma...@apache.org> ---
I'm able to recreate this with the 3 provided Maven projects.

I ran the client on a local VM with VMware Workstation configured to limit the
network connection to 10Mbps with 20ms latency inbound and outbound.

Next step is to update to the latest Tomcat 9 code and see if the issue
persists.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #2 from Mark Thomas <ma...@apache.org> ---
The other thing to check is whether you have multiple threads writing to the
same WebSocket session concurrently. That is not supported by the Java
WebSocket API and could lead to the issue you describe.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All
             Status|NEW                         |NEEDINFO

--- Comment #1 from Mark Thomas <ma...@apache.org> ---
Please re-test with 9.0.39.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64848] WsSession objects in OUTPUT_CLOSED state are implicitly held by waitingProcessors and GC cannot purge them from the JVM heap

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64848

--- Comment #16 from Christopher Schultz <ch...@christopherschultz.net> ---
(In reply to Vishwanath Karra from comment #15)
> We are seeing this issue with 8.5.72 - is there some other patch needed for
> the 8.5 codeline ? The fix is confirmed for 9.0.40.

Are you sure it's the same issue? Please post all your relevant information
(i.e. actual evidence that you have a Websocket session memory leak) to the
user's mailing list to verify this is the same problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org