You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Mark Thomas <ma...@apache.org> on 2019/12/03 21:56:18 UTC

Blocking IO does not appear to always be blocking with asyncIO

Hi,

I've been looking into BZ 63931. The error messages I was expecting to
see were not there.

The first odd thing I noticed was an NPE here:
https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/websocket/WsRemoteEndpointImplBase.java#L310

bsh.getSendResult() returned null. That shouldn't happen.

I added some debug logging and found that the call to writeMessagePart()
on the previous line was returning before the SendResult was populated.

Tracing the code for writeMessagePart() led me to
SocketWrapperBase.vectoredOperation()

The issue appears to be that state.start() at

https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1425

effectively always does non-blocking writes. If the write buffer is full
(which is the scenario I am testing) 0 bytes are written, the socket is
passed to the Poller and the call returns.

I hacked in the following code just after the state.start()

if (block == BlockingMode.BLOCK) {
    if (read) {
        readPending.acquireUninterruptibly();
        readPending.release();
    } else {
        writePending.acquireUninterruptibly();
        writePending.release();
    }
}

I'm sure that code is wrong. It was just intended as a quick hack to
force the call to block until the write had completed. It had the
desired effect and I started to see the error messages I expected.

I'm not familiar enough with the asyncIO code yet to know where the best
place to fix this is. Rémy, are you able to help?

Thanks,

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Mark Thomas <ma...@apache.org>.
On 04/12/2019 14:20, Rémy Maucherat wrote:
> On Wed, Dec 4, 2019 at 12:44 PM Rémy Maucherat <remm@apache.org
> <ma...@apache.org>> wrote:
> 
>     On Wed, Dec 4, 2019 at 11:51 AM Mark Thomas <markt@apache.org
>     <ma...@apache.org>> wrote:
> 
>         On 04/12/2019 10:25, Rémy Maucherat wrote:
>         > On Wed, Dec 4, 2019 at 11:11 AM Mark Thomas
>         <mark@homeinbox.net <ma...@homeinbox.net>
>         > <mailto:mark@homeinbox.net <ma...@homeinbox.net>>> wrote:
>         >
>         >     On 04/12/2019 09:58, Mark Thomas wrote:
>         >     > On 04/12/2019 09:27, Mark Thomas wrote:
>         >     >> On 04/12/2019 07:55, Rémy Maucherat wrote:
>         >     >
>         >     > <snip/>
>         >     >
>         >     >>> Blocking is also used for HTTP/2 where it seems to
>         work. If the
>         >     write
>         >     >>> buffer is full, 0 bytes are written and the socket is
>         indeed
>         >     passed to
>         >     >>> the poller but until the operation is complete it is
>         supposed to be
>         >     >>> blocked here:
>         >     >>>
>         >   
>          https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
>         >     >>
>         >     >> That helps a lot. I can try and see why things aren't
>         blocking there.
>         >     >
>         >     > Some progress. The thread does wait there but it is getting
>         >     released too
>         >     > early. I'm guessing something is calling notify(). I'm
>         planning to add
>         >     > more debug to figure out what / why.
>         >
>         >     Found it.
>         >
>         >     The wait on line 1430 (see link above) waits for the
>         write/read timeout.
>         >     The problem was that this wait was timing out, allowing
>         the code to
>         >     continue before the Poller timed out a few milliseconds
>         later and called
>         >     the CompletionHander which ultimately sets the SendResult
>         the WebSocket
>         >     code is looking for.
>         >
>         >     I'm currently testing using:
>         >
>         >     state.wait()
>         >
>         >     rather than
>         >
>         >     state.wait(unit.toMillis(timeout));
>         >
>         >
>         > Well, it would remove the enforcement of the overall timeout
>         since the
>         > write could be made up of multiple operations. But I think I
>         get the
>         > idea that it's imperfect (once the IO is running, you cannot
>         stop it
>         > this way).
> 
>         Could we record the start time of the write (or similar) and
>         adjust the
>         timeout passed the Poller accordingly?
> 
> 
>     Yes, it's usually this kind of strategy, but it's complex :( I may
>     keep the wait(timeout) instead but make the processing of the failed
>     more robust.
> 
> 
> It seemed unreasonably complex for what it is and the breakage risk was
> really high too, so I preferred making the call of the "user" completion
> handler robust. Let me know if it doesn't work well enough but it seems
> ok with the test you gave (no null send result).

Thanks for the quick turnaround. All looks good to me.

> I will revisit this after the next build.

Ack.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Rémy Maucherat <re...@apache.org>.
On Wed, Dec 4, 2019 at 12:44 PM Rémy Maucherat <re...@apache.org> wrote:

> On Wed, Dec 4, 2019 at 11:51 AM Mark Thomas <ma...@apache.org> wrote:
>
>> On 04/12/2019 10:25, Rémy Maucherat wrote:
>> > On Wed, Dec 4, 2019 at 11:11 AM Mark Thomas <mark@homeinbox.net
>> > <ma...@homeinbox.net>> wrote:
>> >
>> >     On 04/12/2019 09:58, Mark Thomas wrote:
>> >     > On 04/12/2019 09:27, Mark Thomas wrote:
>> >     >> On 04/12/2019 07:55, Rémy Maucherat wrote:
>> >     >
>> >     > <snip/>
>> >     >
>> >     >>> Blocking is also used for HTTP/2 where it seems to work. If the
>> >     write
>> >     >>> buffer is full, 0 bytes are written and the socket is indeed
>> >     passed to
>> >     >>> the poller but until the operation is complete it is supposed
>> to be
>> >     >>> blocked here:
>> >     >>>
>> >
>> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
>> >     >>
>> >     >> That helps a lot. I can try and see why things aren't blocking
>> there.
>> >     >
>> >     > Some progress. The thread does wait there but it is getting
>> >     released too
>> >     > early. I'm guessing something is calling notify(). I'm planning
>> to add
>> >     > more debug to figure out what / why.
>> >
>> >     Found it.
>> >
>> >     The wait on line 1430 (see link above) waits for the write/read
>> timeout.
>> >     The problem was that this wait was timing out, allowing the code to
>> >     continue before the Poller timed out a few milliseconds later and
>> called
>> >     the CompletionHander which ultimately sets the SendResult the
>> WebSocket
>> >     code is looking for.
>> >
>> >     I'm currently testing using:
>> >
>> >     state.wait()
>> >
>> >     rather than
>> >
>> >     state.wait(unit.toMillis(timeout));
>> >
>> >
>> > Well, it would remove the enforcement of the overall timeout since the
>> > write could be made up of multiple operations. But I think I get the
>> > idea that it's imperfect (once the IO is running, you cannot stop it
>> > this way).
>>
>> Could we record the start time of the write (or similar) and adjust the
>> timeout passed the Poller accordingly?
>>
>
> Yes, it's usually this kind of strategy, but it's complex :( I may keep
> the wait(timeout) instead but make the processing of the failed more robust.
>

It seemed unreasonably complex for what it is and the breakage risk was
really high too, so I preferred making the call of the "user" completion
handler robust. Let me know if it doesn't work well enough but it seems ok
with the test you gave (no null send result).

I will revisit this after the next build.

Rémy

Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Rémy Maucherat <re...@apache.org>.
On Wed, Dec 4, 2019 at 11:51 AM Mark Thomas <ma...@apache.org> wrote:

> On 04/12/2019 10:25, Rémy Maucherat wrote:
> > On Wed, Dec 4, 2019 at 11:11 AM Mark Thomas <mark@homeinbox.net
> > <ma...@homeinbox.net>> wrote:
> >
> >     On 04/12/2019 09:58, Mark Thomas wrote:
> >     > On 04/12/2019 09:27, Mark Thomas wrote:
> >     >> On 04/12/2019 07:55, Rémy Maucherat wrote:
> >     >
> >     > <snip/>
> >     >
> >     >>> Blocking is also used for HTTP/2 where it seems to work. If the
> >     write
> >     >>> buffer is full, 0 bytes are written and the socket is indeed
> >     passed to
> >     >>> the poller but until the operation is complete it is supposed to
> be
> >     >>> blocked here:
> >     >>>
> >
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
> >     >>
> >     >> That helps a lot. I can try and see why things aren't blocking
> there.
> >     >
> >     > Some progress. The thread does wait there but it is getting
> >     released too
> >     > early. I'm guessing something is calling notify(). I'm planning to
> add
> >     > more debug to figure out what / why.
> >
> >     Found it.
> >
> >     The wait on line 1430 (see link above) waits for the write/read
> timeout.
> >     The problem was that this wait was timing out, allowing the code to
> >     continue before the Poller timed out a few milliseconds later and
> called
> >     the CompletionHander which ultimately sets the SendResult the
> WebSocket
> >     code is looking for.
> >
> >     I'm currently testing using:
> >
> >     state.wait()
> >
> >     rather than
> >
> >     state.wait(unit.toMillis(timeout));
> >
> >
> > Well, it would remove the enforcement of the overall timeout since the
> > write could be made up of multiple operations. But I think I get the
> > idea that it's imperfect (once the IO is running, you cannot stop it
> > this way).
>
> Could we record the start time of the write (or similar) and adjust the
> timeout passed the Poller accordingly?
>

Yes, it's usually this kind of strategy, but it's complex :( I may keep the
wait(timeout) instead but make the processing of the failed more robust.

Rémy

Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Mark Thomas <ma...@apache.org>.
On 04/12/2019 10:25, Rémy Maucherat wrote:
> On Wed, Dec 4, 2019 at 11:11 AM Mark Thomas <mark@homeinbox.net
> <ma...@homeinbox.net>> wrote:
> 
>     On 04/12/2019 09:58, Mark Thomas wrote:
>     > On 04/12/2019 09:27, Mark Thomas wrote:
>     >> On 04/12/2019 07:55, Rémy Maucherat wrote:
>     >
>     > <snip/>
>     >
>     >>> Blocking is also used for HTTP/2 where it seems to work. If the
>     write
>     >>> buffer is full, 0 bytes are written and the socket is indeed
>     passed to
>     >>> the poller but until the operation is complete it is supposed to be
>     >>> blocked here:
>     >>>
>     https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
>     >>
>     >> That helps a lot. I can try and see why things aren't blocking there.
>     >
>     > Some progress. The thread does wait there but it is getting
>     released too
>     > early. I'm guessing something is calling notify(). I'm planning to add
>     > more debug to figure out what / why.
> 
>     Found it.
> 
>     The wait on line 1430 (see link above) waits for the write/read timeout.
>     The problem was that this wait was timing out, allowing the code to
>     continue before the Poller timed out a few milliseconds later and called
>     the CompletionHander which ultimately sets the SendResult the WebSocket
>     code is looking for.
> 
>     I'm currently testing using:
> 
>     state.wait()
> 
>     rather than
> 
>     state.wait(unit.toMillis(timeout));
> 
> 
> Well, it would remove the enforcement of the overall timeout since the
> write could be made up of multiple operations. But I think I get the
> idea that it's imperfect (once the IO is running, you cannot stop it
> this way).

Could we record the start time of the write (or similar) and adjust the
timeout passed the Poller accordingly?

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Rémy Maucherat <re...@apache.org>.
On Wed, Dec 4, 2019 at 11:11 AM Mark Thomas <ma...@homeinbox.net> wrote:

> On 04/12/2019 09:58, Mark Thomas wrote:
> > On 04/12/2019 09:27, Mark Thomas wrote:
> >> On 04/12/2019 07:55, Rémy Maucherat wrote:
> >
> > <snip/>
> >
> >>> Blocking is also used for HTTP/2 where it seems to work. If the write
> >>> buffer is full, 0 bytes are written and the socket is indeed passed to
> >>> the poller but until the operation is complete it is supposed to be
> >>> blocked here:
> >>>
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
> >>
> >> That helps a lot. I can try and see why things aren't blocking there.
> >
> > Some progress. The thread does wait there but it is getting released too
> > early. I'm guessing something is calling notify(). I'm planning to add
> > more debug to figure out what / why.
>
> Found it.
>
> The wait on line 1430 (see link above) waits for the write/read timeout.
> The problem was that this wait was timing out, allowing the code to
> continue before the Poller timed out a few milliseconds later and called
> the CompletionHander which ultimately sets the SendResult the WebSocket
> code is looking for.
>
> I'm currently testing using:
>
> state.wait()
>
> rather than
>
> state.wait(unit.toMillis(timeout));
>

Well, it would remove the enforcement of the overall timeout since the
write could be made up of multiple operations. But I think I get the idea
that it's imperfect (once the IO is running, you cannot stop it this way).

Rémy

Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Mark Thomas <ma...@homeinbox.net>.
On 04/12/2019 09:58, Mark Thomas wrote:
> On 04/12/2019 09:27, Mark Thomas wrote:
>> On 04/12/2019 07:55, Rémy Maucherat wrote:
> 
> <snip/>
> 
>>> Blocking is also used for HTTP/2 where it seems to work. If the write
>>> buffer is full, 0 bytes are written and the socket is indeed passed to
>>> the poller but until the operation is complete it is supposed to be
>>> blocked here:
>>> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
>>
>> That helps a lot. I can try and see why things aren't blocking there.
> 
> Some progress. The thread does wait there but it is getting released too
> early. I'm guessing something is calling notify(). I'm planning to add
> more debug to figure out what / why.

Found it.

The wait on line 1430 (see link above) waits for the write/read timeout.
The problem was that this wait was timing out, allowing the code to
continue before the Poller timed out a few milliseconds later and called
the CompletionHander which ultimately sets the SendResult the WebSocket
code is looking for.

I'm currently testing using:

state.wait()

rather than

state.wait(unit.toMillis(timeout));

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Rémy Maucherat <re...@apache.org>.
On Wed, Dec 4, 2019 at 10:59 AM Mark Thomas <ma...@apache.org> wrote:

> On 04/12/2019 09:27, Mark Thomas wrote:
> > On 04/12/2019 07:55, Rémy Maucherat wrote:
>
> <snip/>
>
> >> Blocking is also used for HTTP/2 where it seems to work. If the write
> >> buffer is full, 0 bytes are written and the socket is indeed passed to
> >> the poller but until the operation is complete it is supposed to be
> >> blocked here:
> >>
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
> >
> > That helps a lot. I can try and see why things aren't blocking there.
>
> Some progress. The thread does wait there but it is getting released too
> early. I'm guessing something is calling notify(). I'm planning to add
> more debug to figure out what / why.
>

It works for me, but here the timeout for the write is only 5s for whatever
reason. So it really waits 5s and then throws an exception on the caller.
The "problem" is that the completion handler isn't called in that case (I
was afraid of concurrency issues there maybe).

Rémy

Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Mark Thomas <ma...@apache.org>.
On 04/12/2019 09:27, Mark Thomas wrote:
> On 04/12/2019 07:55, Rémy Maucherat wrote:

<snip/>

>> Blocking is also used for HTTP/2 where it seems to work. If the write
>> buffer is full, 0 bytes are written and the socket is indeed passed to
>> the poller but until the operation is complete it is supposed to be
>> blocked here:
>> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430
> 
> That helps a lot. I can try and see why things aren't blocking there.

Some progress. The thread does wait there but it is getting released too
early. I'm guessing something is calling notify(). I'm planning to add
more debug to figure out what / why.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Mark Thomas <ma...@apache.org>.
On 04/12/2019 07:55, Rémy Maucherat wrote:
> On Tue, Dec 3, 2019 at 10:56 PM Mark Thomas <markt@apache.org
> <ma...@apache.org>> wrote:
> 
>     Hi,
> 
>     I've been looking into BZ 63931. The error messages I was expecting to
>     see were not there.
> 
>     The first odd thing I noticed was an NPE here:
>     https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/websocket/WsRemoteEndpointImplBase.java#L310
> 
>     bsh.getSendResult() returned null. That shouldn't happen.
> 
>     I added some debug logging and found that the call to writeMessagePart()
>     on the previous line was returning before the SendResult was populated.
> 
>     Tracing the code for writeMessagePart() led me to
>     SocketWrapperBase.vectoredOperation()
> 
>     The issue appears to be that state.start() at
> 
>     https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1425
> 
>     effectively always does non-blocking writes. If the write buffer is full
>     (which is the scenario I am testing) 0 bytes are written, the socket is
>     passed to the Poller and the call returns.
> 
>     I hacked in the following code just after the state.start()
> 
>     if (block == BlockingMode.BLOCK) {
>         if (read) {
>             readPending.acquireUninterruptibly();
>             readPending.release();
>         } else {
>             writePending.acquireUninterruptibly();
>             writePending.release();
>         }
>     }
> 
>     I'm sure that code is wrong. It was just intended as a quick hack to
>     force the call to block until the write had completed. It had the
>     desired effect and I started to see the error messages I expected.
> 
>     I'm not familiar enough with the asyncIO code yet to know where the best
>     place to fix this is. Rémy, are you able to help?
> 
> 
> I'm not sure about the problem. Is there a test case ?

Sort of. I've been using the following:

1. Configure TestWebSocketFrameClient.testConnectToServerEndpoint() to
   run from an IDE in debug mode.

2. Put a break point in
   TesterMessageCountClient.BasicText.onMessage(String) so the client
   hangs and doesn't process any messages (causing the network buffers
   to fill up).

3. In WsRemoteEndpointImplBase.sendMessageBlock() in the loop that
   writes each of the message parts add this to the start of the loop:
   if (bsh.getSendResult() == null) {
       System.out.println("Prepare for NPE");
   }
   and put a breakpoint on the System.out. Execution should never reach
   that line but it does.

> Blocking is also used for HTTP/2 where it seems to work. If the write
> buffer is full, 0 bytes are written and the socket is indeed passed to
> the poller but until the operation is complete it is supposed to be
> blocked here:
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430

That helps a lot. A san try and see why things aren't blocking there.

> Since I made the fix here for NIO (
> https://github.com/apache/tomcat/commit/f5f2b62670f972fc6a857788084e4352f2d4cd87
> ) I may need to do the same loop here however (when writing the buffer
> it is there):
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/NioEndpoint.java#L1493

I see you committed that. I'll see what difference it makes.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Blocking IO does not appear to always be blocking with asyncIO

Posted by Rémy Maucherat <re...@apache.org>.
On Tue, Dec 3, 2019 at 10:56 PM Mark Thomas <ma...@apache.org> wrote:

> Hi,
>
> I've been looking into BZ 63931. The error messages I was expecting to
> see were not there.
>
> The first odd thing I noticed was an NPE here:
>
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/websocket/WsRemoteEndpointImplBase.java#L310
>
> bsh.getSendResult() returned null. That shouldn't happen.
>
> I added some debug logging and found that the call to writeMessagePart()
> on the previous line was returning before the SendResult was populated.
>
> Tracing the code for writeMessagePart() led me to
> SocketWrapperBase.vectoredOperation()
>
> The issue appears to be that state.start() at
>
>
> https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1425
>
> effectively always does non-blocking writes. If the write buffer is full
> (which is the scenario I am testing) 0 bytes are written, the socket is
> passed to the Poller and the call returns.
>
> I hacked in the following code just after the state.start()
>
> if (block == BlockingMode.BLOCK) {
>     if (read) {
>         readPending.acquireUninterruptibly();
>         readPending.release();
>     } else {
>         writePending.acquireUninterruptibly();
>         writePending.release();
>     }
> }
>
> I'm sure that code is wrong. It was just intended as a quick hack to
> force the call to block until the write had completed. It had the
> desired effect and I started to see the error messages I expected.
>
> I'm not familiar enough with the asyncIO code yet to know where the best
> place to fix this is. Rémy, are you able to help?
>

I'm not sure about the problem. Is there a test case ?

Blocking is also used for HTTP/2 where it seems to work. If the write
buffer is full, 0 bytes are written and the socket is indeed passed to the
poller but until the operation is complete it is supposed to be blocked
here:
https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/SocketWrapperBase.java#L1430

Since I made the fix here for NIO (
https://github.com/apache/tomcat/commit/f5f2b62670f972fc6a857788084e4352f2d4cd87
) I may need to do the same loop here however (when writing the buffer it
is there):
https://github.com/apache/tomcat/blob/master/java/org/apache/tomcat/util/net/NioEndpoint.java#L1493

Rémy