You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by ma...@apache.org on 2018/11/28 15:00:06 UTC

svn commit: r1847646 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

Author: markt
Date: Wed Nov 28 15:00:06 2018
New Revision: 1847646

URL: http://svn.apache.org/viewvc?rev=1847646&view=rev
Log:
Fix possible cause of intermittent TestCoyoteOutputStream failures.

Modified:
    tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

Modified: tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java
URL: http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java?rev=1847646&r1=1847645&r2=1847646&view=diff
==============================================================================
--- tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java (original)
+++ tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java Wed Nov 28 15:00:06 2018
@@ -1424,7 +1424,9 @@ public class Nio2Endpoint extends Abstra
         public boolean hasDataToWrite() {
             synchronized (writeCompletionHandler) {
                 return !socketBufferHandler.isWriteBufferEmpty() ||
-                        !nonBlockingWriteBuffer.isEmpty() || getError() != null;
+                        !nonBlockingWriteBuffer.isEmpty() ||
+                        writePending.availablePermits() == 0 ||
+                        getError() != null;
             }
         }
 



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: svn commit: r1847646 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

Posted by Rémy Maucherat <re...@apache.org>.
On Wed, Nov 28, 2018 at 4:52 PM Mark Thomas <ma...@apache.org> wrote:

> Indeed. I saw that and started to mentally draft a "Ignore my last"
> email while I scrolled to the end to see which test had failed.
>
> Looks really fixed to me now. Outstanding !

Rémy

Re: svn commit: r1847646 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

Posted by Mark Thomas <ma...@apache.org>.
On 28/11/2018 15:30, Rémy Maucherat wrote:
> On Wed, Nov 28, 2018 at 4:20 PM Mark Thomas <ma...@apache.org> wrote:
> 
>> On 28/11/2018 15:00, markt@apache.org wrote:
>>> Author: markt
>>> Date: Wed Nov 28 15:00:06 2018
>>> New Revision: 1847646
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1847646&view=rev
>>> Log:
>>> Fix possible cause of intermittent TestCoyoteOutputStream failures.
>>
>> I thought this would be worthy of a longer explanation than seemed
>> appropriate for a commit message.
>>
> 
> I feel bad for not thinking about it since it does sound quite logical.

I wouldn't feel too bad. I've lost track of the number of logical
explanations I have had for problems that I have had to throw away once
I had more data. To be honest, I'm expecting this to carry on failing
intermittently and that the root cause - when we find it - will turn out
to be my refactoring.

> BTW, the testsuite failed but it wasn't *that*. What a coincidence !

Indeed. I saw that and started to mentally draft a "Ignore my last"
email while I scrolled to the end to see which test had failed.

Mark


> 
> Rémy
> 
> 
>>
>> I have tried to recreate the issue locally without success. I was able
>> to recreate it occasionally running the tests on silvanus.a.o (the CI
>> machine that runs all our buildbot jobs).
>>
>> I captured a network trace that confirmed that this was a server side
>> bug. What I saw was a corrupted response. The headers and first chunk
>> were correct but rather than the 5 bytes of the end chunk I saw the
>> other 8187 (8192-5) bytes of the buffer. It was clear the buffer was
>> configured for write when it was being read.
>>
>> I then tried to figure out how this could happen with a view to
>> reproducing the issue.
>>
>> There were a lot of dead ends during which I noticed that the write
>> pattern varied when I added additional debug statements. I discovered
>> that, depending on timing, the NIO2 endpoint would sometimes use a
>> gathering write when performing a non-blocking flush.
>>
>> There is a non-blocking flush just before the switch back to blocking
>> I/O (after the dispatch to end the async component) and it looked to be
>> possible that the gathering write could still be in progress when the
>> following blocking write was performed. That in turn meant that one of
>> the buffers used by the gathering write could be modified during the
>> following blocking write.
>>
>> However, my current understanding of the code is that the gathering
>> write will have written all the data from the buffer that is used by the
>> following blocking write before that blocking write occurs. So I may
>> have missed the root cause completely. It depends a lot on the internal
>> workings of the AsynchronousSocketChannel.
>>
>> On balance, I decided to commit this fix as there does appear to be a
>> bug here. Hopefully, it is the root cause of the intermittent
>> TestCoyoteOutputStream failures. If it is, great. If not, I'll keep
>> looking.
>>
>> Mark
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: dev-help@tomcat.apache.org
>>
>>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: svn commit: r1847646 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

Posted by Rémy Maucherat <re...@apache.org>.
On Wed, Nov 28, 2018 at 4:20 PM Mark Thomas <ma...@apache.org> wrote:

> On 28/11/2018 15:00, markt@apache.org wrote:
> > Author: markt
> > Date: Wed Nov 28 15:00:06 2018
> > New Revision: 1847646
> >
> > URL: http://svn.apache.org/viewvc?rev=1847646&view=rev
> > Log:
> > Fix possible cause of intermittent TestCoyoteOutputStream failures.
>
> I thought this would be worthy of a longer explanation than seemed
> appropriate for a commit message.
>

I feel bad for not thinking about it since it does sound quite logical.
BTW, the testsuite failed but it wasn't *that*. What a coincidence !

Rémy


>
> I have tried to recreate the issue locally without success. I was able
> to recreate it occasionally running the tests on silvanus.a.o (the CI
> machine that runs all our buildbot jobs).
>
> I captured a network trace that confirmed that this was a server side
> bug. What I saw was a corrupted response. The headers and first chunk
> were correct but rather than the 5 bytes of the end chunk I saw the
> other 8187 (8192-5) bytes of the buffer. It was clear the buffer was
> configured for write when it was being read.
>
> I then tried to figure out how this could happen with a view to
> reproducing the issue.
>
> There were a lot of dead ends during which I noticed that the write
> pattern varied when I added additional debug statements. I discovered
> that, depending on timing, the NIO2 endpoint would sometimes use a
> gathering write when performing a non-blocking flush.
>
> There is a non-blocking flush just before the switch back to blocking
> I/O (after the dispatch to end the async component) and it looked to be
> possible that the gathering write could still be in progress when the
> following blocking write was performed. That in turn meant that one of
> the buffers used by the gathering write could be modified during the
> following blocking write.
>
> However, my current understanding of the code is that the gathering
> write will have written all the data from the buffer that is used by the
> following blocking write before that blocking write occurs. So I may
> have missed the root cause completely. It depends a lot on the internal
> workings of the AsynchronousSocketChannel.
>
> On balance, I decided to commit this fix as there does appear to be a
> bug here. Hopefully, it is the root cause of the intermittent
> TestCoyoteOutputStream failures. If it is, great. If not, I'll keep
> looking.
>
> Mark
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org
>
>

Re: svn commit: r1847646 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

Posted by Mark Thomas <ma...@apache.org>.
On 28/11/2018 15:00, markt@apache.org wrote:
> Author: markt
> Date: Wed Nov 28 15:00:06 2018
> New Revision: 1847646
> 
> URL: http://svn.apache.org/viewvc?rev=1847646&view=rev
> Log:
> Fix possible cause of intermittent TestCoyoteOutputStream failures.

I thought this would be worthy of a longer explanation than seemed
appropriate for a commit message.

I have tried to recreate the issue locally without success. I was able
to recreate it occasionally running the tests on silvanus.a.o (the CI
machine that runs all our buildbot jobs).

I captured a network trace that confirmed that this was a server side
bug. What I saw was a corrupted response. The headers and first chunk
were correct but rather than the 5 bytes of the end chunk I saw the
other 8187 (8192-5) bytes of the buffer. It was clear the buffer was
configured for write when it was being read.

I then tried to figure out how this could happen with a view to
reproducing the issue.

There were a lot of dead ends during which I noticed that the write
pattern varied when I added additional debug statements. I discovered
that, depending on timing, the NIO2 endpoint would sometimes use a
gathering write when performing a non-blocking flush.

There is a non-blocking flush just before the switch back to blocking
I/O (after the dispatch to end the async component) and it looked to be
possible that the gathering write could still be in progress when the
following blocking write was performed. That in turn meant that one of
the buffers used by the gathering write could be modified during the
following blocking write.

However, my current understanding of the code is that the gathering
write will have written all the data from the buffer that is used by the
following blocking write before that blocking write occurs. So I may
have missed the root cause completely. It depends a lot on the internal
workings of the AsynchronousSocketChannel.

On balance, I decided to commit this fix as there does appear to be a
bug here. Hopefully, it is the root cause of the intermittent
TestCoyoteOutputStream failures. If it is, great. If not, I'll keep looking.

Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org