Posted to dev@tomcat.apache.org by Mark Thomas <ma...@apache.org> on 2013/05/31 20:37:59 UTC

APR/native errors with non-blocking I/O

I'm consistently seeing errors with APR/native and
TestNonBlockingAPI.testNonBlockingWrite.

This works on Windows but fails on Linux and OSX. Failures always occur
with Socket.sendbb(socket, offset, long).

The sequence of events is:
~100 successful writes (9000 bytes at a time)
write returns 120002 (EAGAIN)
add socket to poller
poller indicates socket is available for write
~80 successful writes (mostly 9000 bytes at a time but the odd 2 bytes
and some ~500)
write returns 120002 (EAGAIN)
add socket to poller
poller indicates socket is available for write
one write
write returns error code
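
A minimal sketch of the "add socket to poller, wait until it is available for write" step in the sequence above, written against plain POSIX poll() rather than the APR pollset. The helper name wait_writable is hypothetical and is not part of APR or tc-native.

```c
#include <poll.h>

/*
 * Hypothetical helper: waits up to timeout_ms for fd to become writable.
 * Returns 1 if a write can be attempted, 0 on timeout, and -1 if poll()
 * failed or reported only an error/hang-up condition on the descriptor.
 */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;               /* 0 = timeout, -1 = poll() error */
    return (pfd.revents & POLLOUT) ? 1 : -1;
}
```

A caller would retry the write only when this returns 1. A writable result only says that some buffer space is free, so the next write can still be short or fail.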

On OSX the error code is -32
On Linux the error code is -104

I could really do with some hints here.

I've been debugging this for days. I've found and fixed various problems
but this underlying problem remains.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: APR/native errors with non-blocking I/O

Posted by Mark Thomas <ma...@apache.org>.
On 31/05/2013 21:42, Caldarale, Charles R wrote:
>> From: Rainer Jung [mailto:rainer.jung@kippdata.de] 
>> Subject: Re: APR/native errors with non-blocking I/O
> 
>> Compile and have fun.
> 
> Or we could talk about Mark's familiarity with C :-)

That will be an extremely short conversation ;)

>> IMHO we don't have that in the code to output text instead of cryptic
>> numbers because it isn't really available on all needed platforms. I
>> could be wrong though.
> 
> The strerror() API is part of the POSIX and C standards, so it should be there unless you're running on some cut-down embedded platform.  However, it is not thread-safe, so it might not be appropriate to include it in JNI code.

If possible, that would be a help. On the other hand, I wouldn't have
fixed all the other problems in the non-blocking code if I had realised
straight away that these errors were pointing me at the client.

Mark




Re: APR/native errors with non-blocking I/O

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Chuck,

On 5/31/13 5:46 PM, Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:chris@christopherschultz.net] 
>> Subject: Re: APR/native errors with non-blocking I/O
> 
>> I'm pretty sure that strerror is thread safe: it should just return a
>> static char*.
> 
> Would that it were that simple.  Seriously, it's not thread safe; a
> second thread calling the API can overlay a prior thread's message.
> If it were thread safe, GNU wouldn't have bothered with the _r
> alternative.
> 
>> "For unknown error numbers, the strerror() function will return its 
>> result in a static buffer which may be overwritten by subsequent calls."
>> Hopefully, that means static and private to the thread, but I don't know.
> 
> Nope, it's static and global.

:(

>> Back to the GNU man page, it says: "strerror()  is specified by
>> POSIX.1-2001, C89, C99.  strerror_r() is specified by POSIX.1-2001.", so
>> we may be able to rely upon it.
> 
> Yes, the _r alternative is available on many (probably most, these
> days) platforms, but it's by no means universal.  (On another mailing
> list, we had someone still trying to use gcc 3.3...)

Can 'configure' figure out if strerror_r is available? I have no idea
how it works its magic.
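
For what it's worth, Autoconf has a stock check for this: AC_FUNC_STRERROR_R defines HAVE_STRERROR_R when the function exists and STRERROR_R_CHAR_P when it is the GNU variant that returns char *. A minimal sketch of how native code could use those macros, assuming configure were extended with that check (whether tc-native's configure already does so is not established in this thread, and errno_text is a hypothetical name):

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns a message for err, using buf as storage.
 * HAVE_STRERROR_R and STRERROR_R_CHAR_P are assumed to come from a
 * config.h generated by Autoconf's AC_FUNC_STRERROR_R check.
 */
const char *errno_text(int err, char *buf, size_t len)
{
#if defined(HAVE_STRERROR_R) && defined(STRERROR_R_CHAR_P)
    /* GNU variant: the returned pointer may or may not be buf. */
    return strerror_r(err, buf, len);
#elif defined(HAVE_STRERROR_R)
    /* XSI variant: returns 0 on success and fills buf. */
    if (strerror_r(err, buf, len) == 0)
        return buf;
#endif
    /* No usable strerror_r (or it failed): fall back to the number. */
    snprintf(buf, len, "errno %d", err);
    return buf;
}
```

Built without either macro defined, the function degrades to printing the number, so the cryptic code is still reported on platforms where the check fails.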

-chris


RE: APR/native errors with non-blocking I/O

Posted by "Caldarale, Charles R" <Ch...@unisys.com>.
> From: Christopher Schultz [mailto:chris@christopherschultz.net] 
> Subject: Re: APR/native errors with non-blocking I/O

> I'm pretty sure that strerror is thread safe: it should just return a
> static char*.

Would that it were that simple.  Seriously, it's not thread safe; a second thread calling the API can overlay a prior thread's message.  If it were thread safe, GNU wouldn't have bothered with the _r alternative.

> "For unknown error numbers, the strerror() function will return its 
> result in a static buffer which may be overwritten by subsequent calls."
> Hopefully, that means static and private to the thread, but I don't know.

Nope, it's static and global.

> Back to the GNU man page, it says: "strerror()  is specified by
> POSIX.1-2001, C89, C99.  strerror_r() is specified by POSIX.1-2001.", so
> we may be able to rely upon it.

Yes, the _r alternative is available on many (probably most, these days) platforms, but it's by no means universal.  (On another mailing list, we had someone still trying to use gcc 3.3...)

 - Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.




Re: APR/native errors with non-blocking I/O

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Chuck,

On 5/31/13 4:42 PM, Caldarale, Charles R wrote:
>> From: Rainer Jung [mailto:rainer.jung@kippdata.de] 
>> Subject: Re: APR/native errors with non-blocking I/O
> 
>> Compile and have fun.
> 
> Or we could talk about Mark's familiarity with C :-)
> 
>> IMHO we don't have that in the code to output text instead of cryptic
>> numbers because it isn't really available on all needed platforms. I
>> could be wrong though.
> 
> The strerror() API is part of the POSIX and C standards, so it should
> be there unless you're running on some cut-down embedded platform.
> However, it is not thread-safe, so it might not be appropriate to
> include it in JNI code.

I'm pretty sure that strerror is thread safe: it should just return a
static char*. The GNU man page for strerror() says that strerror_r is
just like strerror(), "but is thread safe". It doesn't specifically say
that strerror() is *not* thread-safe, but does imply it.

On my Mac, there is no mention of thread safety at all on that man page
(except this: "For unknown error numbers, the strerror() function will
return its result in a static buffer which may be overwritten by
subsequent calls." Hopefully, that means static and private to the
thread, but I don't know.)

Back to the GNU man page, it says: "strerror()  is specified by
POSIX.1-2001, C89, C99.  strerror_r() is specified by POSIX.1-2001.", so
we may be able to rely upon it.

-chris


RE: APR/native errors with non-blocking I/O

Posted by "Caldarale, Charles R" <Ch...@unisys.com>.
> From: Rainer Jung [mailto:rainer.jung@kippdata.de] 
> Subject: Re: APR/native errors with non-blocking I/O

> Compile and have fun.

Or we could talk about Mark's familiarity with C :-)

> IMHO we don't have that in the code to output text instead of cryptic
> numbers because it isn't really available on all needed platforms. I
> could be wrong though.

The strerror() API is part of the POSIX and C standards, so it should be there unless you're running on some cut-down embedded platform.  However, it is not thread-safe, so it might not be appropriate to include it in JNI code.

 - Chuck






Re: APR/native errors with non-blocking I/O

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Rainer,

On 5/31/13 4:35 PM, Rainer Jung wrote:
> On 31.05.2013 21:34, Mark Thomas wrote:
>> "Caldarale, Charles R" <Ch...@unisys.com> wrote:
>>
>>>> From: Mark Thomas [mailto:markt@apache.org] 
>>>> Subject: APR/native errors with non-blocking I/O
>>>
>>> Assuming these are negative errno values:
>>>
>>>> On OSX the error code is -32
>>>
>>> Broken pipe.
>>>
>>>> On Linux the error code is -104
>>>
>>> Connection reset by peer.
>>>
>>> Did the other end go away?
>>>
>>> Can you get a packet capture from one end or the other?
>>
>> Thanks Chuck. Very helpful.
>>
>> The other end does hang up but it wasn't clear if that was the root cause or the result. The client reports invalid chunked encoding. I'll look into the client code.
>>
>> Where might I find a list of these error codes? My Google fu let me down.
> 
> First: the real numbers are the positive ones, so multiply them all by -1.
> 
> The errno numbers are defined in
> 
> /usr/include/errno.h
> 
> and
> 
> /usr/include/sys/errno.h
> 
> at least on Linux. Most of the numeric values are not standardized, so
> they can vary by platform.
> 
> Then there's strerror(3C) and perror(3C) (so "man strerror", "man perror").
> 
> Example:
> 
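> #include <stdio.h>
> #include <string.h>
> int main(void) {
>     int n;
>     /* Stop on EOF or non-numeric input instead of looping forever. */
>     while(1) {
>         printf("Enter errno: ");
>         if (scanf("%d", &n) != 1)
>             break;
>         printf("Error string for errno %d is: %s\n",
>                n, strerror(n));
>     }
>     return 0;
> }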
> 
> Compile and have fun.
> 
> IMHO we don't have that in the code to output text instead of cryptic
> numbers because it isn't really available on all needed platforms. I
> could be wrong though.

From the (Mac OS) man page:
"The perror() and strerror() functions conform to ISO/IEC 9899:1999
(``ISO C99'')".

Can we bet on C99?

Or we could create a macro that uses strerror() when available or just
returns the error code as a string when not available.

-chris


Re: APR/native errors with non-blocking I/O

Posted by Rainer Jung <ra...@kippdata.de>.
On 31.05.2013 21:34, Mark Thomas wrote:
> "Caldarale, Charles R" <Ch...@unisys.com> wrote:
> 
>>> From: Mark Thomas [mailto:markt@apache.org] 
>>> Subject: APR/native errors with non-blocking I/O
>>
>> Assuming these are negative errno values:
>>
>>> On OSX the error code is -32
>>
>> Broken pipe.
>>
>>> On Linux the error code is -104
>>
>> Connection reset by peer.
>>
>> Did the other end go away?
>>
>> Can you get a packet capture from one end or the other?
> 
> Thanks Chuck. Very helpful.
> 
> The other end does hang up but it wasn't clear if that was the root cause or the result. The client reports invalid chunked encoding. I'll look into the client code.
> 
> Where might I find a list of these error codes? My Google fu let me down.

First: the real numbers are the positive ones, so multiply them all by -1.

The errno numbers are defined in

/usr/include/errno.h

and

/usr/include/sys/errno.h

at least on Linux. Most of the numeric values are not standardized, so
they can vary by platform.
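
Because the numbers differ between platforms (ECONNRESET is 104 on Linux, for example, but not on OSX), native code is usually better off comparing against the symbolic names from errno.h. A minimal sketch, assuming the value is a negated errno as in this thread; classify_write_error is a hypothetical helper:

```c
#include <errno.h>

/*
 * Hypothetical helper: maps a negated errno value, such as the -32 and
 * -104 reported in this thread, to a short description using the
 * platform's own constants instead of hard-coded numbers.
 */
const char *classify_write_error(int rc)
{
    int err = -rc;              /* the real errno is the positive value */

    if (err == EPIPE)
        return "EPIPE (broken pipe)";
    if (err == ECONNRESET)
        return "ECONNRESET (connection reset by peer)";
    if (err == EAGAIN || err == EWOULDBLOCK)
        return "EAGAIN (retry when writable)";
    return "other";
}
```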

Then there's strerror(3C) and perror(3C) (so "man strerror", "man perror").

Example:

#include <stdio.h>
#include <string.h>
int main(void) {
    int n;
    /* Stop on EOF or non-numeric input instead of looping forever. */
    while(1) {
        printf("Enter errno: ");
        if (scanf("%d", &n) != 1)
            break;
        printf("Error string for errno %d is: %s\n",
               n, strerror(n));
    }
    return 0;
}

Compile and have fun.

IMHO we don't have that in the code to output text instead of cryptic
numbers because it isn't really available on all needed platforms. I
could be wrong though.

Regards,

Rainer



RE: APR/native errors with non-blocking I/O

Posted by "Caldarale, Charles R" <Ch...@unisys.com>.
> From: Mark Thomas [mailto:markt@apache.org] 
> Subject: RE: APR/native errors with non-blocking I/O

> Where might I find a list of these error codes?

It's a bit old, but still pretty accurate:
http://www.ioplex.com/~miallen/errcmpp.html

 - Chuck






Re: APR/native errors with non-blocking I/O

Posted by Rainer Jung <ra...@kippdata.de>.
On 03.06.2013 14:54, Caldarale, Charles R wrote:
>> From: Mark Thomas [mailto:markt@apache.org] 
>> Subject: Re: APR/native errors with non-blocking I/O
> 
>> I'm thinking something along the lines of the following:
> 
>> if (ss == APR_SUCCESS)
>>     return (jint)sent;
>> else if ((APR_STATUS_IS_EAGAIN(ss) || ss == TCN_EAGAIN)  && sent > 0) {
>>     return (jint)sent;
>> } else {
>>     TCN_ERROR_WRAP(ss);
>>     return -(jint)ss;
>> }
> 
> Looks reasonable - just get rid of the else keywords, since each condition ends with a return.

+1

I guess this is for sendb and also sendbb?

Regards,

Rainer




RE: APR/native errors with non-blocking I/O

Posted by "Caldarale, Charles R" <Ch...@unisys.com>.
> From: Mark Thomas [mailto:markt@apache.org] 
> Subject: Re: APR/native errors with non-blocking I/O

> I'm thinking something along the lines of the following:

> if (ss == APR_SUCCESS)
>     return (jint)sent;
> else if ((APR_STATUS_IS_EAGAIN(ss) || ss == TCN_EAGAIN)  && sent > 0) {
>     return (jint)sent;
> } else {
>     TCN_ERROR_WRAP(ss);
>     return -(jint)ss;
> }

Looks reasonable - just get rid of the else keywords, since each condition ends with a return.
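
With the else keywords removed, the logic might read as follows. This is only a sketch: the typedef and constants are stand-ins so that it compiles without the APR and tc-native headers (the real code would use apr_status_t, APR_SUCCESS, APR_STATUS_IS_EAGAIN() and TCN_EAGAIN), the TCN_ERROR_WRAP() step is left out, and send_result is a hypothetical name.

```c
#include <stddef.h>

typedef int sketch_status_t;        /* stand-in for apr_status_t */
#define SKETCH_SUCCESS 0            /* stand-in for APR_SUCCESS */
#define SKETCH_EAGAIN  120002       /* EAGAIN value reported in this thread */

/*
 * Hypothetical helper: turns a send status plus byte count into the
 * value handed back to the caller. Non-negative means bytes written,
 * negative means a negated status code.
 */
int send_result(sketch_status_t ss, size_t sent)
{
    if (ss == SKETCH_SUCCESS)
        return (int)sent;
    if (ss == SKETCH_EAGAIN && sent > 0)
        return (int)sent;           /* partial write: report bytes accepted */
    return -(int)ss;                /* anything else: negated status code */
}
```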

 - Chuck






Re: APR/native errors with non-blocking I/O

Posted by Mark Thomas <ma...@apache.org>.
On 03/06/2013 10:10, Mark Thomas wrote:
> My next step is to look more closely at the server code (the issue is
> sensitive to timing so it can be tricky to add debug code and still see
> the issue) to figure out if I am misusing the API or if there might be
> an APR/native bug at the root of this.

I've finally figured out what is going on. The non-blocking write test
is more likely to hit this bug because it uses a slow client so the
buffers all fill up. In theory, any of the non-blocking code could hit this.

The problem is actually quite simple. When the buffers are almost full,
the next write returns a status for which APR_STATUS_IS_EAGAIN is true.
However, some of the requested data will already have been written. How
much was written is not available through the current API. The Tomcat
code assumes no data was written and hence ends up duplicating part of
the previous write.

Thoughts on how to extend the tc native API?

I'm thinking something along the lines of the following:

if (ss == APR_SUCCESS)
    return (jint)sent;
else if ((APR_STATUS_IS_EAGAIN(ss) || ss == TCN_EAGAIN)  && sent > 0) {
    return (jint)sent;
} else {
    TCN_ERROR_WRAP(ss);
    return -(jint)ss;
}

to fix the immediate problem.

Mark




Re: APR/native errors with non-blocking I/O

Posted by Mark Thomas <ma...@apache.org>.
On 31/05/2013 20:34, Mark Thomas wrote:

> The other end does hang up but it wasn't clear if that was the root
> cause or the result. The client reports invalid chunked encoding.
> I'll look into the client code.

I made some progress with this over the weekend. I'm still not sure
where the problem is but I have a clearer idea of what is happening.

I've modified the test to send a sequence of "0123456789ABCDEF0123..."
rather than "XXX..." so it is easier to spot when / if the data is
corrupted.

I've also switched the client to a simple socket based client so I can
examine the bytes directly.

What this shows is that towards the end of the 5MB, the client receives
a chunk that is longer than it should be and the chunk shows corruption
in that the expected sequence is broken.
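
A minimal sketch of that kind of check, assuming the payload is just the 16-character sequence repeated; fill_pattern and first_corruption are hypothetical names, not the code used in the Tomcat test:

```c
#include <stddef.h>

static const char PATTERN[] = "0123456789ABCDEF";
#define PATTERN_LEN (sizeof PATTERN - 1)

/* Fills buf with the repeating sequence, beginning at stream offset start. */
void fill_pattern(char *buf, size_t len, size_t start)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = PATTERN[(start + i) % PATTERN_LEN];
}

/*
 * Returns the index of the first byte that breaks the sequence, or len
 * if the whole buffer matches. start is the stream offset of buf[0].
 */
size_t first_corruption(const char *buf, size_t len, size_t start)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != PATTERN[(start + i) % PATTERN_LEN])
            return i;
    return len;
}
```

A duplicated partial write shows up as the sequence jumping backwards, so the returned index marks where the repeated bytes begin.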

The total bytes the server thinks it sent (including headers, chunking
overhead etc.) does not agree with the total bytes received by the client.

My next step is to look more closely at the server code (the issue is
sensitive to timing so it can be tricky to add debug code and still see
the issue) to figure out if I am misusing the API or if there might be
an APR/native bug at the root of this.

Mark



RE: APR/native errors with non-blocking I/O

Posted by Mark Thomas <ma...@apache.org>.
"Caldarale, Charles R" <Ch...@unisys.com> wrote:

>> From: Mark Thomas [mailto:markt@apache.org] 
>> Subject: APR/native errors with non-blocking I/O
>
>Assuming these are negative errno values:
>
>> On OSX the error code is -32
>
>Broken pipe.
>
>> On Linux the error code is -104
>
>Connection reset by peer.
>
>Did the other end go away?
>
>Can you get a packet capture from one end or the other?

Thanks Chuck. Very helpful.

The other end does hang up but it wasn't clear if that was the root cause or the result. The client reports invalid chunked encoding. I'll look into the client code.

Where might I find a list of these error codes? My Google fu let me down.

Mark




RE: APR/native errors with non-blocking I/O

Posted by "Caldarale, Charles R" <Ch...@unisys.com>.
> From: Mark Thomas [mailto:markt@apache.org] 
> Subject: APR/native errors with non-blocking I/O

Assuming these are negative errno values:

> On OSX the error code is -32

Broken pipe.

> On Linux the error code is -104

Connection reset by peer.

Did the other end go away?

Can you get a packet capture from one end or the other?

 - Chuck



