Posted to dev@harmony.apache.org by Mark Hindess <ma...@googlemail.com> on 2008/12/11 09:54:42 UTC

[classlib] Network changes causing linux hang in HttpURLConnectionTest

When running
org.apache.harmony.luni.tests.internal.net.www.protocol.http.HttpURLConnectionTest
I am seeing a hang in the testConnectionPersistence method on Linux (x86-64 and x86).
The system call trace shows:

22730 22:53:22.410114 poll([{fd=105, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=105, revents=POLLIN}])
22730 22:53:22.410168 recvfrom(105, "<html></html>"..., 13, 0, NULL, NULL) = 13
22730 22:53:22.411624 ioctl(105, FIONREAD, [0]) = 0
22730 22:53:22.411687 poll([{fd=105, events=POLLIN|POLLPRI}], 1, 1) = 0 (Timeout)
22730 22:53:22.416590 recvfrom(105,  <unfinished ...>
22730 22:58:47.104906 <... recvfrom resumed> 0x7faa1435cf20, 1, 0, 0, 0) = ? ERESTARTSYS (To be restarted)
22730 22:58:47.105262 --- SIGINT (Interrupt) @ 0 (0) ---

So it looks like we are attempting a (blocking) read even though the poll
timed out, indicating there was nothing to read.  I'm trying to isolate the
bad change, but unfortunately there were quite a few other problems
(compile errors and unsatisfied link errors) around the time of the
change that seems to have broken it.

-Mark.
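
To illustrate the failure mode in the trace above, here is a minimal sketch
(not the Harmony code) of a read that is guarded by poll and that refuses to
read when poll times out.  The helper name read_if_ready and the choice to
report a timeout as EAGAIN are assumptions made for this example; it uses
read() rather than recvfrom() so it works on any descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from Harmony: wait up to timeout_ms for fd
 * to become readable and read only if poll() reports it ready.
 * Returns the number of bytes read (0 means end of stream), or -1 with errno
 * set.  A timeout is reported as -1 with errno == EAGAIN. */
ssize_t read_if_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN | POLLPRI;
    pfd.revents = 0;

    /* Retry if a signal interrupts the wait (this restarts the full timeout). */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);

    if (ready == -1) {
        return -1;          /* poll failed; errno was set by poll */
    }
    if (ready == 0) {
        errno = EAGAIN;     /* timed out: nothing to read, so do not read */
        return -1;
    }
    return read(fd, buf, len);
}
```

The point of the sketch is the middle branch: a poll result of 0 has to stop
the caller before it reaches the read, otherwise the read blocks as in the
trace.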




Re: [classlib] Network changes causing linux hang in HttpURLConnectionTest

Posted by Tim Ellison <t....@gmail.com>.
Mark Hindess wrote:
> In message <49...@gmail.com>, Tim Ellison writes:
>> Mark Hindess wrote:
>>> In message <20...@d06av02.portsmouth.uk.ibm.com>,
>>> "Mark Hindess" writes:
>>>> When running 
>>>> org.apache.harmony.luni.tests...protocol.http.HttpURLConnectionTest
>>>> I am seeing a hang in testConnectionPersistence method on linux
>>>> (x86-64 and x86).
>>>> [snip]
>>> Ok.  It looks like there is a problem with the selectRead implementation on
>>> unix.  The use of this function in:
>>>
>>>   Java_org_apache_harmony_luni_platform_OSNetworkSystem_readDirect
>>>
>>> compares the result of the selectRead call using portlib constants.  This
>>> is valid for the windows implementation of selectRead - because it uses
>>> hysock_select.  However, the unix implementation uses poll which is returning:
>>>   On success, a positive number is returned; [snip].  A value of 0
>>>   indicates that the call timed out and no file descriptors were ready.
>>>   On error, -1 is returned, and errno is set appropriately.
>>>
>>> I think the fix is:
>>>
>>> 1) Check for other uses of selectRead and make sure they all use portlib 
>>> constants.
>>>
>>> 2) Fix selectRead on unix to map the poll return codes to portlib constants.
>>> I'll take a look at doing this.  Shout if you don't think this is a good
>>> approach.
>>
>> Having selectRead return different values on different platforms is not
>> a good idea (even though it is not a portlib function), so yes it should
>> be fixed.
>>
>> There is plenty of work left in tidying up and optimizing the networking
>> code.  Thanks for tracking this down.
> 
> It gets worse...
> 
> Fixing 1) is fine because there are no other uses (except one commented out).
> 
> Fixing 2) is ugly because the findError function that converts system
> socket errors to portlib ones is static and thus not accessible.  I've worked
> around this by just returning either HYPORT_ERROR_SOCKET_TIMEOUT or the
> generic HYPORT_ERROR_SOCKET_OPFAILED which is sufficient.  See r725677.
> 
> I also notice that hysock_select in unix/hysock.c says:
> 
>   @return 0 if timeout, number of ready FDs, or otherwise return the (negative)
>           error code.
> 
> but in fact actually does:
> 
>   result = select(...);
>   ...
>   if (result) {
>     rc = result;
>   } else {
>     rc = HYPORT_ERROR_SOCKET_TIMEOUT;
>   }
> 
> so I now wonder if any hysock_select callers are expecting the documented
> behaviour.  (I'm afraid to even look at the windows version.)

Windows also returns HYPORT_ERROR_SOCKET_TIMEOUT (-209) for the timeout,
so it appears to be a documentation bug.

Regards,
Tim

Re: [classlib] Network changes causing linux hang in HttpURLConnectionTest

Posted by Mark Hindess <ma...@googlemail.com>.
In message <49...@gmail.com>, Tim Ellison writes:
>
> Mark Hindess wrote:
> > In message <20...@d06av02.portsmouth.uk.ibm.com>,
> > "Mark Hindess" writes:
> >> When running 
> >> org.apache.harmony.luni.tests...protocol.http.HttpURLConnectionTest
> >> I am seeing a hang in testConnectionPersistence method on linux
> >> (x86-64 and x86).
> >> [snip]
> > 
> > Ok.  It looks like there is a problem with the selectRead implementation on
> > unix.  The use of this function in:
> > 
> >   Java_org_apache_harmony_luni_platform_OSNetworkSystem_readDirect
> > 
> > compares the result of the selectRead call using portlib constants.  This
> > is valid for the windows implementation of selectRead - because it uses
> > hysock_select.  However, the unix implementation uses poll which is returning:
> > 
> >   On success, a positive number is returned; [snip].  A value of 0
> >   indicates that the call timed out and no file descriptors were ready.
> >   On error, -1 is returned, and errno is set appropriately.
> > 
> > I think the fix is:
> > 
> > 1) Check for other uses of selectRead and make sure they all use portlib 
> > constants.
> > 
> > 2) Fix selectRead on unix to map the poll return codes to portlib constants.
> > 
> > I'll take a look at doing this.  Shout if you don't think this is a good
> > approach.
> 
> 
> Having selectRead return different values on different platforms is not
> a good idea (even though it is not a portlib function), so yes it should
> be fixed.
> 
> There is plenty of work left in tidying up and optimizing the networking
> code.  Thanks for tracking this down.

It gets worse...

Fixing 1) is fine because there are no other uses (except one commented out).

Fixing 2) is ugly because the findError function that converts system
socket errors to portlib ones is static and thus not accessible.  I've worked
around this by just returning either HYPORT_ERROR_SOCKET_TIMEOUT or the
generic HYPORT_ERROR_SOCKET_OPFAILED which is sufficient.  See r725677.

I also notice that hysock_select in unix/hysock.c says:

  @return 0 if timeout, number of ready FDs, or otherwise return the (negative)
          error code.

but what it actually does is:

  result = select(...);
  ...
  if (result) {
    rc = result;
  } else {
    rc = HYPORT_ERROR_SOCKET_TIMEOUT;
  }

so I now wonder if any hysock_select callers are expecting the documented
behaviour.  (I'm afraid to even look at the windows version.)

Regards,
 Mark.
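
The mismatch between the documented and observed hysock_select timeout value
can be made concrete with a small sketch.  Both function names below are
hypothetical, and observed_select_rc only models the lines quoted above; it
cannot account for whatever the elided "..." does.  The value -209 is the one
quoted in this thread for HYPORT_ERROR_SOCKET_TIMEOUT.

```c
/* Stand-in for the portlib constant; -209 is the value quoted in this thread. */
#ifndef HYPORT_ERROR_SOCKET_TIMEOUT
#define HYPORT_ERROR_SOCKET_TIMEOUT (-209)
#endif

/* Models only the quoted branch: any nonzero select() result is passed
 * through unchanged, and zero becomes the portlib timeout constant. */
int observed_select_rc(int select_result)
{
    if (select_result) {
        return select_result;
    }
    return HYPORT_ERROR_SOCKET_TIMEOUT;
}

/* Hypothetical caller-side check that accepts both conventions: the
 * documented one (0 on timeout) and the observed one (portlib constant). */
int is_select_timeout(int rc)
{
    return rc == 0 || rc == HYPORT_ERROR_SOCKET_TIMEOUT;
}
```

A caller written against the documentation would test rc == 0 and never see
the timeout; a check like is_select_timeout is one defensive option until the
comment and the code agree.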



Re: [classlib] Network changes causing linux hang in HttpURLConnectionTest

Posted by Tim Ellison <t....@gmail.com>.
Mark Hindess wrote:
> In message <20...@d06av02.portsmouth.uk.ibm.com>,
> "Mark Hindess" writes:
>> When running 
>> org.apache.harmony.luni.tests...protocol.http.HttpURLConnectionTest
>> I am seeing a hang in testConnectionPersistence method on linux
>> (x86-64 and x86).
>> [snip]
> 
> Ok.  It looks like there is a problem with the selectRead implementation on
> unix.  The use of this function in:
> 
>   Java_org_apache_harmony_luni_platform_OSNetworkSystem_readDirect
> 
> compares the result of the selectRead call using portlib constants.  This
> is valid for the windows implementation of selectRead - because it uses
> hysock_select.  However, the unix implementation uses poll which is returning:
> 
>   On success, a positive number is returned; [snip].  A value of 0
>   indicates that the call timed out and no file descriptors were ready.
>   On error, -1 is returned, and errno is set appropriately.
> 
> I think the fix is:
> 
> 1) Check for other uses of selectRead and make sure they all use portlib 
> constants.
> 
> 2) Fix selectRead on unix to map the poll return codes to portlib constants.
> 
> I'll take a look at doing this.  Shout if you don't think this is a good
> approach.


Having selectRead return different values on different platforms is not
a good idea (even though it is not a portlib function), so yes it should
be fixed.

There is plenty of work left in tidying up and optimizing the networking
code.  Thanks for tracking this down.

Regards,
Tim


Re: [classlib] Network changes causing linux hang in HttpURLConnectionTest

Posted by Mark Hindess <ma...@googlemail.com>.
In message <20...@d06av02.portsmouth.uk.ibm.com>,
"Mark Hindess" writes:
>
> When running 
> org.apache.harmony.luni.tests...protocol.http.HttpURLConnectionTest
> I am seeing a hang in testConnectionPersistence method on linux
> (x86-64 and x86).
> [snip]

Ok.  It looks like there is a problem with the selectRead implementation on
unix.  The use of this function in:

  Java_org_apache_harmony_luni_platform_OSNetworkSystem_readDirect

compares the result of the selectRead call using portlib constants.  This
is valid for the windows implementation of selectRead - because it uses
hysock_select.  However, the unix implementation uses poll which is returning:

  On success, a positive number is returned; [snip].  A value of 0
  indicates that the call timed out and no file descriptors were ready.
  On error, -1 is returned, and errno is set appropriately.

I think the fix is:

1) Check for other uses of selectRead and make sure they all use portlib 
constants.

2) Fix selectRead on unix to map the poll return codes to portlib constants.

I'll take a look at doing this.  Shout if you don't think this is a good
approach.

-Mark.
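
Step 2 of the proposed fix could look roughly like the sketch below.  This is
not the change that was committed; select_read_sketch is a hypothetical name,
the timeout value -209 is the one quoted elsewhere in this thread, and the
value used here for HYPORT_ERROR_SOCKET_OPFAILED is a placeholder assumption
rather than the value in the Harmony headers.

```c
#include <poll.h>

/* Stand-ins for the portlib constants so the sketch compiles on its own.
 * -209 is quoted in this thread; the OPFAILED value is a placeholder. */
#ifndef HYPORT_ERROR_SOCKET_TIMEOUT
#define HYPORT_ERROR_SOCKET_TIMEOUT (-209)
#endif
#ifndef HYPORT_ERROR_SOCKET_OPFAILED
#define HYPORT_ERROR_SOCKET_OPFAILED (-1000)   /* placeholder, assumption */
#endif

/* Hypothetical sketch: poll one descriptor for readability and translate
 * poll's return convention into portlib-style codes. */
int select_read_sketch(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN | POLLPRI;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc > 0) {
        return rc;                              /* descriptor is ready */
    }
    if (rc == 0) {
        return HYPORT_ERROR_SOCKET_TIMEOUT;     /* poll timed out */
    }
    return HYPORT_ERROR_SOCKET_OPFAILED;        /* poll failed (-1) */
}
```

With this mapping a caller that compares the result against
HYPORT_ERROR_SOCKET_TIMEOUT behaves the same whether the underlying
implementation uses poll or hysock_select.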