Posted to dev@geronimo.apache.org by Aaron Mulder <am...@alumni.princeton.edu> on 2004/10/31 15:59:04 UTC

Weird build errors on SuSE 9.2

	FYI, I have upgraded my Linux machines to SuSE 9.2 (SuSE kernel
2.6.8-24.3).  On one of them, with JDK 1.4, during an m:rebuild-all, I got
a JDK crash in an NIO accept() in the unit tests in our remoting stuff.  
I got that a few times in a row, then a while later got a test failure
instead (with an address in use during bind) also in the remoting tests.  
I tried JDK 1.4.2_05 and _06 and both had the problem.  On another machine
with the same distro/kernel, JMXRemotingTest hangs (both sides of the async
conversation seem to be blocked on read).

	In any of these cases JDK 1.5 seems fine, so I'm guessing there is
some sort of kernel compatibility problem with 1.4 NIO networking going
on.

Aaron

Re: Weird build errors on SuSE 9.2

Posted by Craig Johannsen <cj...@shaw.ca>.
Hi Aaron,

My environment is Mandrake Linux 9.1, kernel 2.4.2-0.13, libc-2.3.1, 651 
MHz Pentium III (Coppermine), and KDE shell. I tested your code with 
J2SDK 1.4.2_03-b02 (mixed mode) and JDK 1.5.0-rc-b63 (mixed mode, 
sharing). In each case the socket timeout had no effect on ssc.accept(). 
It waited indefinitely, just as the ServerSocketChannel.accept() API 
documentation says it will: "If this channel is in non-blocking mode 
then this method will immediately return null if there are no pending 
connections. Otherwise it will block indefinitely until a new connection 
is available or an I/O error occurs."

The ServerSocket.accept() API documentation erroneously says: "Listens 
for a connection to be made to this socket and accepts it. The method 
blocks until a connection is made." However, at least it also says it 
may throw: "SocketTimeoutException 
<http://java.sun.com/j2se/1.4.2/docs/api/java/net/SocketTimeoutException.html> 
- if a timeout was previously set with setSoTimeout and the timeout has 
been reached." So, the behaviour mostly follows the documentation, 
though I would like to see ServerSocketChannel.accept() pay attention to 
the timeout setting. I can't think of any reason it shouldn't. Probably 
it's a bug, but maybe a bug in the spec, assuming the API docs are correct.
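
One way to get a bounded wait while staying on ServerSocketChannel is to 
put the channel in non-blocking mode and wait on a Selector with a 
timeout. The following is only a minimal sketch of that idea, not code 
from this thread: the class TimedAccept and its accept helper are 
hypothetical names, and it binds to 127.0.0.1 on an ephemeral port 
instead of port 2010.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper for illustration; not part of the remoting module.
class TimedAccept {

    // Waits up to timeoutMillis (must be > 0) for a pending connection.
    // Returns null if none arrived. The channel must already be in
    // non-blocking mode, otherwise register() throws
    // IllegalBlockingModeException.
    static SocketChannel accept(ServerSocketChannel ssc, long timeoutMillis)
            throws IOException {
        Selector selector = Selector.open();
        try {
            ssc.register(selector, SelectionKey.OP_ACCEPT);
            if (selector.select(timeoutMillis) == 0) {
                return null; // timed out (or woken up) with nothing pending
            }
            return ssc.accept(); // non-blocking, so it may still return null
        } finally {
            selector.close(); // also drops the registration
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocketChannel ssc = ServerSocketChannel.open();
        ssc.configureBlocking(false);
        ssc.socket().bind(new InetSocketAddress("127.0.0.1", 0), 50);
        System.out.println("accepting...");
        SocketChannel client = accept(ssc, 5000);
        System.out.println(client == null ? "timed out" : "accepted " + client);
        ssc.close();
    }
}
```

Opening a Selector per call keeps the sketch short; a long-running 
server would more likely keep one Selector open for the life of the 
channel.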

Unfortunately, if you use ServerSocket instead of ServerSocketChannel, 
you lose all those nifty ByteBuffer operations. So, maybe it is a big 
change and much less efficient to use input and output streams rather 
than ByteBuffers.
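
It may be possible to keep the ByteBuffer operations and still get the 
timeout: accept through the socket adaptor (ssc.socket().accept()) and 
then ask the returned Socket for its channel. The sketch below assumes 
that the JDK's adaptor attaches a SocketChannel to the accepted Socket, 
which the JDK implementations I know of do but which the API docs do not 
spell out. AdaptorAccept is a hypothetical name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper for illustration only.
class AdaptorAccept {

    // Accepts through the ServerSocket adaptor so that SO_TIMEOUT applies,
    // then returns the SocketChannel behind the accepted Socket.
    // Throws SocketTimeoutException if nothing connects in time.
    // The server channel must be in blocking mode (the default).
    static SocketChannel accept(ServerSocketChannel ssc, int timeoutMillis)
            throws IOException {
        ServerSocket adaptor = ssc.socket();
        adaptor.setSoTimeout(timeoutMillis);
        Socket socket = adaptor.accept();
        // Assumption: non-null because the adaptor accepts via the channel.
        return socket.getChannel();
    }

    public static void main(String[] args) throws IOException {
        ServerSocketChannel ssc = ServerSocketChannel.open();
        ssc.socket().bind(new InetSocketAddress("127.0.0.1", 0), 50);
        System.out.println("accepting...");
        try {
            SocketChannel channel = accept(ssc, 5000);
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            int n = channel.read(buffer); // ByteBuffer I/O is still available
            System.out.println("read " + n + " bytes");
            channel.close();
        } catch (SocketTimeoutException e) {
            System.out.println("timed out");
        } finally {
            ssc.close();
        }
    }
}
```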

Cheers,
Craig



Re: Weird build errors on SuSE 9.2

Posted by Aaron Mulder <am...@alumni.princeton.edu>.
	Alright, the saga continues.  Erin tried my test on SuSE 9.1, and
that has the accept problem too, so it seems that it's nothing new.

	If I run "maven -o clean && maven -o" from the modules/remoting 
directory, then the test runs fine.  If I write a little test case, 
including that bit where a separate thread interrupts the accept thread, 
it successfully wakes up the accept thread.
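
For reference, a test of that kind might look like the sketch below. It 
is not the actual test case from this message: InterruptAccept is a 
hypothetical name, and the loopback address and ephemeral port are 
choices made for the sketch. ServerSocketChannel is an interruptible 
channel, so interrupting a thread blocked in accept() is documented to 
close the channel and throw ClosedByInterruptException in that thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;

// Hypothetical test harness for illustration only.
class InterruptAccept {

    // Blocks a thread in ssc.accept(), interrupts it, and reports what
    // woke it up. Returns null if the thread never left accept().
    static String interruptBlockedAccept()
            throws IOException, InterruptedException {
        final ServerSocketChannel ssc = ServerSocketChannel.open();
        ssc.socket().bind(new InetSocketAddress("127.0.0.1", 0), 50);
        final String[] outcome = new String[1];

        Thread acceptor = new Thread(new Runnable() {
            public void run() {
                try {
                    ssc.accept();
                    outcome[0] = "accepted a connection";
                } catch (ClosedByInterruptException e) {
                    outcome[0] = "ClosedByInterruptException";
                } catch (IOException e) {
                    outcome[0] = e.getClass().getName();
                }
            }
        });
        acceptor.start();
        Thread.sleep(500);     // give the thread time to block in accept()
        acceptor.interrupt();  // should close the channel and wake the thread
        acceptor.join(5000);   // on a misbehaving JVM the thread stays alive
        boolean woke = !acceptor.isAlive();
        ssc.close();
        return woke ? outcome[0] : null;
    }

    public static void main(String[] args) throws Exception {
        String outcome = interruptBlockedAccept();
        System.out.println(outcome == null
                ? "still blocked after interrupt"
                : "woke up with " + outcome);
    }
}
```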

	And now if I run a "maven -o m:rebuild-all" from the top level 
directory, it works.

	I'm seriously confused.  What did I do in between?  I ran a
complete build by disabling tests in the remoting module (but they're now
enabled again and I see their output).  I suspended my laptop and ate
dinner (though I had tried suspending and rebooting before and it didn't
make a difference).  I wrote a bit.  I fiddled with test cases, outside of
my Geronimo tree.  I haven't done any online builds or updates, so I have
the same code I had before.  WTF?

	Perhaps some internet site was down then and up now, but it
doesn't look like the test cares. Is there any way this could be caused by
binding to 0.0.0.0 instead of 127.0.0.1?  Could that cause a network
lookup somehow?  Is there something that doesn't get cleaned by
m:rebuild-all?  Was this all a goose chase?  I'm at a loss.
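
On the lookup question: the test programs in this thread bind to 
InetAddress.getLocalHost(), which resolves the machine's hostname and so 
can go through the name service, whereas the wildcard address and a 
literal IP such as 127.0.0.1 need no resolution. Whether that explains 
the failures here is unknown. The sketch below only shows the three ways 
of obtaining a bind address; BindAddresses and describe are hypothetical 
names.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical illustration of the three kinds of bind address.
class BindAddresses {

    static String describe(InetAddress address) {
        if (address.isAnyLocalAddress()) {
            return "wildcard (all interfaces)";
        }
        if (address.isLoopbackAddress()) {
            return "loopback";
        }
        return "specific interface";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Port-only constructor: the address is the wildcard, no hostname used.
        InetAddress wildcard = new InetSocketAddress(2010).getAddress();
        // A literal IP is only parsed and validated, not looked up.
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // Resolves this machine's hostname; may consult the name service.
        InetAddress localHost = InetAddress.getLocalHost();

        System.out.println(wildcard + " -> " + describe(wildcard));
        System.out.println(loopback + " -> " + describe(loopback));
        System.out.println(localHost + " -> " + describe(localHost));
    }
}
```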

	Anyway, it now seems that I'm not unique in getting the accept 
error (that is, SO_TIMEOUT not working).  So I suggest we switch from 
ServerSocketChannel to ServerSocket for our blocking I/O server.
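
A blocking accept loop on a plain ServerSocket could look roughly like 
the sketch below. This is an illustration of the proposal, not the 
remoting module's code: BlockingAcceptor and its methods are 
hypothetical names. The SO_TIMEOUT lets the loop wake up periodically 
and notice a stop request.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical blocking acceptor; names are illustrative only.
class BlockingAcceptor implements Runnable {
    private final ServerSocket server;
    private volatile boolean running = true;
    private volatile int accepted = 0;

    BlockingAcceptor(int port, int timeoutMillis) throws IOException {
        server = new ServerSocket(port, 50, InetAddress.getByName("127.0.0.1"));
        server.setSoTimeout(timeoutMillis); // honored by ServerSocket.accept()
    }

    int port() { return server.getLocalPort(); }
    int acceptedCount() { return accepted; }
    void stop() { running = false; }

    public void run() {
        try {
            while (running) {
                try {
                    Socket client = server.accept();
                    accepted++;     // only this thread writes the counter
                    client.close(); // a real server would hand the socket off
                } catch (SocketTimeoutException e) {
                    // Nothing connected within the timeout: re-check 'running'.
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try { server.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingAcceptor acceptor = new BlockingAcceptor(0, 1000);
        Thread thread = new Thread(acceptor);
        thread.start();
        new Socket("127.0.0.1", acceptor.port()).close(); // one test client
        Thread.sleep(200);
        acceptor.stop();
        thread.join();
        System.out.println("accepted " + acceptor.acceptedCount());
    }
}
```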

Aaron


Re: Weird build errors on SuSE 9.2

Posted by Aaron Mulder <am...@alumni.princeton.edu>.
	Right.  The question is, are we willing to change our code to 
accommodate a bug that, so far, only I am running into?  :)  I sent bug 
reports to Sun and SuSE, but I suspect they'll go to the great bit bucket 
in the sky, since SuSE 9.2 isn't a supported platform for Sun, and I doubt 
SuSE cares that much about JDK problems.

	Though, I guess I should ask, how were you able to duplicate the 
problem?  Do you also have a SuSE 9.2 machine, or did you do it on some 
other platform/kernel?

Thanks,
	Aaron


Re: Weird build errors on SuSE 9.2

Posted by Craig Johannsen <cj...@shaw.ca>.
Hi Aaron,

"ssc.socket().accept()" throws a SocketTimeoutException, while 
"ssc.accept()" does not. One would think that they would have identical 
behaviour, but apparently not. Here's an example where the socket 
timeout works with both JDK 1.4.2_03 and JDK 1.5.

Cheers,
Craig

import java.io.IOException;
import java.net.*;
import java.nio.channels.*;

// Unlike ssc.accept(), accepting through the socket adaptor honors
// SO_TIMEOUT: this throws SocketTimeoutException after 5 seconds.
public class Hang {
    public static void main(String[] args) {
        try {
            ServerSocketChannel ssc = ServerSocketChannel.open();
            ssc.configureBlocking(true);
            ssc.socket().bind(
                new InetSocketAddress(InetAddress.getLocalHost(), 2010), 50);
            ssc.socket().setSoTimeout(5000);
            System.out.println("accepting...");
            ssc.socket().accept(); // times out; ssc.accept() would not
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Finished");
    }
}




Re: Weird build errors on SuSE 9.2

Posted by Aaron Mulder <am...@alumni.princeton.edu>.
	Now I'm having trouble under 1.5 too, on my laptop.

	Looking at a thread dump, it seems like the SO_TIMEOUT is ignored, 
and also interrupting a thread in a ServerSocketChannel blocking accept 
doesn't have any effect at all.

Aaron

-- This class hangs when run:

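import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class Temp {
    public static void main(String[] args) {
        try {
            ServerSocketChannel ssc = ServerSocketChannel.open();
            ssc.socket().bind(new InetSocketAddress(InetAddress.getLocalHost(), 2010), 50);
            // The timeout is set on the socket adaptor, but accept() below
            // is called on the channel itself, which does not consult it.
            ssc.socket().setSoTimeout(5000);
            System.out.println("accepting...");
            ssc.accept();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Finished");
    }
}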

-- A similar test with only a ServerSocket runs fine.



Re: Weird build errors on SuSE 9.2

Posted by Jeremy Boynes <jb...@gluecode.com>.
Can you add these configs to the Known(not)Working page on the wiki?
