Posted to commits@river.apache.org by pe...@apache.org on 2012/11/22 07:45:51 UTC

svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Author: peter_firmstone
Date: Thu Nov 22 06:45:50 2012
New Revision: 1412436

URL: http://svn.apache.org/viewvc?rev=1412436&view=rev
Log:
Attempt to fix socket issue on FreeBSD

Modified:
    river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Modified: river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
URL: http://svn.apache.org/viewvc/river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java?rev=1412436&r1=1412435&r2=1412436&view=diff
==============================================================================
--- river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java (original)
+++ river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java Thu Nov 22 06:45:50 2012
@@ -51,8 +51,10 @@ import java.util.TreeSet;
 import java.util.jar.JarFile;
 import java.util.zip.ZipEntry;
 import java.lang.reflect.Field;
+import java.net.BindException;
 import java.net.InetAddress;
 import java.net.InetSocketAddress;
+import java.net.Socket;
 import java.net.SocketAddress;
 
 //Should there be an 'AbortTestRequest' ?
@@ -249,16 +251,39 @@ class MasterHarness {
     private class KeepAlivePort implements Runnable {
 
 	public void run() {
-	    ArrayList socketList = new ArrayList(); // keep references
+	    ArrayList<Socket> socketList = new ArrayList<Socket>(); // keep references
+            SocketAddress add = new InetSocketAddress(KEEPALIVE_PORT);
 	    try {
-                SocketAddress add = new InetSocketAddress(KEEPALIVE_PORT);
+                
 		ServerSocket socket = new ServerSocket();
-//                if (!socket.getReuseAddress()) socket.setReuseAddress(true);
                 socket.bind(add);
 		while (true) {
 		    socketList.add(socket.accept());
 		}
-	    } catch (Exception e) {
+	    } catch (BindException e){
+                try {
+                        Thread.sleep(240000); // Wait 4 minutes for TCP 2MSL TIME_WAIT
+                        ServerSocket socket = new ServerSocket();
+                        socket.bind(add);
+                        while (true) {
+                            socketList.add(socket.accept());
+                        }
+                } catch (InterruptedException ex){
+                    outStream.println("Interrupted while opening ServerSocket with KEEPALIVE_PORT:" + KEEPALIVE_PORT );
+                    outStream.println("Unexpected exception after waiting 4 minutes for port to become available:\n");
+                    ex.printStackTrace(outStream);
+                    outStream.println("Initial attempt failed:\n");
+                    e.printStackTrace(outStream);
+                    System.exit(1);
+                }catch (Exception ex){
+                    outStream.println("Error occurred while attempting to open ServerSocket with KEEPALIVE_PORT:" + KEEPALIVE_PORT );
+                    outStream.println("Unexpected exception after waiting 4 minutes for port to become available:\n");
+                    ex.printStackTrace(outStream);
+                    outStream.println("Initial attempt failed:\n");
+                    e.printStackTrace(outStream);
+                    System.exit(1);
+                }
+            }catch (Exception e) {
 		outStream.println("Problem with KEEPALIVE_PORT:" + KEEPALIVE_PORT );
 		outStream.println("Unexpected exception:");
 		e.printStackTrace(outStream);



Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Peter Firmstone <ji...@zeus.net.au>.
Only OpenJDK on FreeBSD is suffering socket issues now.

I'll be cleaning up and removing hacks where relevant shortly.

I'm waiting for the Ubuntu OpenJDK test run to complete. That should
help determine whether FreeBSD or OpenJDK is at issue; it could also be
that those ports just aren't available.

Netstat output from the point at which the failure occurs would be very
helpful, and any help is much appreciated.

Regards,

Peter.





On 22/11/2012 7:39 PM, Simon IJskes - QCG wrote:
> On 22-11-12 10:18, Peter Firmstone wrote:
>> I tried SO_REUSEADDR on an earlier attempt, that didn't work either,
>> that was a hack too.
>
> In general, I do not consider SO_REUSEADDR a hack. It is a perfectly
> permissible construct for servers.
>
>> The real fix is to have the client close the port, rather than the
>> server, but since I don't have direct access to this box, I can't really
>> tell if that's the actual problem or if those ports aren't available at
>> all.
>
> You cannot dictate the behaviour of a client, so any solution needs to
> be robust enough to behave correctly independent of the client. TCP
> is such a solution. There are problems with the TCP protocol,
> exploited by malicious parties, but they are not manifesting
> themselves in the test environment.
>
> So we could have a number of possible causes:
> - incorrect assumptions or bugs in the java program.
> - bugs in the java VM Socket implementation.
> - bugs in the TCP stack.
>
> In a number of cases an interrupt is not delivered to a thread
> blocked in a system call. A plausible cause is therefore
> ServerSockets that are never really interrupted, or that are never
> closed because our own Java instances did not terminate correctly.
>
> You could try writing a class that uses the 'netstat' program to
> report which ports are in use and what state they are in, to be
> triggered at the problem points.
>
> I can help you with that, but only if you stop making those 'just try' 
> patches, and with improved communication (improved in quality, not 
> verbosity).
>
> Gr. Simon
>


Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 10:18, Peter Firmstone wrote:
> I tried SO_REUSEADDR on an earlier attempt, that didn't work either,
> that was a hack too.

In general, I do not consider SO_REUSEADDR a hack. It is a perfectly
permissible construct for servers.
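
For illustration, a minimal sketch of that server-side construct,
assuming plain java.net sockets: the option is set on an unbound
ServerSocket before bind() is called. The class and method names are
hypothetical and not taken from MasterHarness, and port 0 (any free
port) stands in for KEEPALIVE_PORT.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of the River QA harness.
final class ReuseAddrBind {

    /** Creates an unbound ServerSocket, enables SO_REUSEADDR, then binds it. */
    static ServerSocket bindReusable(int port) throws IOException {
        ServerSocket socket = new ServerSocket(); // unbound, so options can still be set
        socket.setReuseAddress(true);             // only reliable when set before bind()
        socket.bind(new InetSocketAddress(port));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        ServerSocket socket = bindReusable(0);    // 0 = let the OS pick a free port
        System.out.println("bound to port " + socket.getLocalPort()
                + ", SO_REUSEADDR=" + socket.getReuseAddress());
        socket.close();
    }
}
```

SO_REUSEADDR only helps when old connections on the port are lingering
in TIME_WAIT; it does not help if another process is still listening on
the port.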

> The real fix is to have the client close the port, rather than the
> server, but since I don't have direct access to this box, I can't really
> tell if that's the actual problem or if those ports aren't available at
> all.

You cannot dictate the behaviour of a client, so any solution needs to
be robust enough to behave correctly independent of the client. TCP is
such a solution. There are problems with the TCP protocol, exploited by
malicious parties, but they are not manifesting themselves in the test
environment.

So we could have a number of possible causes:
- incorrect assumptions or bugs in the java program.
- bugs in the java VM Socket implementation.
- bugs in the TCP stack.

In a number of cases an interrupt is not delivered to a thread blocked
in a system call. A plausible cause is therefore ServerSockets that are
never really interrupted, or that are never closed because our own Java
instances did not terminate correctly.

You could try writing a class that uses the 'netstat' program to report
which ports are in use and what state they are in, to be triggered at
the problem points.
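
A rough sketch of what such a class might look like, assuming a
'netstat' binary is on the PATH and accepts '-an' (true of the common
BSD, Linux and Windows versions, but not checked here). The class name,
method names and the matching rule are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of the River QA harness.
final class PortReport {

    /** Keeps the lines that mention the port as ".port" (BSD) or ":port" (Linux, Windows). */
    static List<String> filterByPort(List<String> lines, int port) {
        List<String> matches = new ArrayList<String>();
        for (String line : lines) {
            // The port must be followed by whitespace or end of line,
            // so that 10002 does not also match 100020.
            if (line.matches(".*[.:]" + port + "(\\s.*)?")) {
                matches.add(line);
            }
        }
        return matches;
    }

    /** Runs "netstat -an" and returns its output, one entry per line. */
    static List<String> netstat() throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("netstat", "-an");
        builder.redirectErrorStream(true); // fold stderr into stdout
        Process process = builder.start();
        List<String> lines = new ArrayList<String>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(process.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        process.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PortReport <port>");
            return;
        }
        for (String line : filterByPort(netstat(), Integer.parseInt(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Calling something like this from the BindException handler would record
whether the port is in LISTEN, TIME_WAIT or some other state at the
moment the bind fails.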

I can help you with that, but only if you stop making those 'just try'
patches and communicate better (better in quality, not in verbosity).

Gr. Simon

-- 
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Peter Firmstone <ji...@zeus.net.au>.
I tried SO_REUSEADDR in an earlier attempt; that didn't work either,
and it was a hack too.

The 4 minute retry is just a temporary hack to see whether the port is
released; it wasn't.
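
For comparison, a sketch of a bounded retry that polls at a short
interval instead of sleeping once for the full four minutes. This is not
what r1412436 does; the class name, method name and parameters are
hypothetical, and the attempt count and delay would need tuning.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of the River QA harness.
final class RetryingBind {

    /**
     * Tries to bind up to 'attempts' times, sleeping 'delayMillis' between
     * tries, and rethrows the last BindException if every attempt fails.
     */
    static ServerSocket bindWithRetry(int port, int attempts, long delayMillis)
            throws IOException, InterruptedException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        BindException last = null;
        for (int i = 0; i < attempts; i++) {
            ServerSocket socket = new ServerSocket(); // a fresh socket for every attempt
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                socket.close();                       // do not leak the failed socket
                last = e;
                if (i + 1 < attempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

With, say, 48 attempts at 5000 ms this covers the same four minutes but
returns as soon as the port becomes free.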

The real fix is to have the client close the port, rather than the
server, but since I don't have direct access to this box, I can't really
tell if that's the actual problem or if those ports aren't available at all.

Peter.

On 22/11/2012 6:42 PM, Simon IJskes - QCG wrote:
> On 22-11-12 07:45, peter_firmstone@apache.org wrote:
>> +                try {
>> +                        Thread.sleep(240000); // Wait 4 minutes for 
>> TCP 2MSL TIME_WAIT
>
> Peter,
>
> could you please try to explain why you are removing SO_REUSEADDR and 
> introducing this wait-retry?
>
> Please speculate about the potential bug you think there is.
>
> Is it a desperate attempt? Or have you found a solid theory behind it?
>
> Gr. Simon
>


Failing tests

Posted by Peter Firmstone <ji...@zeus.net.au>.
The good news is that the ARM platform is now down to only 11 failing
tests (previously 294):

      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorStopReplace.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 2 discard event(s) received
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorTerminate.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 0 discard event(s) received
      [java] -----------------------------------------
      [java] com/sun/jini/test/impl/mercury/AdminIFShutdownTest.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/impl/mercury/AdminIFTest.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/impl/mercury/PullAdminIFShutdownTest.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/impl/mercury/PullAdminIFTest.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/discoveryservice/event/DiscardUnreachable.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorStop.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorTerminate.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
      [java] -----------------------------------------
      [java] com/sun/jini/test/impl/mahalo/AdminIFShutdownTest.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/impl/mahalo/AdminIFTest.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
      [java]


I'm still waiting for the other platforms to catch up to svn.

Ubuntu JDK 7 has 6 failing tests. The good news is that some of these
have been failing randomly for some time on different platforms and are
now failing more consistently. Hopefully they'll also fail on my SPARC,
which has excellent debugging tools:

      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorAllChange.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: change failed -- waited 870 seconds (14 minutes) -- 3 change event(s) expected, 0 change event(s) received
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorStopReplace.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 2 discard event(s) received
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorTerminate.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 0 discard event(s) received
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/discoveryservice/event/DiscardUnreachable.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorStop.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
      [java]
      [java] -----------------------------------------
      [java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorTerminate.td
      [java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
      [java]
      [java]


Cheers,

Peter.

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 14:41, Peter Firmstone wrote:
> On 22/11/2012 8:22 PM, Simon IJskes - QCG wrote:
>> On 22-11-12 11:16, Dan Creswell wrote:
>>> See, if it wasn't on trunk, the changes would be less of a big deal.
>>> It'd
>>> be natural to check one's work in (whether it be an in-progress test
>>> snapshot or otherwise), good discipline even.
>>
>> Yes. I would have branched trunk, copied a jenkins job, and
>> experimented on it. If something good would come out of it, it would
>> probably be small, and easily patchable on trunk.
>>
>
> I might remind you Sim, that recent evidence demonstrates you didn't
> make this choice.

OK, even if so, intentional or not, we are talking about things in
retrospect. If you cannot endure criticism, I suggest that you find
some other way to educate yourself.

If I may point out, your emails are way too verbose for me. If you want
to enlist my help and to function as a team, I would suggest enticing
others to follow your dreams by way of smaller steps, allowing others to
chime in and, more importantly, to have some influence.

All in good faith,

Gr. Simon

-- 
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Peter Firmstone <ji...@zeus.net.au>.
On 22/11/2012 8:22 PM, Simon IJskes - QCG wrote:
> On 22-11-12 11:16, Dan Creswell wrote:
>> See, if it wasn't on trunk, the changes would be less of a big deal. 
>> It'd
>> be natural to check one's work in (whether it be an in-progress test
>> snapshot or otherwise), good discipline even.
>
> Yes. I would have branched trunk, copied a jenkins job, and 
> experimented on it. If something good would come out of it, it would 
> probably be small, and easily patchable on trunk.
>

I might remind you, Sim, that recent evidence demonstrates you didn't
make this choice.

I'll put this in context for you: I left trunk in a stable, passing
state in late August. These test runs from the 16th of August through
to the 5th of September show that:

https://builds.apache.org/view/M-R/view/River/job/River-QA-windows/54/
https://builds.apache.org/view/M-R/view/River/job/River-QA-solaris/60/
https://builds.apache.org/view/M-R/view/River/job/River-QA-ubuntu-jdk6/130/
https://builds.apache.org/view/M-R/view/River/job/River-QA-ubuntu-jdk7/52/

This was also the first time that River has passed on the Windows
platform. If you don't believe me, try running the qa tests on Windows
with the previous release. As far as I was concerned, the codebase was
almost ready for release; the release version needed incrementing and
the documentation needed updating.

But Sim, you developed in trunk without running the qa test suite. You
must run these tests every time you make changes; that way you know
which change caused the breakage. Did you run the jtreg regression test
suite?

This is what happened:

When I returned, I found most Jenkins tests disabled. I then ran them
and found many failing tests.

https://builds.apache.org/view/M-R/view/River/job/River-QA-ubuntu-jdk7/55/consoleText

      [java] -----------------------------------------
      [java]
      [java] # of tests started   = 1411
      [java] # of tests completed = 1411
      [java] # of tests skipped   = 47
      [java] # of tests passed    = 1268
      [java] # of tests failed    = 143
      [java]
      [java] -----------------------------------------
      [java]
      [java]    Date finished:
      [java]       Mon Oct 29 18:30:08 UTC 2012
      [java]    Time elapsed:
      [java]       61854 seconds
      [java]
      [java] Java Result: 1



I have now fixed most of the failing tests. I have done so remotely,
without access to debugging tools and without criticising any other
developers. I've also just taken the opportunity to fix 283 failing
tests on ARM.

Apportioning blame doesn't fix problems.  I had made the mistake of
developing directly in trunk myself some time back, so I refrained from
dispensing criticism.

Would you like me to go on?

You have two options:

   1. Work as a team, let's get trunk stable again and ready for release.
   2. Continue the pissing contest.

The ball's in your court.

Peter.


Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 11:16, Dan Creswell wrote:
> See, if it wasn't on trunk, the changes would be less of a big deal. It'd
> be natural to check one's work in (whether it be an in-progress test
> snapshot or otherwise), good discipline even.

Yes. I would have branched trunk, copied a Jenkins job, and experimented
on it. If something good came out of it, it would probably be small and
easily patchable onto trunk.


-- 
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Dan Creswell <da...@gmail.com>.
See, if it wasn't on trunk, the changes would be less of a big deal. It'd
be natural to check one's work in (whether it be an in-progress test
snapshot or otherwise), good discipline even.


On 22 November 2012 10:08, Simon IJskes - QCG <si...@qcg.nl> wrote:

> On 22-11-12 10:58, Dan Creswell wrote:
>
>> I have a different question:
>>
>> Is this really being done on trunk and, if so, why?
>>
>
> Indeed. I wouldn't have done this on trunk.
>
>
>
> --
> QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
> Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
>

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 10:58, Dan Creswell wrote:
> I have a different question:
>
> Is this really being done on trunk and, if so, why?

Indeed. I wouldn't have done this on trunk.


-- 
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Dan Creswell <da...@gmail.com>.
I have a different question:

Is this really being done on trunk and, if so, why?

On 22 November 2012 08:42, Simon IJskes - QCG <si...@qcg.nl> wrote:

> On 22-11-12 07:45, peter_firmstone@apache.org wrote:
>
>> +                try {
>> +                        Thread.sleep(240000); // Wait 4 minutes for TCP
>> 2MSL TIME_WAIT
>>
>
> Peter,
>
> could you please try to explain why you are removing SO_REUSEADDR and
> introducing this wait-retry?
>
> Please speculate about the potential bug you think there is.
>
> Is it a desperate attempt? Or have you found a solid theory behind it?
>
> Gr. Simon
>
> --
> QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
> Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
>

Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java

Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 07:45, peter_firmstone@apache.org wrote:
> +                try {
> +                        Thread.sleep(240000); // Wait 4 minutes for TCP 2MSL TIME_WAIT

Peter,

could you please try to explain why you are removing SO_REUSEADDR and 
introducing this wait-retry?

Please speculate about the potential bug you think there is.

Is it a desperate attempt? Or have you found a solid theory behind it?

Gr. Simon

-- 
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397