Posted to commits@river.apache.org by pe...@apache.org on 2012/11/22 07:45:51 UTC
svn commit: r1412436 -
/river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Author: peter_firmstone
Date: Thu Nov 22 06:45:50 2012
New Revision: 1412436
URL: http://svn.apache.org/viewvc?rev=1412436&view=rev
Log:
Attempt to fix socket issue on FreeBSD
Modified:
river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Modified: river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
URL: http://svn.apache.org/viewvc/river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java?rev=1412436&r1=1412435&r2=1412436&view=diff
==============================================================================
--- river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java (original)
+++ river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java Thu Nov 22 06:45:50 2012
@@ -51,8 +51,10 @@ import java.util.TreeSet;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;
import java.lang.reflect.Field;
+import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
+import java.net.Socket;
import java.net.SocketAddress;
//Should there be an 'AbortTestRequest' ?
@@ -249,16 +251,39 @@ class MasterHarness {
private class KeepAlivePort implements Runnable {
public void run() {
- ArrayList socketList = new ArrayList(); // keep references
+ ArrayList<Socket> socketList = new ArrayList<Socket>(); // keep references
+ SocketAddress add = new InetSocketAddress(KEEPALIVE_PORT);
try {
- SocketAddress add = new InetSocketAddress(KEEPALIVE_PORT);
+
ServerSocket socket = new ServerSocket();
-// if (!socket.getReuseAddress()) socket.setReuseAddress(true);
socket.bind(add);
while (true) {
socketList.add(socket.accept());
}
- } catch (Exception e) {
+ } catch (BindException e){
+ try {
+ Thread.sleep(240000); // Wait 4 minutes for TCP 2MSL TIME_WAIT
+ ServerSocket socket = new ServerSocket();
+ socket.bind(add);
+ while (true) {
+ socketList.add(socket.accept());
+ }
+ } catch (InterruptedException ex){
+ outStream.println("Interruped while opening ServerSocket with KEEPALIVE_PORT:" + KEEPALIVE_PORT );
+ outStream.println("Unexpected exception after waiting 4 minutes for port to become available:\n");
+ ex.printStackTrace(outStream);
+ outStream.println("Initial attempt failed:\n");
+ e.printStackTrace(outStream);
+ System.exit(1);
+ }catch (Exception ex){
+ outStream.println("Error occurred while attempting to open ServerSocket with KEEPALIVE_PORT:" + KEEPALIVE_PORT );
+ outStream.println("Unexpected exception after waiting 4 minutes for port to become available:\n");
+ ex.printStackTrace(outStream);
+ outStream.println("Initial attempt failed:\n");
+ e.printStackTrace(outStream);
+ System.exit(1);
+ }
+ }catch (Exception e) {
outStream.println("Problem with KEEPALIVE_PORT:" + KEEPALIVE_PORT );
outStream.println("Unexpected exception:");
e.printStackTrace(outStream);
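The retry in the diff above repeats the bind-and-accept loop inside the catch block. As a minimal sketch only, the same idea can be written as a bounded retry loop. The class name BindRetrySketch, the method bindWithRetry and its parameters are hypothetical and are not part of MasterHarness; the attempt count and delay are left to the caller rather than fixed at 4 minutes.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of the River QA harness.
final class BindRetrySketch {

    /**
     * Tries to bind a ServerSocket to the given port, sleeping between
     * attempts when the bind fails with a BindException.
     * Rethrows the last BindException once maxAttempts is exhausted.
     */
    static ServerSocket bindWithRetry(int port, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                socket.close(); // do not leak the socket that failed to bind
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            } catch (IOException e) {
                socket.close();
                throw e;
            }
        }
        throw last;
    }
}
```

A caller would then keep a single accept loop after the helper returns, instead of one copy per catch block.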
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Peter Firmstone <ji...@zeus.net.au>.
Only FreeBSD OpenJDK is suffering socket issues now.
I'll be cleaning up and removing hacks where relevant shortly.
I'm waiting for the Ubuntu OpenJDK test run to complete; that should
help determine whether FreeBSD or OpenJDK is at issue. It could be that
those ports just aren't available.
A netstat output at the point when failure occurs would be very helpful,
and any help is much appreciated.
Regards,
Peter.
On 22/11/2012 7:39 PM, Simon IJskes - QCG wrote:
> On 22-11-12 10:18, Peter Firmstone wrote:
>> I tried SO_REUSEADDR on an earlier attempt, that didn't work either,
>> that was a hack too.
>
> In general, I do not consider SO_REUSEADDR a hack. It is a perfectly
> permissible construct for servers.
>
>> The real fix is to have the client close the port, rather than the
>> server, but since I don't have direct access to this box, I can't really
>> tell if that's the actual problem or if those ports aren't available at
>> all.
>
> You cannot dictate the behaviour of a client. So any solution needs to
> be robust enough to behave correctly independent of the client. TCP
> is such a solution. There are problems with the TCP protocol,
> exploited by malicious parties, but they are not manifesting themselves
> in the test environment.
>
> So we could have a number of possible causes:
> - incorrect assumptions or bugs in the Java program.
> - bugs in the Java VM Socket implementation.
> - bugs in the TCP stack.
>
> There are a number of instances where an interrupt is not triggered in
> some system calls. Therefore a plausible cause is ServerSockets that
> are not really interrupted, or that are not closed by Java instances of
> our own that were not correctly terminated.
>
> You could try writing a class that uses the 'netstat' program to
> report which ports are in use and what state they are in, to
> be triggered at the problem points.
>
> I can help you with that, but only if you stop making those 'just try'
> patches, and with improved communication (improved in quality, not
> verbosity).
>
> Gr. Simon
>
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 10:18, Peter Firmstone wrote:
> I tried SO_REUSEADDR on an earlier attempt, that didn't work either,
> that was a hack too.
In general, I do not consider SO_REUSEADDR a hack. It is a perfectly
permissible construct for servers.
> The real fix is to have the client close the port, rather than the
> server, but since I don't have direct access to this box, I can't really
> tell if that's the actual problem or if those ports aren't available at
> all.
You cannot dictate the behaviour of a client. So any solution needs to
be robust enough to behave correctly independent of the client. TCP is
such a solution. There are problems with the TCP protocol, exploited by
malicious parties, but they are not manifesting themselves in the test
environment.
So we could have a number of possible causes:
- incorrect assumptions or bugs in the Java program.
- bugs in the Java VM Socket implementation.
- bugs in the TCP stack.
There are a number of instances where an interrupt is not triggered in
some system calls. Therefore a plausible cause is ServerSockets that are
not really interrupted, or that are not closed by Java instances of our
own that were not correctly terminated.
You could try writing a class that uses the 'netstat' program to report
which ports are in use and what state they are in, to be triggered at
the problem points.
I can help you with that, but only if you stop making those 'just try'
patches, and with improved communication (improved in quality, not
verbosity).
Gr. Simon
--
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
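As a rough sketch of the netstat-reporting class suggested above, assuming 'netstat -an' is available on the test host: the class name NetstatReportSketch, its method names and the port number used in the usage note are hypothetical. The port match is a heuristic on the address columns (Linux prints host:port, the BSDs print host.port) and has not been checked against the FreeBSD box in question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of the River QA harness.
final class NetstatReportSketch {

    /** Keeps only the lines that have an address column ending in the given port. */
    static List<String> filterByPort(List<String> lines, int port) {
        List<String> matches = new ArrayList<String>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                // Heuristic: Linux prints host:port, the BSDs print host.port
                if (token.endsWith(":" + port) || token.endsWith("." + port)) {
                    matches.add(line);
                    break;
                }
            }
        }
        return matches;
    }

    /** Runs "netstat -an" and returns its output, one entry per line. */
    static List<String> runNetstat() throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("netstat", "-an");
        pb.redirectErrorStream(true); // merge stderr so errors show up in the report
        Process process = pb.start();
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(process.getInputStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        process.waitFor();
        return lines;
    }
}
```

At a problem point, such as a catch block for BindException, a caller could print filterByPort(runNetstat(), port) to see whether the port is in LISTEN, TIME_WAIT or another state.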
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Peter Firmstone <ji...@zeus.net.au>.
I tried SO_REUSEADDR in an earlier attempt; that didn't work either, and
that was a hack too.
The 4 minute retry is just a temporary hack to see if the port is
released; it wasn't.
The real fix is to have the client close the port, rather than the
server, but since I don't have direct access to this box, I can't really
tell if that's the actual problem or if those ports aren't available at all.
Peter.
On 22/11/2012 6:42 PM, Simon IJskes - QCG wrote:
> On 22-11-12 07:45, peter_firmstone@apache.org wrote:
>> + try {
>> + Thread.sleep(240000); // Wait 4 minutes for TCP 2MSL TIME_WAIT
>
> Peter,
>
> could you please try to explain why you are removing SO_REUSEADDR and
> introducing this wait-retry?
>
> Please speculate about the potential bug you think there is.
>
> Is it a desperate attempt? Or have you found a solid theory behind it?
>
> Gr. Simon
>
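For reference, the SO_REUSEADDR approach discussed in this exchange can be sketched as follows. This is an illustration only, not the code that was removed in r1412436; the class and method names are hypothetical. The one point it relies on is documented for java.net.ServerSocket: the option should be set on an unbound socket, because the behaviour of changing it after bind() is not defined.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of the River QA harness.
final class ReuseAddressSketch {

    /** Opens a server socket with SO_REUSEADDR enabled before binding. */
    static ServerSocket openWithReuseAddress(int port) throws IOException {
        ServerSocket socket = new ServerSocket(); // no-arg constructor: unbound
        socket.setReuseAddress(true);             // set before bind()
        socket.bind(new InetSocketAddress(port));
        return socket;
    }
}
```

Whether this helps depends on the cause: it lets a bind succeed while earlier connections on the port are still in TIME_WAIT, but it does not help if another process is still listening on the port.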
Failing tests
Posted by Peter Firmstone <ji...@zeus.net.au>.
The good news is that the ARM platform is now down to only 11 failing tests
(previously 294):
[java] -----------------------------------------
[java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorStopReplace.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 2 discard event(s) received
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorTerminate.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 0 discard event(s) received
[java] -----------------------------------------
[java] com/sun/jini/test/impl/mercury/AdminIFShutdownTest.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/impl/mercury/AdminIFTest.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
[java] -----------------------------------------
[java] com/sun/jini/test/impl/mercury/PullAdminIFShutdownTest.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/impl/mercury/PullAdminIFTest.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
[java] -----------------------------------------
[java] com/sun/jini/test/spec/discoveryservice/event/DiscardUnreachable.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
[java] -----------------------------------------
[java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorStop.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorTerminate.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
[java] -----------------------------------------
[java] com/sun/jini/test/impl/mahalo/AdminIFShutdownTest.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/impl/mahalo/AdminIFTest.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: assertLocators returned true for bogus values
[java]
I'm still waiting for other platforms to catch up to svn.
Ubuntu JDK 7 has 6 failing tests, but the good news is that some of these
have been failing randomly for some time on different platforms and are
now doing so more consistently. Hopefully they'll also fail on my
SPARC, which has excellent debugging tools:
[java] -----------------------------------------
[java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorAllChange.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: change failed -- waited 870 seconds (14 minutes) -- 3 change event(s) expected, 0 change event(s) received
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorStopReplace.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 2 discard event(s) received
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/lookupdiscovery/MulticastMonitorTerminate.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 150 seconds (2 minutes) -- 3 discard event(s) expected, 0 discard event(s) received
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/discoveryservice/event/DiscardUnreachable.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorStop.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
[java]
[java] -----------------------------------------
[java] com/sun/jini/test/spec/discoveryservice/event/MulticastMonitorTerminate.td
[java] Test Failed: Test Failed: com.sun.jini.qa.harness.TestException: discard failed -- waited 300 seconds (5 minutes) -- 3 registration(s) with lookup discovery service, 0 registrations with successful discards
[java]
[java]
Cheers,
Peter.
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 14:41, Peter Firmstone wrote:
> On 22/11/2012 8:22 PM, Simon IJskes - QCG wrote:
>> On 22-11-12 11:16, Dan Creswell wrote:
>>> See, if it wasn't on trunk, the changes would be less of a big deal.
>>> It'd
>>> be natural to check one's work in (whether it be an in-progress test
>>> snapshot or otherwise), good discipline even.
>>
>> Yes. I would have branched trunk, copied a jenkins job, and
>> experimented on it. If something good would come out of it, it would
>> probably be small, and easily patchable on trunk.
>>
>
> I might remind you Sim, that recent evidence demonstrates you didn't
> make this choice.
OK, even if so, intentional or not, we are talking about things in retrospect.
If you cannot endure criticism, I suggest that you find some other way
to educate yourself.
If I may point out, your emails are way too verbose for me. If you want
to enlist my help, to function as a team, I would suggest enticing
others to follow your dreams by way of smaller steps, allowing others to
chime in and, more importantly, allowing others to have some influence.
All in good faith,
Gr. Simon
--
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Peter Firmstone <ji...@zeus.net.au>.
On 22/11/2012 8:22 PM, Simon IJskes - QCG wrote:
> On 22-11-12 11:16, Dan Creswell wrote:
>> See, if it wasn't on trunk, the changes would be less of a big deal.
>> It'd
>> be natural to check one's work in (whether it be an in-progress test
>> snapshot or otherwise), good discipline even.
>
> Yes. I would have branched trunk, copied a jenkins job, and
> experimented on it. If something good would come out of it, it would
> probably be small, and easily patchable on trunk.
>
I might remind you Sim, that recent evidence demonstrates you didn't
make this choice.
I'll put this in context for you: I left trunk in a stable, passing state
in late August. These test runs, from the 16th of August through to the
5th of September, show that:
https://builds.apache.org/view/M-R/view/River/job/River-QA-windows/54/
https://builds.apache.org/view/M-R/view/River/job/River-QA-solaris/60/
https://builds.apache.org/view/M-R/view/River/job/River-QA-ubuntu-jdk6/130/
https://builds.apache.org/view/M-R/view/River/job/River-QA-ubuntu-jdk7/52/
This was also the first time that River had passed on the Windows
platform. If you don't believe me, try running the QA tests on Windows
with the previous release. As far as I was concerned, the codebase was
almost ready for release; the release version needed incrementing and
the documentation needed updating.
But Sim, you developed in trunk without running the QA test suite. You
must run these tests every time you make changes; that way you know
which change caused the breakage. Did you run the jtreg regression test
suite?
This is what happened:
When I returned, I found most Jenkins tests disabled; then I ran them
and found many failing tests.
https://builds.apache.org/view/M-R/view/River/job/River-QA-ubuntu-jdk7/55/consoleText
[java] -----------------------------------------
[java]
[java] # of tests started = 1411
[java] # of tests completed = 1411
[java] # of tests skipped = 47
[java] # of tests passed = 1268
[java] # of tests failed = 143
[java]
[java] -----------------------------------------
[java]
[java] Date finished:
[java] Mon Oct 29 18:30:08 UTC 2012
[java] Time elapsed:
[java] 61854 seconds
[java]
[java] Java Result: 1
I have now fixed most failing tests. I have done so remotely, without
access to debugging tools and also without criticising any other
developers. I've also just taken the opportunity to fix 283 failing
tests on ARM.
Apportioning blame doesn't fix problems. I had made the mistake of
developing directly in trunk myself some time back, so I refrained from
dispensing criticism.
Would you like me to go on?
You have two options:
1. Work as a team, let's get trunk stable again and ready for release.
2. Continue the pissing contest.
The ball's in your court.
Peter.
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 11:16, Dan Creswell wrote:
> See, if it wasn't on trunk, the changes would be less of a big deal. It'd
> be natural to check one's work in (whether it be an in-progress test
> snapshot or otherwise), good discipline even.
Yes. I would have branched trunk, copied a Jenkins job, and experimented
on it. If something good came out of it, it would probably be
small and easily patchable on trunk.
--
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Dan Creswell <da...@gmail.com>.
See, if it wasn't on trunk, the changes would be less of a big deal. It'd
be natural to check one's work in (whether it be an in-progress test
snapshot or otherwise), good discipline even.
On 22 November 2012 10:08, Simon IJskes - QCG <si...@qcg.nl> wrote:
> On 22-11-12 10:58, Dan Creswell wrote:
>
>> I have a different question:
>>
>> Is this really being done on trunk and, if so, why?
>>
>
> Indeed. I wouldn't have done this on trunk.
>
>
>
> --
> QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
> Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
>
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 10:58, Dan Creswell wrote:
> I have a different question:
>
> Is this really being done on trunk and, if so, why?
Indeed. I wouldn't have done this on trunk.
--
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Dan Creswell <da...@gmail.com>.
I have a different question:
Is this really being done on trunk and, if so, why?
On 22 November 2012 08:42, Simon IJskes - QCG <si...@qcg.nl> wrote:
> On 22-11-12 07:45, peter_firmstone@apache.org wrote:
>
>> + try {
>> + Thread.sleep(240000); // Wait 4 minutes for TCP 2MSL TIME_WAIT
>>
>
> Peter,
>
> could you please try to explain why you are removing SO_REUSEADDR and
> introducing this wait-retry?
>
> Please speculate about the potential bug you think there is.
>
> Is it a desperate attempt? Or have you found a solid theory behind it?
>
> Gr. Simon
>
> --
> QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
> Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397
>
Re: svn commit: r1412436 - /river/jtsk/trunk/qa/src/com/sun/jini/qa/harness/MasterHarness.java
Posted by Simon IJskes - QCG <si...@qcg.nl>.
On 22-11-12 07:45, peter_firmstone@apache.org wrote:
> + try {
> + Thread.sleep(240000); // Wait 4 minutes for TCP 2MSL TIME_WAIT
Peter,
could you please try to explain why you are removing SO_REUSEADDR and
introducing this wait-retry?
Please speculate about the potential bug you think there is.
Is it a desperate attempt? Or have you found a solid theory behind it?
Gr. Simon
--
QCG, Software voor het MKB, 071-5890970, http://www.qcg.nl
Quality Consultancy Group b.v., Leiderdorp, Kvk Den Haag: 28088397