Posted to derby-dev@db.apache.org by "Kathey Marsden (JIRA)" <ji...@apache.org> on 2011/03/01 00:10:36 UTC

[jira] Commented: (DERBY-4319) hang in suites.all with ibm 1.5 on AIX after ttestDefaultProperties

    [ https://issues.apache.org/jira/browse/DERBY-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000595#comment-13000595 ] 

Kathey Marsden commented on DERBY-4319:
---------------------------------------

I ran the test and saw it hang on AIX.
When I then tried to ping the server manually once it had hung, I saw a connection reset error:

$ java org.apache.derby.drda.NetworkServerControl ping
Mon Feb 28 14:52:49 PST 2011 : Error on client socket:
 Connection reset
Mon Feb 28 14:52:49 PST 2011 : Connection reset
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:116)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(NetworkServerControlImpl.java:2873)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(NetworkServerControlImpl.java:2817)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(NetworkServerControlImpl.java:1253)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(NetworkServerControlImpl.java:1228)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2260)
        at org.apache.derby.drda.NetworkServerControl.main(NetworkServerControl.java:320)

After that occurred, any attempt to ping or shut down the server hung, and the javacore showed that the ClientThread was no longer running.

Since the original trace from the hang was in SpawnedProcess.complete(), called from NetworkServerTestSetup.tearDown(), this is what I think happened:

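In NetworkServerTestSetup.tearDown() we have:

    if (networkServerController != null) {
        boolean running = false;
        try {
            networkServerController.ping();
            running = true;
        } catch (Exception e) {
            // Any failure, including "Connection reset", leaves running == false.
        }
        // ... (rest of tearDown() not quoted)
    }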

Assuming the ping got the connection reset, tearDown() concluded that the server was down even though the process was still running. It therefore did not attempt to shut the server down, and it called spawnedServer.complete(failedShutdown != null) with false as its argument, so it did not try to destroy the process either. The test then stays hung waiting for the process to exit.
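
The failure mode can be reproduced in isolation. Below is a minimal, self-contained model of the decision logic described above. `ServerControl`, `TearDownModel` and `Outcome` are hypothetical names, and the shutdown branch is an assumption about code not quoted here, so this is a sketch of the reasoning, not Derby's actual tearDown().

```java
/** Hypothetical stand-in for the parts of NetworkServerControl that tearDown() uses. */
interface ServerControl {
    void ping() throws Exception;
    void shutdown() throws Exception;
}

/** Simplified model of the tearDown() decision; not Derby's real code. */
final class TearDownModel {

    /** What the modeled tearDown() decided to do. */
    static final class Outcome {
        final boolean attemptedShutdown;
        final boolean destroyProcess; // the value that would be passed to complete()

        Outcome(boolean attemptedShutdown, boolean destroyProcess) {
            this.attemptedShutdown = attemptedShutdown;
            this.destroyProcess = destroyProcess;
        }
    }

    static Outcome tearDown(ServerControl controller) {
        boolean running = false;
        try {
            controller.ping();
            running = true;
        } catch (Exception e) {
            // A "Connection reset" also lands here, so a live but unhealthy
            // server is indistinguishable from one that has already exited.
        }

        Exception failedShutdown = null;
        if (running) {
            try {
                controller.shutdown();
            } catch (Exception e) {
                failedShutdown = e;
            }
        }
        // Only a failed shutdown requests destruction. If ping failed, shutdown
        // was never tried, so this is false and the caller waits for an exit
        // that may never come.
        return new Outcome(running, failedShutdown != null);
    }
}
```

With a ping that throws SocketException("Connection reset"), the model yields attemptedShutdown == false and destroyProcess == false, which is the combination that leaves SpawnedProcess.complete() waiting.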

There seem to be two issues here:
1) How does the server get into this state for this particular test?
2) How can we ensure that the server is brought down or destroyed no matter what?

I will focus on the second issue first, so that full runs are no longer at risk of being held up, and then try to understand the root cause of the network server's state.
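
One way to make sure teardown never waits forever is to bound the wait for the spawned process and fall back to destroying it. This is a minimal sketch that assumes only java.lang.Process; it polls exitValue() so it would also run on a 1.5 JVM. `BoundedProcessWait` and `waitOrDestroy` are hypothetical names, the timeout values are arbitrary, and this is not the change that was made for DERBY-4319.

```java
/** Hypothetical helper: wait a bounded time for a process, then destroy it. */
final class BoundedProcessWait {

    /**
     * Polls the process until it exits or timeoutMillis elapses. If it is
     * still running at the deadline, destroy() is called and we wait for it.
     *
     * @return true if the process exited on its own, false if it was destroyed
     */
    static boolean waitOrDestroy(Process process, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                process.exitValue(); // throws IllegalThreadStateException while running
                return true;         // exited without intervention
            } catch (IllegalThreadStateException stillRunning) {
                if (System.currentTimeMillis() >= deadline) {
                    break;
                }
                Thread.sleep(pollMillis);
            }
        }
        // Timed out: kill the process rather than blocking indefinitely.
        process.destroy();
        process.waitFor();
        return false;
    }
}
```

On Java 8 and later, Process.waitFor(long, TimeUnit) and Process.destroyForcibly() express the same idea more directly; the polling form only relies on exitValue(), destroy() and waitFor(), which were already available on the 1.5 JVM from this report.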




> hang in suites.all with ibm 1.5 on AIX after ttestDefaultProperties
> -------------------------------------------------------------------
>
>                 Key: DERBY-4319
>                 URL: https://issues.apache.org/jira/browse/DERBY-4319
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Client
>    Affects Versions: 10.5.2.0
>         Environment: ibm jvm 1.5 SR9-0 on IBM AIX 3.5
>            Reporter: Myrna van Lunteren
>            Assignee: Kathey Marsden
>              Labels: derby_triage10_8
>         Attachments: javacore.20090723.093837.25380.0001.txt, javacore.20090723.093909.24726.0001.txt
>
>
> The test run for 10.5.2.0 hung in suites.All. The console output (the run was with -Dderby.tests.trace=true) showed ttestDefaultProperties had successfully completed but the run was halted.
> ps -eaf | grep java showed the process that kicked off suites.All, and a networkserver process with the following flags:
> - classpath <classpath including derby.jar, derbytools.jar, derbyclient.jar, derbynet.jar, derbyTesting.jar, derbyrun.jar, derbyTesting.jar and junit.jar> -Dderby.drda.logConnections= -Dderby.drda.traceAll= -Dderby.drda.traceDirectory= -Dderby.drda.keepAlive= -Dderby.drda.timeSlice= -Dderby.drda.host= -Dderby.drda.portNumber= -derby.drda.minThreads= -Dderby.drda.maxThreads= -Dderby.drda.startNetworkServer= -Dderby.drda.debug= org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527
> This process had been sitting for 2 days.
> After killing the NetworkServerControl process, the test continued successfully (except for DERBY-4186, fixed in trunk), but the following was put out to the console:
>  START-SPAWNED:SpawnedNetworkServer STANDARD OUTPUT: exit code=137
> 2009-07-18 03:16:07.157 GMT : Security manager installed using the Basic server security policy.
> 2009-07-18 03:16:09.169 GMT : Apache Derby Network Server - 10.5.2.0 - (794445) started and ready to accept connections on port 1527
> END-SPAWNED  :SpawnedNetworkServer STANDARD OUTPUT:
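
The `exit code=137` line in the quoted output is consistent with the server having been killed by SIGKILL (for example `kill -9`): on Unix, a process terminated by a signal is commonly reported as 128 plus the signal number, and 137 - 128 = 9. A small sketch of that decoding follows. `ExitCodes` is a hypothetical helper, and the convention is only a heuristic, since a program can also call System.exit(137) itself.

```java
/** Hypothetical helper: decode an exit status using the Unix "128 + signal" convention. */
final class ExitCodes {

    /** Returns the signal number implied by exitCode, or -1 if it does not look signal-related. */
    static int signalOf(int exitCode) {
        return exitCode > 128 ? exitCode - 128 : -1;
    }

    /** Human-readable description of an exit status. */
    static String describe(int exitCode) {
        int signal = signalOf(exitCode);
        switch (signal) {
            case -1:
                return "exited with status " + exitCode;
            case 9:
                return "killed by signal 9 (SIGKILL)";
            case 15:
                return "killed by signal 15 (SIGTERM)";
            default:
                return "killed by signal " + signal;
        }
    }
}
```

For example, ExitCodes.describe(137) returns "killed by signal 9 (SIGKILL)".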

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira