Posted to derby-dev@db.apache.org by Dy...@Sun.COM on 2008/02/09 14:50:11 UTC
NetworkServer Issues

Hi everybody,

While working on DERBY-3192 I have discovered some problems that aren't
quite bugs, but more like QOI issues. I'd like to get some feedback
before (possibly) filing Jiras for them:

1) The DRDAConnThreads keep the EmbedConnection that they use in
Database.conn. This variable is modified through the
Database.setConnection() method. This method does not do any
sanity-checking before over-writing the connection already
referenced by conn. I tried to add an ASSERT which verified that Database.conn
was either null, identical (pointer equal) to the new connection, or at
least closed, when it was being over-written. I had to remove the ASSERT
right away since it was triggered all the time. Is there a reason not to
close these connections before removing the server's reference to them?
Or do they get cleaned up in some other way that I'm not aware of? (I
realize that they will be gc'ed and that the finalizer probably will
clean up, but usually we don't encourage people to rely on that...)
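
To make the check in 1) concrete, here is a minimal sketch of the
condition I tried to assert. It is not the actual Derby code: Conn and
StubConn are hypothetical stand-ins for EmbedConnection, and this
Database class mirrors only the conn field and setConnection() method
described above.

```java
// Hypothetical stand-ins: Conn plays the role of EmbedConnection, and
// Database mirrors only the conn field and setConnection() named above.
class SetConnectionSketch {
    interface Conn {
        boolean isClosed();
    }

    // Trivial test double whose closed state can be flipped by hand.
    static final class StubConn implements Conn {
        boolean closed;

        public boolean isClosed() {
            return closed;
        }
    }

    static final class Database {
        private Conn conn;

        // True when dropping the current reference cannot leak an open
        // connection: nothing stored yet, same object, or already closed.
        static boolean safeToReplace(Conn current, Conn replacement) {
            return current == null
                    || current == replacement
                    || current.isClosed();
        }

        void setConnection(Conn replacement) {
            if (!safeToReplace(conn, replacement)) {
                throw new IllegalStateException(
                        "setConnection() would drop an open connection");
            }
            conn = replacement;
        }
    }
}
```

In the server this would presumably be a sanity-build assertion rather
than an unconditional exception; the sketch throws only so that the
failing case is easy to see.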

2) In NetXAResource.start() we have the following code:
        // DERBY-1025 - Flow an auto-commit if in auto-commit mode before
        // entering a global transaction
        try {
            if(conn_.autoCommit_)
                conn_.flowAutoCommit();
        } catch (SqlException sqle) {
            rc = XAException.XAER_RMERR;
            exceptionsOnXA = org.apache.derby.client.am.Utils.accumulateSQLException
                    (sqle, exceptionsOnXA);
        }

There is a problem with this code because DisconnectException is a
sub-class of SqlException. So if conn_.flowAutoCommit() throws a
DisconnectException, that fact is hidden (apart from the exception
being stored in exceptionsOnXA). This isn't a problem in
itself, but further down in the same method we do

  netAgent.beginWriteChainOutsideUOW();
  netAgent.netConnectionRequest_.writeXaStartUnitOfWork(conn_);
  netAgent.flowOutsideUOW();

which is obviously not going to work if a DisconnectException has been
thrown. But the error we get now is a NullPointerException when trying
to access the socket (which was set to null when we got the
DisconnectException).

In my case the root cause was an error on the server when doing commit,
but I ended up with an NPE when doing netAgent.flowOutsideUOW(). It took
me a while to figure out what was really happening.
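
One possible way to avoid the misleading NPE in 2) is to catch
DisconnectException separately and stop before the socket is touched
again. The sketch below only illustrates that control flow and is not a
patch: SqlException and DisconnectException are local stand-ins named
after the client classes, Flow is a hypothetical wrapper for calls such
as flowAutoCommit(), and reporting XAER_RMFAIL is my assumption about a
reasonable error code.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

class XaStartSketch {
    // Local stand-ins named after the client exceptions discussed above.
    static class SqlException extends Exception {
        SqlException(String msg) {
            super(msg);
        }
    }

    static class DisconnectException extends SqlException {
        DisconnectException(String msg) {
            super(msg);
        }
    }

    // Hypothetical wrapper for a network flow such as flowAutoCommit().
    interface Flow {
        void run() throws SqlException;
    }

    static int start(Flow autoCommit, Flow startUnitOfWork) throws XAException {
        int rc = XAResource.XA_OK;
        try {
            autoCommit.run();
        } catch (DisconnectException de) {
            // The connection is gone, so fail here with the real cause
            // attached instead of continuing on to the dead socket.
            XAException xae = new XAException(XAException.XAER_RMFAIL);
            xae.initCause(de);
            throw xae;
        } catch (SqlException sqle) {
            // Other SQL errors are recorded, as in the original code.
            rc = XAException.XAER_RMERR;
        }
        try {
            startUnitOfWork.run();
        } catch (SqlException sqle) {
            rc = XAException.XAER_RMERR;
        }
        return rc;
    }
}
```

The important part is the ordering of the catch clauses: the more
specific DisconnectException has to come first, otherwise the
SqlException handler swallows it as it does today.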

3) Somewhat related to 2) is the fact that
NetConnectionReply.readLocalCommit() (and perhaps other methods as well)
is/are not able to detect and report a serialized DRDAProtocolException
at the end of the reply, as could happen if an exception is thrown on the
server. The result is that the client complains in
endOfSameIdChainData() that not the whole DSS has been read. That is
true, but it does not tell you very much about what the real problem is.

In my case I ended up dumping each remaining byte as hex, and noticing
that it was code point 0x1232, which I could identify as

	// Codepoint for Agent Permanent Error Reply message
	static final int AGNPRMRM = 0x1232;

But note that this is only found in the server's CodePoint interface
(and CodePointNameTable). Knowing that, I could add a try/catch around
the commit code on the server and find the real problem. 
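
For anyone who runs into the same thing, the byte dump I describe above
can be sketched roughly as follows. This is a stand-alone debugging
helper, not client code: the class and method names are made up for the
example, and only the AGNPRMRM value comes from the server source. The
scan is deliberately naive and simply looks for the two bytes anywhere
in the buffer, so it can report false positives.

```java
class ReplyDumpSketch {
    // Value quoted above from the server's CodePoint interface.
    static final int AGNPRMRM = 0x1232;

    // Render the bytes from offset to the end as space-separated hex.
    static String toHex(byte[] buf, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i < buf.length; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", buf[i] & 0xFF));
        }
        return sb.toString();
    }

    // Naive scan for a big-endian two-byte value. It does not parse the
    // DSS or DDM framing, so a match is only a hint, not proof.
    static int indexOfCodePoint(byte[] buf, int offset, int codePoint) {
        int high = (codePoint >> 8) & 0xFF;
        int low = codePoint & 0xFF;
        for (int i = offset; i + 1 < buf.length; i++) {
            if ((buf[i] & 0xFF) == high && (buf[i + 1] & 0xFF) == low) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling indexOfCodePoint(leftover, 0, ReplyDumpSketch.AGNPRMRM) on the
unread part of the reply returns the position of the first match, or -1
if the value does not occur.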

I think that in this case I *should* have seen the real error in the
console output or derby.log, but I could not find it. (I was running a
JUnit test).

Comments will be much appreciated.

Thanks,

-- 
dt