You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "Bergquist, Brett" <BB...@canoga.com> on 2011/12/28 16:49:18 UTC

Could someone give me some guidance on DERBY-5560

I created https://issues.apache.org/jira/browse/DERBY-5560 and am seeing this in production.

Basically what is happening is that the LogicalConnection.close() is being called which attempts to recycle the physical connection by calling ClientPooledConnection.recycleConnection().  At the same time ClientPooledConnection.close() is being called which attempts to call LogicalConnection.nullPhysicalConnection().    The first thread holds a lock on LogicalConnection and needs the lock on ClientPooledConnection and the second thread holds a lock on ClientPolledConnection and needs a lock on LogicalConnection and a deadlock occurs.

This is occurring because of the configuration of the connection pool in use (ClienXADataSource) and the pool is configured to close all connections on any error.   The stack trace of the deadlock as a transaction being committed by the first thread and the connection pool closing all threads on a detected error in the second thread.

I don't see immediately how to synchronize in an orderly way to eliminate the deadlock.  The first thread has a handle on the LogicalConnection which references the physical ClientPooledConnection.  The second thread has a handle on the physical ClientPooledConnection which references the LogicalConnection.

One thought is to change the LogicalConnection.close to be something like:

    public void close() throws SQLException {
       synchronized(phsyicalConnection) {
                _close();
       }
  }

   synchronized public void _close() throws SQLException {
        try
        {
            // we also need to loop thru all the logicalStatements and close them
            if (physicalConnection_ == null) {
                return;
            }
            if (physicalConnection_.agent_.loggingEnabled()) {
                physicalConnection_.agent_.logWriter_.traceEntry(this, "close");
            }

            if (physicalConnection_.isClosed()) // connection is closed or has become stale
            {
                pooledConnection_.informListeners(new SqlException(null,
                    new ClientMessageId(
                        SQLState.PHYSICAL_CONNECTION_ALREADY_CLOSED)));
            } else {
                physicalConnection_.checkForTransactionInProgress();
                physicalConnection_.closeForReuse(
                        pooledConnection_.isStatementPoolingEnabled());
                if (!physicalConnection_.isGlobalPending_()) {
                    pooledConnection_.recycleConnection();
                }
            }
            physicalConnection_ = null;
            pooledConnection_.nullLogicalConnection();
        }
        catch ( SqlException se )
        {
            throw se.getSQLException();
        }
    }

but this has a problem if "physicalConnection" is already null.
Any guidance will be greatly appreciated as I need to patch Derby 10.8.2.2 to work around this issue in any case.

Brett

Re: Could someone give me some guidance on DERBY-5560

Posted by Katherine Marsden <km...@sbcglobal.net>.

On 12/28/2011 7:49 AM, Bergquist, Brett wrote:
>
> I created https://issues.apache.org/jira/browse/DERBY-5560 and am 
> seeing this in production.
>
What an exciting holiday season you are having. I hope you didn't have 
plans.

I haven't looked closely at this and probably won't have time to as I 
have to peel away to complete some other tasks, but as an off the wall 
remark, would it work to synchronize on "this" instead of 
physicalConnection?

> Basically what is happening is that the LogicalConnection.close() is 
> being called which attempts to recycle the physical connection by 
> calling ClientPooledConnection.recycleConnection().  At the same time 
> ClientPooledConnection.close() is being called which attempts to call 
> LogicalConnection.nullPhysicalConnection().    The first thread holds 
> a lock on LogicalConnection and needs the lock on 
> ClientPooledConnection and the second thread holds a lock on 
> ClientPolledConnection and needs a lock on LogicalConnection and a 
> deadlock occurs.
>
> This is occurring because of the configuration of the connection pool 
> in use (ClienXADataSource) and the pool is configured to close all 
> connections on any error.   The stack trace of the deadlock as a 
> transaction being committed by the first thread and the connection 
> pool closing all threads on a detected error in the second thread.
>
> I don't see immediately how to synchronize in an orderly way to 
> eliminate the deadlock.  The first thread has a handle on the 
> LogicalConnection which references the physical 
> ClientPooledConnection.  The second thread has a handle on the 
> physical ClientPooledConnection which references the LogicalConnection.
>
> One thought is to change the LogicalConnection.close to be something like:
>
>     public void close() throws SQLException {
>
>        synchronized(phsyicalConnection) {
>
>                 _close();
>
>        }
>
>   }
>
>    synchronized public void _close() throws SQLException {
>
>         try
>
>         {
>
>             // we also need to loop thru all the logicalStatements and 
> close them
>
>             if (physicalConnection_ == null) {
>
>                 return;
>
>             }
>
>             if (physicalConnection_.agent_.loggingEnabled()) {
>
>                 physicalConnection_.agent_.logWriter_.traceEntry(this, 
> "close");
>
>             }
>
>             if (physicalConnection_.isClosed()) // connection is 
> closed or has become stale
>
>             {
>
>                 pooledConnection_.informListeners(new SqlException(null,
>
>                     new ClientMessageId(
>
>                         SQLState.PHYSICAL_CONNECTION_ALREADY_CLOSED)));
>
>             } else {
>
>                 physicalConnection_.checkForTransactionInProgress();
>
>                 physicalConnection_.closeForReuse(
>
>                         pooledConnection_.isStatementPoolingEnabled());
>
>                 if (!physicalConnection_.isGlobalPending_()) {
>
>                     pooledConnection_.recycleConnection();
>
>                 }
>
>             }
>
>             physicalConnection_ = null;
>
>             pooledConnection_.nullLogicalConnection();
>
>         }
>
>         catch ( SqlException se )
>
>         {
>
>             throw se.getSQLException();
>
>         }
>
>     }
>
> but this has a problem if "physicalConnection" is already null.
>
> Any guidance will be greatly appreciated as I need to patch Derby 
> 10.8.2.2 to work around this issue in any case.
>
> Brett
>

Re: Could someone give me some guidance on DERBY-5560

Posted by Bryan Pendleton <bp...@gmail.com>.

> Basically what is happening is that the LogicalConnection.close() is being called which attempts to recycle the physical
> connection by calling ClientPooledConnection.recycleConnection(). At the same time ClientPooledConnection.close() is being
> called which attempts to call LogicalConnection.nullPhysicalConnection(). The first thread holds a lock on LogicalConnection and
> needs the lock on ClientPooledConnection and the second thread holds a lock on ClientPolledConnection and needs a lock on
> LogicalConnection and a deadlock occurs.

That definitely seems like a bad design; your description is quite clear
and makes the problem really stand out. Thanks for all the hard work
on this problem!

I believe that the surest means to avoid such problems is to establish
and keep to a single well-defined order of synchronization.

My intuition is that the proper order should be logical connection first,
physical connection second; do you have a sense for whether there are any
other places where we try to move in the other direction?

It is often useful to catalog the current synchronization behaviors;
sometimes I simply insert some debugging code into the Derby libraries
to capture those behaviors and run the code to observe what is occurring.
However we do it, having a nice table of code paths where we lock these
objects, and the order in which we lock them, would help us determine
what order is followed by the current code in most cases.

Once we are clear on what the correct order of synchronization should be,
there are essentially two techniques for repairing code which is violating
the ordering (i.e., code which tries to lock the primary object while holding
the lock on the secondary object):

1) Change some method higher in the call stack so that it locks the primary
object first, before calling this problematic method on the secondary object.

2) Change the problematic method itself, so it doesn't try to lock the primary
object.

So, e.g., if we determine that the problem is where ClientPooledConnection.close()
calls LogicalConnection.nullPhysicalConnection, then we could either:
1) Change the caller of ClientPooledConnection.close() to first lock the
LogicalConnection before calling close(), or
2) Change ClientPooledConnection.close() so that it doesn't call nullPhysicalConnection()

Can you provide the complete stack traces of the problematic deadlock when
it occurs?

thanks,

bryan