You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Brett Wooldridge (JIRA)" <ji...@apache.org> on 2012/06/27 09:35:44 UTC
[jira] [Updated] (DERBY-4279) Statement cache deadlock

     [ https://issues.apache.org/jira/browse/DERBY-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brett Wooldridge updated DERBY-4279:
------------------------------------

    Attachment: patch4279_2.txt

Okay, here is my second cut a fix...

First of all, the original report surmised that any DML, not just SELECT, could cause this deadlock.  I myself thought this as well, but it turns out not to be the case.

There are two places in the statement compilation code where openConglomerate() is called.   All compiled statement nodes ultimately inherit from StatementNode.  In StatementNode.lockTableForCompilation() you will find this logic:

if (dataDictionary.getCacheMode() == DataDictionary.DDL_MODE)   <=== Important
{
    ...
    heapCC = tc.openConglomerate(td.getHeapConglomerateId(),
        false,
        TransactionController.OPENMODE_FORUPDATE |
        TransactionController.OPENMODE_FOR_LOCK_ONLY,
        TransactionController.MODE_RECORD,
        TransactionController.ISOLATION_SERIALIZABLE);   <=== Important
    ...
}

The method is preceded with this comment:

	/* We need to get some kind of table lock (IX here) at the beginning of
	 * compilation of DMLModStatementNode and DDLStatementNode, to prevent the
	 * interference of insert/update/delete/DDL compilation and DDL execution,
	 * see beetle 3976, 4343, and $WS/language/SolutionsToConcurrencyIssues.txt
	 */

However, ultimately, according to the logic the lock is only obtained if the DataDictionary is in DDL_MODE.  This is all fine.

Now to the heart of the issue.  The second place where openConglomerate() is called is in ResultColumnList.generateHolderMethod().  This is used for SELECT statements for columns that will ultimately appear in the ResultSet (or sub-SELECTs in nested queries).  In ResultColumnList.generateHolderMethod() you will find this logic:

cc = getLanguageConnectionContext().getTransactionCompile().openConglomerate(
        conglomerateId,
        false,
        0,
        TransactionController.MODE_RECORD,
        TransactionController.ISOLATION_READ_COMMITTED);   <=== Important

Notice no conditional logic.  In other words, a lock on the conglomerate is unconditionally obtained.  It is this lock that contends with a table lock that may have been obtained elsewhere (possibly DDL, possibly an explicit lock).  The attached patch changes this code to the following:

LanguageConnectionContext lcc = getLanguageConnectionContext();
DataDictionary dd = lcc.getDataDictionary();
int isolationLevel = (dd.getCacheMode() == DataDictionary.DDL_MODE) ?
        TransactionController.ISOLATION_READ_COMMITTED : TransactionController.ISOLATION_NOLOCK;
cc = lcc.getTransactionCompile().openConglomerate(
        conglomerateId,
        false,
        0,
        TransactionController.MODE_RECORD,
        isolationLevel);

Basically, it too checks the DataDictionary to see if it is in DDL_MODE.  If it is, it maintains the same locking level as the existing code.  However, if the DataDictionary is not being modified the code will not obtain a lock.

What is the impact?  The original code has the effect of blocking DDL from execution by locking the table(s) during this compile phase.  And in turn, if DDL has locked the table(s), the compilation will be blocked.  The first I take to be an unintended side-effect (as you'll see).

Let's take the second case first.  If DDL has occurred (and locked tables), the DataDictionary will be in DDL_MODE, and the new code behaves the same as the old code; the conglomerate is opened with READ_COMMITTED isolation and the compilation will be blocked just as before.

Now the first case.  If compilation of a statement (that involves ResultColumnList) has started and no DDL is in-progress at that instant, no locks are obtained (because the cache mode is not DDL_MODE).  This is the same as other statement types [that extend from StatementNode].  If DDL occurs after compilation has started, it will not be blocked (but it was never blocked in all other cases).  Because it was never blocked in other cases (due to the conditional logic), I take this to be a side-effect.  However...

If DDL does happen to alter the table(s) involved there is already code to handle it.  All roads to ResultColumnList.generateHolderMethod() lead back to GenericStatement.prepare().  GenericStatement.prepare() has this logic fragment:

try {
    return prepMinion(lcc, true, (Object[]) null, (SchemaDescriptor) null, forMetaData);
} catch (StandardException se) {
    // There is a chance that we didn't see the invalidation
    // request from a DDL operation in another thread because
    // the statement wasn't registered as a dependent until
    // after the invalidation had been completed. Assume that's
    // what has happened if we see a conglomerate does not exist
    // error, and force a retry even if the statement hasn't been
    // invalidated.
    if (SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST.equals(se.getMessageId())) {
        ... recompile ...
    }
}

So, even though the compile of a SELECT will no longer block DDL from occurring (remember UPDATE, DELETE, and all other DML never blocked it anyway) there is code in-place to handle the possibility that openConglomerate() in StatementNode or ResultColumnList might fail.  In that case, the compile is given one more shot before failing (recompile = true).

It is debatable now that the logic in ResultColumnList is the same as StatementNode, whether the isolation in ResultColumnList should also be ISOLATION_SERIALIZABLE like StatementNode given that it is now also conditional based on DataDictionary.getCacheMode() == DDL_MODE.  However, my approach was to be least impacting in terms of deviation from current known working code.

On a final note, this patch obviously fixed the test case attached to this issue, however past attempts to fix this bug did the same but ultimately failed in some of the long running stress tests.  It would be helpful if this patch could be applied and let run through the usual full barrage of tests.  That said, the previous patch attempted to fix the issue at the statement cache level with all of it's hairy synchronization.  This patch is considerably simpler both conceptually and implementation-wise.

                
> Statement cache deadlock
> ------------------------
>
>                 Key: DERBY-4279
>                 URL: https://issues.apache.org/jira/browse/DERBY-4279
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.0.2.1, 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.1.1, 10.8.1.2
>         Environment: Windows Vista, OS X 10.5+
>            Reporter: Jeff Stuckman
>              Labels: derby_triage10_5_2
>         Attachments: Derby4279.java, client_stacktrace_activation_closed.txt, patch4279.txt, patch4279_2.txt, stacktrace.txt
>
>
> Due to a design flaw in the statement cache, a deadlock can occur if a prepared statement becomes out-of-date.
> I will illustrate this with the following example:
> The application is using the embedded Derby driver. The application has two threads, and each thread uses its own connection.
> There is a table named MYTABLE with column MYCOLUMN.
> 1. A thread prepares and executes the query SELECT MYCOLUMN FROM MYTABLE. The prepared statement is stored in the statement cache (see org.apache.derby.impl.sql.GenericStatement for this logic)
> 2. After some time, the prepared statement becomes invalid or out-of-date for some reason (see org.apache.derby.impl.sql.GenericPreparedStatement)
> 3. Thread 1 begins a transaction and executes LOCK TABLE MYTABLE IN EXCLUSIVE MODE
> 4. Thread 2 begins a transaction and executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the statement cache but it is out-of-date. The thread begins to recompile the statement. To compile the statement, the thread needs a shared lock on MYTABLE. Thread 1 already has an exclusive lock on MYTABLE. Thread 2 waits.
> 5. Thread 1 executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the statement cache but it is being compiled. Thread 1 waits on the statement's monitor.
> 6. We have a deadlock. Derby eventually detects a lock timeout, but the error message is not descriptive. The stacks at the time of the deadlock are:
> This deadlock is unique because it can still occur in a properly designed database. You are only safe if all of your transactions are very simple and cannot be interleaved in a sequence that causes the deadlock, or if your particular statements do not require a table lock to compile. (For the sake of simplicity, I used LOCK TABLE in my example, but any UPDATE statement would fit.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira