Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2012/12/13 13:32:14 UTC

[jira] [Updated] (DERBY-5632) Logical deadlock happened when freezing/unfreezing the database

     [ https://issues.apache.org/jira/browse/DERBY-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-5632:
--------------------------------------

    Attachment: experimental-v1.diff

I think there are two reasons why RAMAccessManager synchronizes on the conglomerate cache instance whenever it accesses it:

1) Because it manually faults missing items into the cache, and it needs to ensure that no other thread faults the same item in between its calls to findCached() and create() (see the sketch after this list).

2) Because conglomCacheUpdateEntry() implements a create-or-replace operation, which is not provided by the CacheManager interface, and it needs to ensure that no other thread adds an item with the same key between findCached() and create().
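
For illustration, the synchronization in (1) guards a pattern roughly like the sketch below. The findCached(), create() and release() calls are the real CacheManager methods, but the method name, the getConglomerate() accessor and the readConglomerateFromDisk() helper are stand-ins made up for the sketch, not the actual Derby code:

    // Sketch of the manual fault-in that requires holding the cache's monitor.
    // conglom_cache is the CacheManager instance that caches conglomerates.
    private Conglomerate findWithManualFaultIn(long conglomid)
            throws StandardException
    {
        Long key = Long.valueOf(conglomid);

        synchronized (conglom_cache)
        {
            CacheableConglomerate entry =
                (CacheableConglomerate) conglom_cache.findCached(key);

            if (entry == null)
            {
                // No other thread can slip in between findCached() and
                // create() because we hold the cache's monitor.
                Conglomerate conglom = readConglomerateFromDisk(conglomid);
                entry = (CacheableConglomerate)
                    conglom_cache.create(key, conglom);
            }

            Conglomerate result = entry.getConglomerate();
            conglom_cache.release(entry);
            return result;
        }
    }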

As mentioned in an earlier comment, I think (1) should be solved by implementing CacheableConglomerate.setIdentity(), so that the cache manager takes care of faulting in the conglomerate.
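
With setIdentity() in place, the lookup collapses to a single find() call, and there is no longer a findCached()/create() window that needs the cache's monitor. Same caveat as above: this is only a sketch, with getConglomerate() and readConglomerateFromDisk() as made-up stand-ins:

    // In RAMAccessManager: the cache manager drives the fault-in itself,
    // so no external synchronization on the cache instance is needed here.
    private Conglomerate findViaSetIdentity(long conglomid)
            throws StandardException
    {
        CacheableConglomerate entry = (CacheableConglomerate)
            conglom_cache.find(Long.valueOf(conglomid));

        if (entry == null)
        {
            // No conglomerate with this id exists.
            return null;
        }

        Conglomerate conglom = entry.getConglomerate();
        conglom_cache.release(entry);
        return conglom;
    }

    // In CacheableConglomerate: called by the cache manager on a cache miss.
    public Cacheable setIdentity(Object key) throws StandardException
    {
        long conglomid = ((Long) key).longValue();

        // Which transaction to use for this read is exactly the open
        // question discussed further down.
        conglom = readConglomerateFromDisk(conglomid);

        // Per the Cacheable contract, return null if the identity cannot
        // be taken on (no conglomerate with this id).
        return (conglom == null) ? null : this;
    }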

(2) might be solved by adding a create-or-replace operation to the CacheManager interface. However, I'm not sure it is needed. The conglomCacheUpdateEntry() method is only called from one place: RAMTransaction.addColumnToConglomerate(). That method fetches a Conglomerate instance from the cache, modifies it, and reinserts it into the cache. The instance that is reinserted is the exact same instance that was fetched, so the call to conglomCacheUpdateEntry() doesn't really update the conglomerate cache; it just replaces an existing entry with itself.
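
In outline, that call path looks like this (heavily simplified; the signatures of conglomCacheFind() and conglomCacheUpdateEntry() are approximated from memory, and addColumnTo() stands in for the actual column-adding logic):

    // Simplified outline of RAMTransaction.addColumnToConglomerate().
    void addColumnToConglomerateOutline(long conglomId)
            throws StandardException
    {
        // Returns the instance that already lives in the conglomerate cache.
        Conglomerate conglom = accessmanager.conglomCacheFind(this, conglomId);

        // Mutates that same instance in place.
        addColumnTo(conglom);

        // "Updates" the cache entry -- but the value passed in is the very
        // object already stored under conglomId, so nothing actually changes.
        accessmanager.conglomCacheUpdateEntry(conglomId, conglom);
    }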

It looks to me as if conglomCacheUpdateEntry() can be removed, and that will take care of (2).

I created an experimental patch, attached as experimental-v1.diff. It removes conglomCacheUpdateEntry() as suggested. It also makes CacheableConglomerate implement setIdentity() so that conglomCacheFind() doesn't need to fault in conglomerates manually.

The patch is not ready for commit, as it doesn't pass all regression tests. But it could be used for testing, if someone has a test environment where the deadlock can be reliably reproduced.

There was only one failure in the regression tests. store/xaOffline1.sql had a diff in one of the transaction table listings, where a transaction showed up in the ACTIVE state whereas IDLE was expected.

This probably happens because the transaction used in the CacheableConglomerate.setIdentity() method is not necessarily the same as the one previously used by RAMAccessManager.conglomCacheFind().

The current implementation of setIdentity() in the patch just fetches the first transaction it finds on the context stack. That seems to do the trick in most cases, but it cannot tell whether conglomCacheFind() was called with a top-level transaction or a nested transaction, since setIdentity() has no access to conglomCacheFind()'s parameters. Maybe it can be solved by pushing some other context type (with a reference to the correct transaction) on the context stack before accessing the conglomerate cache, and letting setIdentity() check that instead?
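
A rough illustration of that idea is shown below. To keep the sketch self-contained it uses a ThreadLocal holder instead of a real Derby context, so treat it purely as an illustration of the data flow; the actual fix would push/pop a proper context type on the context stack as described above, and all the names here are made up:

    // Hypothetical holder that makes the transaction which initiated the
    // lookup visible to CacheableConglomerate.setIdentity().
    final class ConglomerateLookupTransaction
    {
        private static final ThreadLocal<TransactionManager> CURRENT =
            new ThreadLocal<TransactionManager>();

        static void set(TransactionManager xactMgr) { CURRENT.set(xactMgr); }
        static TransactionManager get()             { return CURRENT.get(); }
        static void clear()                         { CURRENT.remove(); }
    }

    // In conglomCacheFind(), around the cache access:
    //
    //     ConglomerateLookupTransaction.set(xact_mgr);
    //     try {
    //         entry = conglom_cache.find(Long.valueOf(conglomid));
    //     } finally {
    //         ConglomerateLookupTransaction.clear();
    //     }
    //
    // In CacheableConglomerate.setIdentity(), call
    // ConglomerateLookupTransaction.get() instead of picking the first
    // transaction found on the context stack.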
                
> Logical deadlock happened when freezing/unfreezing the database
> ---------------------------------------------------------------
>
>                 Key: DERBY-5632
>                 URL: https://issues.apache.org/jira/browse/DERBY-5632
>             Project: Derby
>          Issue Type: Bug
>          Components: Documentation, Services
>    Affects Versions: 10.8.2.2
>         Environment: Oracle M3000/Solaris 10
>            Reporter: Brett Bergquist
>              Labels: derby_triage10_10
>         Attachments: experimental-v1.diff, stack.txt
>
>
> Tried to make a quick database backup by freezing the database, performing a ZFS snapshot, and then unfreezing the database.   The database was frozen but then a connection to the database could not be established to unfreeze the database.
> Looking at the stack trace of the network server, I see 3 threads that are trying to process a connection request. Each of these is waiting on:
>                 at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown Source)
>                 - waiting to lock <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
> That object is owned by:
>                 - locked <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
>                 at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown Source)
>                 at org.apache.derby.impl.store.access.RAMTransaction.openGroupFetchScan(Unknown Source)
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(Unknown Source)
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.runExplicitly(Unknown Source)
>                 at org.apache.derby.impl.sql.execute.AlterTableConstantAction.updateStatistics(Unknown Source)
> which itself is waiting for the object:
>                 at java.lang.Object.wait(Native Method)
>                 - waiting on <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at java.lang.Object.wait(Object.java:485)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 - locked <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.flush(Unknown Source)
> So basically what I think is happening: the database is frozen; the statistics are being updated on another thread, which has the "org.apache.derby.impl.services.cache.ConcurrentCache" instance locked and is waiting for the LogToFile lock; and the connecting threads, which are the ones that would unfreeze the database, are waiting to lock "org.apache.derby.impl.services.cache.ConcurrentCache" in order to connect. Not a deadlock as far as the JVM is concerned, but it will never leave this state either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira