You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2009/03/04 12:19:56 UTC

[jira] Updated: (DERBY-4080) Possible deadlock between locks and latches in BTreeController.compareRowsForInsert()

     [ https://issues.apache.org/jira/browse/DERBY-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-4080:
--------------------------------------

    Attachment: repro.sql

Here's a script that exposes the bug (seen in 10.4.2.1 - (706043) and 10.5.0.0 alpha - (749659)).

The script inserts a row at the end of one index page, with an uncommitted deleted duplicate at the beginning of the next index page. The insert must wait for the uncommitted delete to be committed. The transaction that holds the exclusive lock on the deleted duplicate then tries to read a row on the previous index page, but cannot obtain the lock until the insert operation times out.

A thread dump during the hang shows that this is a deadlock involving both locks and latches:

"Thread-2" prio=3 tid=0x08473800 nid=0x12 in Object.wait() [0xb60de000..0xb60debe0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0xf3c06298> (a org.apache.derby.impl.services.locks.ActiveLock)
	at org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(Unknown Source)
	- locked <0xf3c06298> (a org.apache.derby.impl.services.locks.ActiveLock)
	at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown Source)
.
.
.
"main" prio=3 tid=0x0806f400 nid=0x2 in Object.wait() [0xfe34e000..0xfe34ed38]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0xf4235668> (a org.apache.derby.impl.store.raw.data.StoredPage)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.derby.impl.store.raw.data.BasePage.setExclusive(Unknown Source)
	- locked <0xf4235668> (a org.apache.derby.impl.store.raw.data.StoredPage)
	at org.apache.derby.impl.store.raw.data.BaseContainer.latchPage(Unknown Source)

If the transaction that waits for the row lock had released all latches once it detected that it would have to wait, there would not be a deadlock and both transactions would be able to complete successfully. Once it has obtained the lock, it will release the latch on the page that it "forgot" to unlatch and perform a rescan, so I would believe that it is fine to release the latch earlier in this case.

> Possible deadlock between locks and latches in BTreeController.compareRowsForInsert()
> -------------------------------------------------------------------------------------
>
>                 Key: DERBY-4080
>                 URL: https://issues.apache.org/jira/browse/DERBY-4080
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>         Attachments: repro.sql
>
>
> It looks like BTreeController.compareRowsForInsert(), which is used to check for duplicates in a unique nullable index, can run into a deadlock which involves both locks and latches.
> Here's what I think can happen:
> comparePreviousRecord() (or compareNextRecord()) holds a latch on the index page where a new row is about to be inserted, and needs to check if there's a duplicate on one of the adjacent rows. Because the row is near a page boundary, this check moves to another index page, while still holding the latch on the original index page. Then compareRowsForInsert() is called, which tries to get an update lock on the existing row. If it has to wait for the update lock, the latch on the current page is released, but the latch on the original index page is kept. This means that the transaction is holding a latch while it is waiting for a lock, which means that it is blocking all access to that page until it has been granted the lock. If some other transaction that is holding a conflicting lock on the row later needs to latch the index page, those two transactions run into a deadlock and the one that's waiting for the lock will eventually time out (but it will not be reported as a dead
 lock).
> If compareRowsForInsert() releases all latches when it needs to wait for a lock, the deadlock is prevented, and both of the transactions may be able to complete without timing out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (DERBY-4080) Possible deadlock between locks and latches in BTreeController.compareRowsForInsert()

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> You seem to have found a number of problems with the new code that
> tries to implement the unique constraint with multiple nulls.  I
> definitely missed the deleted row case when reviewing that code, and
> had assumed it
> just ever needed to search one row left and right.
>
> I am now wondering if it would be better to make this code work the same
> as the existing code in the regular unique case.  That code makes sure
> there is always only ONE row with the given key (not including the row
> id).  It manipulates the search params so that either that exact row
> is found or it positions in the right place to do the insert.
>
> It does have some tricky code for when it finds a matching row,
> especially if it is marked deleted, but that code has been working for
> a long time now.

(By the way, do you think this tricky code may be the cause of
DERBY-4032? There we have one deleted row hiding another row with the
same key in a unique index.)

> The approach would be something like:
> o If there is a null in the key then just do the insert normally into
> the non-unique table, using all keys.
> o If no null, then need to change the search params to only use leading
> columns and to not allow exact match - this means changing the
> existing code which would use the
> tables information which would say to use all columns.
> o and then if match use the existing code for unique tables in this case.
>
> Let me know what you think.  Is this a better approach or should we
> patch up the new code?  Are you interested in doing this?  If not I
> may have time to do this.

Your suggestion sounds like a more robust approach than the current one
to me. I only planned to fix DERBY-4028 and DERBY-4081 for 10.5, as they
seem to be rather easy to fix. So if you have time, feel free to pick
this up. Are you aiming for 10.5? If so, I should probably not check in
the fixes that I have posted. Otherwise, I could check them in now so
that we improve the situation for 10.5 too.

-- 
Knut Anders

Re: [jira] Updated: (DERBY-4080) Possible deadlock between locks and latches in BTreeController.compareRowsForInsert()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
You seem to have found a number of problems with the new code that tries 
to implement the unique constraint with multiple nulls.  I definitely 
missed the deleted row case when reviewing that code, and had assumed it
just ever needed to search one row left and right.

I am now wondering if it would be better to make this code work the same
as the existing code in the regular unique case.  That code makes sure
there is always only ONE row with the given key (not including the row
id).  It manipulates the search params so that either that exact row
is found or it positions in the right place to do the insert.

It does have some tricky code for when it finds a matching row, 
especially if it is marked deleted, but that code has been working for
a long time now.

The approach would be something like:
o If there is a null in the key then just do the insert normally into 
the non-unique table, using all keys.
o If no null, then need to change the search params to only use leading
columns and to not allow exact match - this means changing the existing 
code which would use the
tables information which would say to use all columns.
o and then if match use the existing code for unique tables in this case.

Let me know what you think.  Is this a better approach or should we
patch up the new code?  Are you interested in doing this?  If not I
may have time to do this.

/mikem

Knut Anders Hatlen (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/DERBY-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Knut Anders Hatlen updated DERBY-4080:
> --------------------------------------
> 
>     Attachment: repro.sql
> 
> Here's a script that exposes the bug (seen in 10.4.2.1 - (706043) and 10.5.0.0 alpha - (749659)).
> 
> The script inserts a row at the end of one index page, with an uncommitted deleted duplicate at the beginning of the next index page. The insert must wait for the uncommitted delete to be committed. The transaction that holds the exclusive lock on the deleted duplicate then tries to read a row on the previous index page, but cannot obtain the lock until the insert operation times out.
> 
> A thread dump during the hang shows that this is a deadlock involving both locks and latches:
> 
> "Thread-2" prio=3 tid=0x08473800 nid=0x12 in Object.wait() [0xb60de000..0xb60debe0]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0xf3c06298> (a org.apache.derby.impl.services.locks.ActiveLock)
> 	at org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(Unknown Source)
> 	- locked <0xf3c06298> (a org.apache.derby.impl.services.locks.ActiveLock)
> 	at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown Source)
> .
> .
> .
> "main" prio=3 tid=0x0806f400 nid=0x2 in Object.wait() [0xfe34e000..0xfe34ed38]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0xf4235668> (a org.apache.derby.impl.store.raw.data.StoredPage)
> 	at java.lang.Object.wait(Object.java:485)
> 	at org.apache.derby.impl.store.raw.data.BasePage.setExclusive(Unknown Source)
> 	- locked <0xf4235668> (a org.apache.derby.impl.store.raw.data.StoredPage)
> 	at org.apache.derby.impl.store.raw.data.BaseContainer.latchPage(Unknown Source)
> 
> If the transaction that waits for the row lock had released all latches once it detected that it would have to wait, there would not be a deadlock and both transactions would be able to complete successfully. Once it has obtained the lock, it will release the latch on the page that it "forgot" to unlatch and perform a rescan, so I would believe that it is fine to release the latch earlier in this case.
> 
>> Possible deadlock between locks and latches in BTreeController.compareRowsForInsert()
>> -------------------------------------------------------------------------------------
>>
>>                 Key: DERBY-4080
>>                 URL: https://issues.apache.org/jira/browse/DERBY-4080
>>             Project: Derby
>>          Issue Type: Bug
>>          Components: Store
>>    Affects Versions: 10.4.2.0
>>            Reporter: Knut Anders Hatlen
>>         Attachments: repro.sql
>>
>>
>> It looks like BTreeController.compareRowsForInsert(), which is used to check for duplicates in a unique nullable index, can run into a deadlock which involves both locks and latches.
>> Here's what I think can happen:
>> comparePreviousRecord() (or compareNextRecord()) holds a latch on the index page where a new row is about to be inserted, and needs to check if there's a duplicate on one of the adjacent rows. Because the row is near a page boundary, this check moves to another index page, while still holding the latch on the original index page. Then compareRowsForInsert() is called, which tries to get an update lock on the existing row. If it has to wait for the update lock, the latch on the current page is released, but the latch on the original index page is kept. This means that the transaction is holding a latch while it is waiting for a lock, which means that it is blocking all access to that page until it has been granted the lock. If some other transaction that is holding a conflicting lock on the row later needs to latch the index page, those two transactions run into a deadlock and the one that's waiting for the lock will eventually time out (but it will not be reported as a de
ad
>  lock).
>> If compareRowsForInsert() releases all latches when it needs to wait for a lock, the deadlock is prevented, and both of the transactions may be able to complete without timing out.
>