You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Jeff Stuckman <st...@umd.edu> on 2008/11/12 17:39:08 UTC

Bugs? Deadlock detection with internal transactions, locking strategy when updating indices

Hello everyone,

I just solved a hard-to-troubleshoot deadlock bug in my application. The
fact that a deadlock occurred may or may not be a flaw in Derby, but the
failure of the deadlock detection code to detect the cycle is probably a
bug. I'm wondering if a knowledgeable person has time to read my analysis
and advise if I should file a defect or not.

Problem summary:
Even using READ_COMMITTED, a single non-updatable SELECT and a single UPDATE
statement can deadlock against each other when an index includes the updated
column. The transactions fail due to a lock timeout and the deadlock is not
detected, possibly because a transaction of type InternalTransaction is part
of the cycle.

More details:
My test case uses the following table and index:
CREATE TABLE urls (urlid INTEGER NOT NULL PRIMARY KEY, url VARCHAR(2048) NOT
NULL UNIQUE, site INTEGER, expectation INTEGER, jobflag CHAR DEFAULT 'N');
CREATE INDEX findurlbysiteandjob ON urls(site,jobflag);

My test case creates two threads and executes the following statements until
they deadlock against each other:
UPDATE urls SET jobflag=? WHERE urlid=?	
SELECT urlid,url,expectation FROM urls WHERE site=?

The test eventually deadlocks with the following transaction and lock table
contents:
XID     TYPE  MODE TABLENAME LOCKNAME  STATE TABLETYPE  LOCKCOUNT  INDEXNAME
2217109 ROW   S    URLS      (13,1)    GRANT T          1
FINDURLBYSITEANDJOB
2217114 ROW   X    URLS      (13,1)    WAIT  T          0
FINDURLBYSITEANDJOB
2217113 ROW   S    URLS      (15,1)    GRANT T          1
FINDURLBYSITEANDJOB
2217113 ROW   X    URLS      (3,132)   GRANT T          3          null
2217109 ROW   S    URLS      (3,132)   WAIT  T          0          null
2217109 TABLE IS   URLS      Tablelock GRANT T          2          null
2217113 TABLE IX   URLS      Tablelock GRANT T          4          null
2217114 TABLE IX   URLS      Tablelock GRANT T          1          null
2217113 ROW   S    URLS      (6,1)     GRANT T          1
SQL081111021116970

XID     GLOBAL_XID  USERNAME TYPE                 STATUS  FIRST_INSTANT
SQL_TEXT
2217115 null        APP      UserTransaction      IDLE    null
select * from SYSCS_DIAG.TRANSACTION_TABLE
2217114 null        APP      InternalTransaction  ACTIVE  null
UPDATE urls SET jobflag=? WHERE urlid=?
2217113 null        APP      UserTransaction      ACTIVE  (526,52925)
UPDATE urls SET jobflag=? WHERE urlid=?
2069160 null        null     SystemTransaction    IDLE    null          null
2217109 null        APP      UserTransaction      ACTIVE  null
SELECT urlid,url,expectation FROM urls WHERE site=?

Here is what I think is happening:
1. The SELECT statement begins to execute and the cursor is stepping through
the result set. The results are derived from index FINDURLBYSITEANDJOB as
expected.
2. The UPDATE statement begins to execute. The row to be updated is the row
immediately after the SELECT statement's cursor. The row is locked and
updated.
3. The UPDATE statement must perform index maintenance (tree rebalancing or
similar?). This apparently causes an InternalTransaction to be created. It
then must lock the row that the SELECT statement's cursor is currently
occupying. It cannot do this, so the transaction waits.
4. The SELECT statement is ready to advance the cursor. However, it cannot
advance the cursor because the UPDATE statement has locked the next row. The
transaction waits.
The result: Transaction 2217113 waits for the "nested transaction" 2217114
to complete. 2217114 waits for 2217109 to release its lock. 2217109 waits
for 2217113 to release its lock. We have a cycle and a deadlock. Even worse,
the transactions time out with "A lock could not be obtained within the time
requested", apparently because the dependency between transactions 2217113
and 2217114 is not detected.

Why I think this is a problem:
1. One would not expect these two statements to deadlock against each other.
Intuitively, the UPDATE should wait for the cursor to go away before locking
rows around it. Interestingly, this is a problem because the UPDATE locks
rows in the opposite order that the SELECT locks them, which greatly
increases the probability of a deadlock.
2. Apparently, the only way to avoid this deadlock (other than resorting to
READ UNCOMMITTED or other uglyness) is to LOCK TABLE before updating.
Because this deadlock scenario is so common (it could happen whenever an
index is simultaneously SELECTing and UPDATEing), this could be disastrous
for performance.
3. The failure to detect the deadlock obscures the problem and makes
troubleshooting more difficult, especially since it's not intuitive that
this would cause a deadlock in the first place.

Suggested improvements (I am not a Derby developer so feel free to give
critical comments):
1. Properly consider the dependency between a UserTransaction and an
InternalTransaction when detecting transaction/lock cycles.
2. Row locking could be more pessimistic when updating indicies. Instead of
locking rows/nodes in the index one by one, could one lock cover all
rows/nodes that must be modified?
3. When locking in an index for an update, could rows/nodes be locked
according to the "sort order" of the index, instead of being locked in the
reverse order?
4. At the READ COMMITTED isolation level, could the cursor for a SELECT
statement release its lock on the current row before acquiring a lock on the
next row? (this would probably mess up other things.)

Developers, do you advise that I file defects requesting any of these
suggested improvements?

Thanks for reading this -- Derby has worked very well for me, and I just
want to help the development team make it even better.

Re: Bugs? Deadlock detection with internal transactions, locking strategy when updating indices

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.

Kristian Waagan <Kr...@Sun.COM> writes:

> Jeff Stuckman wrote:
>> Hello everyone,
>>
>> I just solved a hard-to-troubleshoot deadlock bug in my application. The
>> fact that a deadlock occurred may or may not be a flaw in Derby, but the
>> failure of the deadlock detection code to detect the cycle is probably a
>> bug. I'm wondering if a knowledgeable person has time to read my analysis
>> and advise if I should file a defect or not.
>>
>> Problem summary:
>> Even using READ_COMMITTED, a single non-updatable SELECT and a single UPDATE
>> statement can deadlock against each other when an index includes the updated
>> column. The transactions fail due to a lock timeout and the deadlock is not
>> detected, possibly because a transaction of type InternalTransaction is part
>> of the cycle.
>>   
>
> Hello Jeff,
>
> First of all, thanks for your analysis and feedback. Much appreciated :)
>
> I'm sure someone with more knowledge in this area will try to answer
> you questions, but your description made me think about a Jira issue
> that has already been logged. Since you have written this description
> and thought about the problem, maybe you could also find the time to
> read the Jira issue and consider whether it is the same problem or
> not?
>
> The issue is DERBY-2991 - Index split deadlock :
> https://issues.apache.org/jira/browse/DERBY-2991

I agree, the problem Jeff described looks very much like DERBY-2991. The
row locks on (XX,1) tell that the scan protection lock (a hybrid between
an ordinary row lock and a page latch) is involved, which is a clear
indication of DERBY-2991.

> I'm aware of a fix being worked on for this issue, but I don't know
> when it will be delivered.

Right. I've done some investigation and have posted some ideas on how to
fix it. I haven't received any negative feedback on the ideas, so I'll
just go ahead an try to make the necessary changes. I don't have a clear
picture yet of how much time it will take, though.

-- 
Knut Anders

Re: Bugs? Deadlock detection with internal transactions, locking strategy when updating indices

Posted by Kristian Waagan <Kr...@Sun.COM>.

Jeff Stuckman wrote:
> Hello everyone,
>
> I just solved a hard-to-troubleshoot deadlock bug in my application. The
> fact that a deadlock occurred may or may not be a flaw in Derby, but the
> failure of the deadlock detection code to detect the cycle is probably a
> bug. I'm wondering if a knowledgeable person has time to read my analysis
> and advise if I should file a defect or not.
>
> Problem summary:
> Even using READ_COMMITTED, a single non-updatable SELECT and a single UPDATE
> statement can deadlock against each other when an index includes the updated
> column. The transactions fail due to a lock timeout and the deadlock is not
> detected, possibly because a transaction of type InternalTransaction is part
> of the cycle.
>   

Hello Jeff,

First of all, thanks for your analysis and feedback. Much appreciated :)

I'm sure someone with more knowledge in this area will try to answer you 
questions, but your description made me think about a Jira issue that 
has already been logged. Since you have written this description and 
thought about the problem, maybe you could also find the time to read 
the Jira issue and consider whether it is the same problem or not?

The issue is DERBY-2991 - Index split deadlock :
https://issues.apache.org/jira/browse/DERBY-2991

I'm aware of a fix being worked on for this issue, but I don't know when 
it will be delivered.


-- 
Kristian

> More details:
> My test case uses the following table and index:
> CREATE TABLE urls (urlid INTEGER NOT NULL PRIMARY KEY, url VARCHAR(2048) NOT
> NULL UNIQUE, site INTEGER, expectation INTEGER, jobflag CHAR DEFAULT 'N');
> CREATE INDEX findurlbysiteandjob ON urls(site,jobflag);
>
> My test case creates two threads and executes the following statements until
> they deadlock against each other:
> UPDATE urls SET jobflag=? WHERE urlid=?	
> SELECT urlid,url,expectation FROM urls WHERE site=?
>
> The test eventually deadlocks with the following transaction and lock table
> contents:
> XID     TYPE  MODE TABLENAME LOCKNAME  STATE TABLETYPE  LOCKCOUNT  INDEXNAME
> 2217109 ROW   S    URLS      (13,1)    GRANT T          1
> FINDURLBYSITEANDJOB
> 2217114 ROW   X    URLS      (13,1)    WAIT  T          0
> FINDURLBYSITEANDJOB
> 2217113 ROW   S    URLS      (15,1)    GRANT T          1
> FINDURLBYSITEANDJOB
> 2217113 ROW   X    URLS      (3,132)   GRANT T          3          null
> 2217109 ROW   S    URLS      (3,132)   WAIT  T          0          null
> 2217109 TABLE IS   URLS      Tablelock GRANT T          2          null
> 2217113 TABLE IX   URLS      Tablelock GRANT T          4          null
> 2217114 TABLE IX   URLS      Tablelock GRANT T          1          null
> 2217113 ROW   S    URLS      (6,1)     GRANT T          1
> SQL081111021116970
>
> XID     GLOBAL_XID  USERNAME TYPE                 STATUS  FIRST_INSTANT
> SQL_TEXT
> 2217115 null        APP      UserTransaction      IDLE    null
> select * from SYSCS_DIAG.TRANSACTION_TABLE
> 2217114 null        APP      InternalTransaction  ACTIVE  null
> UPDATE urls SET jobflag=? WHERE urlid=?
> 2217113 null        APP      UserTransaction      ACTIVE  (526,52925)
> UPDATE urls SET jobflag=? WHERE urlid=?
> 2069160 null        null     SystemTransaction    IDLE    null          null
> 2217109 null        APP      UserTransaction      ACTIVE  null
> SELECT urlid,url,expectation FROM urls WHERE site=?
>
> Here is what I think is happening:
> 1. The SELECT statement begins to execute and the cursor is stepping through
> the result set. The results are derived from index FINDURLBYSITEANDJOB as
> expected.
> 2. The UPDATE statement begins to execute. The row to be updated is the row
> immediately after the SELECT statement's cursor. The row is locked and
> updated.
> 3. The UPDATE statement must perform index maintenance (tree rebalancing or
> similar?). This apparently causes an InternalTransaction to be created. It
> then must lock the row that the SELECT statement's cursor is currently
> occupying. It cannot do this, so the transaction waits.
> 4. The SELECT statement is ready to advance the cursor. However, it cannot
> advance the cursor because the UPDATE statement has locked the next row. The
> transaction waits.
> The result: Transaction 2217113 waits for the "nested transaction" 2217114
> to complete. 2217114 waits for 2217109 to release its lock. 2217109 waits
> for 2217113 to release its lock. We have a cycle and a deadlock. Even worse,
> the transactions time out with "A lock could not be obtained within the time
> requested", apparently because the dependency between transactions 2217113
> and 2217114 is not detected.
>
> Why I think this is a problem:
> 1. One would not expect these two statements to deadlock against each other.
> Intuitively, the UPDATE should wait for the cursor to go away before locking
> rows around it. Interestingly, this is a problem because the UPDATE locks
> rows in the opposite order that the SELECT locks them, which greatly
> increases the probability of a deadlock.
> 2. Apparently, the only way to avoid this deadlock (other than resorting to
> READ UNCOMMITTED or other uglyness) is to LOCK TABLE before updating.
> Because this deadlock scenario is so common (it could happen whenever an
> index is simultaneously SELECTing and UPDATEing), this could be disastrous
> for performance.
> 3. The failure to detect the deadlock obscures the problem and makes
> troubleshooting more difficult, especially since it's not intuitive that
> this would cause a deadlock in the first place.
>
> Suggested improvements (I am not a Derby developer so feel free to give
> critical comments):
> 1. Properly consider the dependency between a UserTransaction and an
> InternalTransaction when detecting transaction/lock cycles.
> 2. Row locking could be more pessimistic when updating indicies. Instead of
> locking rows/nodes in the index one by one, could one lock cover all
> rows/nodes that must be modified?
> 3. When locking in an index for an update, could rows/nodes be locked
> according to the "sort order" of the index, instead of being locked in the
> reverse order?
> 4. At the READ COMMITTED isolation level, could the cursor for a SELECT
> statement release its lock on the current row before acquiring a lock on the
> next row? (this would probably mess up other things.)
>
> Developers, do you advise that I file defects requesting any of these
> suggested improvements?
>
> Thanks for reading this -- Derby has worked very well for me, and I just
> want to help the development team make it even better.
>
>

Re: Bugs? Deadlock detection with internal transactions, locking strategy when updating indices

Posted by Geoff hendrey <ge...@yahoo.com>.

Hi Jeff,

I am glad you made this post, because many times I have wanted to send some form of "gripe" about how common and unexpected deadlocks are in my derby applications. I've had to resort to fairly wacky measures to prevent the deadlocks that affect my application, such as *serializing* all delete operations (even at READ_UNCOMMITTED). And of course I agree with you that Derby works very well!

As for your observation: " Interestingly, this is a problem because the UPDATE locks
rows in the opposite order that the SELECT locks them, which greatly
increases the probability of a deadlock."

I'd say not only does in increase the probability, it almost GUARANTEES the deadlock under high loads. My real-world experience with Derby indicates that unintuitive deadlocks have been the biggest issue I've experienced. The reason I stick with Derby is because the team is always amazingly responsive in turning around fixes.

 -geoff

"When I do good, I feel good. When I do bad, I feel bad. That's my religion."
-A. Lincoln




________________________________
From: Jeff Stuckman <st...@umd.edu>
To: derby-user@db.apache.org
Sent: Wednesday, November 12, 2008 8:39:08 AM
Subject: Bugs? Deadlock detection with internal transactions, locking strategy when updating indices

Hello everyone,

I just solved a hard-to-troubleshoot deadlock bug in my application. The
fact that a deadlock occurred may or may not be a flaw in Derby, but the
failure of the deadlock detection code to detect the cycle is probably a
bug. I'm wondering if a knowledgeable person has time to read my analysis
and advise if I should file a defect or not.

Problem summary:
Even using READ_COMMITTED, a single non-updatable SELECT and a single UPDATE
statement can deadlock against each other when an index includes the updated
column. The transactions fail due to a lock timeout and the deadlock is not
detected, possibly because a transaction of type InternalTransaction is part
of the cycle.

More details:
My test case uses the following table and index:
CREATE TABLE urls (urlid INTEGER NOT NULL PRIMARY KEY, url VARCHAR(2048) NOT
NULL UNIQUE, site INTEGER, expectation INTEGER, jobflag CHAR DEFAULT 'N');
CREATE INDEX findurlbysiteandjob ON urls(site,jobflag);

My test case creates two threads and executes the following statements until
they deadlock against each other:
UPDATE urls SET jobflag=? WHERE urlid=?    
SELECT urlid,url,expectation FROM urls WHERE site=?

The test eventually deadlocks with the following transaction and lock table
contents:
XID     TYPE  MODE TABLENAME LOCKNAME  STATE TABLETYPE  LOCKCOUNT  INDEXNAME
2217109 ROW   S    URLS      (13,1)    GRANT T          1
FINDURLBYSITEANDJOB
2217114 ROW   X    URLS      (13,1)    WAIT  T          0
FINDURLBYSITEANDJOB
2217113 ROW   S    URLS      (15,1)    GRANT T          1
FINDURLBYSITEANDJOB
2217113 ROW   X    URLS      (3,132)   GRANT T          3          null
2217109 ROW   S    URLS      (3,132)   WAIT  T          0          null
2217109 TABLE IS   URLS      Tablelock GRANT T          2          null
2217113 TABLE IX   URLS      Tablelock GRANT T          4          null
2217114 TABLE IX   URLS      Tablelock GRANT T          1          null
2217113 ROW   S    URLS      (6,1)     GRANT T          1
SQL081111021116970

XID     GLOBAL_XID  USERNAME TYPE                 STATUS  FIRST_INSTANT
SQL_TEXT
2217115 null        APP      UserTransaction      IDLE    null
select * from SYSCS_DIAG.TRANSACTION_TABLE
2217114 null        APP      InternalTransaction  ACTIVE  null
UPDATE urls SET jobflag=? WHERE urlid=?
2217113 null        APP      UserTransaction      ACTIVE  (526,52925)
UPDATE urls SET jobflag=? WHERE urlid=?
2069160 null        null     SystemTransaction    IDLE    null          null
2217109 null        APP      UserTransaction      ACTIVE  null
SELECT urlid,url,expectation FROM urls WHERE site=?

Here is what I think is happening:
1. The SELECT statement begins to execute and the cursor is stepping through
the result set. The results are derived from index FINDURLBYSITEANDJOB as
expected.
2. The UPDATE statement begins to execute. The row to be updated is the row
immediately after the SELECT statement's cursor. The row is locked and
updated.
3. The UPDATE statement must perform index maintenance (tree rebalancing or
similar?). This apparently causes an InternalTransaction to be created. It
then must lock the row that the SELECT statement's cursor is currently
occupying. It cannot do this, so the transaction waits.
4. The SELECT statement is ready to advance the cursor. However, it cannot
advance the cursor because the UPDATE statement has locked the next row. The
transaction waits.
The result: Transaction 2217113 waits for the "nested transaction" 2217114
to complete. 2217114 waits for 2217109 to release its lock. 2217109 waits
for 2217113 to release its lock. We have a cycle and a deadlock. Even worse,
the transactions time out with "A lock could not be obtained within the time
requested", apparently because the dependency between transactions 2217113
and 2217114 is not detected.

Why I think this is a problem:
1. One would not expect these two statements to deadlock against each other.
Intuitively, the UPDATE should wait for the cursor to go away before locking
rows around it. Interestingly, this is a problem because the UPDATE locks
rows in the opposite order that the SELECT locks them, which greatly
increases the probability of a deadlock.
2. Apparently, the only way to avoid this deadlock (other than resorting to
READ UNCOMMITTED or other uglyness) is to LOCK TABLE before updating.
Because this deadlock scenario is so common (it could happen whenever an
index is simultaneously SELECTing and UPDATEing), this could be disastrous
for performance.
3. The failure to detect the deadlock obscures the problem and makes
troubleshooting more difficult, especially since it's not intuitive that
this would cause a deadlock in the first place.

Suggested improvements (I am not a Derby developer so feel free to give
critical comments):
1. Properly consider the dependency between a UserTransaction and an
InternalTransaction when detecting transaction/lock cycles.
2. Row locking could be more pessimistic when updating indicies. Instead of
locking rows/nodes in the index one by one, could one lock cover all
rows/nodes that must be modified?
3. When locking in an index for an update, could rows/nodes be locked
according to the "sort order" of the index, instead of being locked in the
reverse order?
4. At the READ COMMITTED isolation level, could the cursor for a SELECT
statement release its lock on the current row before acquiring a lock on the
next row? (this would probably mess up other things.)

Developers, do you advise that I file defects requesting any of these
suggested improvements?

Thanks for reading this -- Derby has worked very well for me, and I just
want to help the development team make it even better.