You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Kathey Marsden <km...@sbcglobal.net> on 2006/07/14 07:31:37 UTC

Lock debugging walktrhough on IRC

Sunitha was walking the folks on IRC, mostly X-Gen and myself through 
debugging a lock timeout.
The plan is for me to take what I learn and make it into a tutorial Wiki 
page so I won't go into great detail here as it will be more formally 
summarized later. There were two IRC sessions and actually the actual 
problem hasn't been figured out fully yet.   My impressions of going 
through this is all of this backtracing can and should be automated.  
Does anyone already have a tool?  I plan to pick this back up tomorrow 
and think I will try dumping the logs into a derby database with the 
ErrorLog VTI and the lock dump as well, then I should be able to run 
queries without getting lost.   Any other tips are welcome.


For folks that might want to follow along in detail the derby.log is 
actually available at:
http://people.apache.org/~kmarsden/sunitha_lock_irc/
This is a benchmark being run with isolation level repeatable read, 
(still getting full details).
We are trying to determine why we are getting this lock timeout which 
doesn't occur when run against DB2.
Already tried extending lock timeout to 110.

SESSION 1


    <kmarsden>    hello sunitha.
    <sunitha>    hi kathey
    <kmarsden>    Hello All, Sunitha was going to walk me through 
reading the lock dumps and I thought maybe we should do it here. It will 
be chatty. Does anyone mind?
    <kmarsden>    then maybe we will try to post a summary to the wiki. 
Go ahead Sunitha I guess silince means nobody minds.
    <sunitha>    ok :)
    <kmarsden>    First reference already given was 
http://db.apache.org/derby/faq.html#debug_lock_timeout
to turn on lock monitor etc.
    <sunitha>    The locktable dump in derby.log will have the following 
columns
    <sunitha>    XID |TYPE |MODE|LOCKCOUNT|LOCKNAME |STATE|TABLETYPE / 
LOCKOBJ |INDEXNAME / CONTAINER_ID / (MODE for LATCH only) |TABLENAME / 
CONGLOM_ID |
    <sunitha>    I know this information is availabe with the locktable 
vti . I cantseem to find out where the api is on the derby website.
    <sunitha>    XID is for transaction id, Type of lock will have ROW 
for a row-level ock., or TABLE for a table level lock.
    <kmarsden>    right, the diag API's e.g. Locktable are not there 
anymore.
    <sunitha>    i c. ok.
    <kmarsden>    in the public api
    <kmarsden>    jta: do you know where the engine javadoc is on the 
website?
    <sunitha>    Mode will have values S for shared lock, X for 
exclusive. LOCKCOUNT is the number of locks held by the transaction, 
Lockname is the just the name of the lock. State means in what state the 
lock is in. Has this lock been granted, or is this lock request waiting 
to get a lock.
    -->|    DavidVC (n=dv136566@nwkea-socks-1.sun.com) has joined #derby
    <sunitha>    TableType is T if it a lock on a conglomerate ( 
table/index). IndexName will have the name of the index if the lock is 
on a index object,else it will have value NULL.
    <sunitha>    The last column which is TableName/Conglom_ID will have 
the name of the table.
    <sunitha>    Ex. 1016758 |ROW |S |1 |(7,1) |GRANT|T 
|SQL050721104146100 |ACCOUNTPROFILEEJB |
    <sunitha>    so this row in the lock table means. A transaction id 
1016758 has a row level lock, in sharedmode, one lock instance, lock is 
on (7,1), this lock has been granted, lock is on an index with name 
SQL050721104146100 which is on table ACCOUNTPROFILEEJB
    <kmarsden>    I think I have it all except the meaning of (7,1)
    <sunitha>    it is the object on which the lock is obtained
    <sunitha>    so on a row level lock, thats the row id on which the 
lock is obtained
    <kmarsden>    What is the meaning of the 7 and the 1
?
    <jta>    kmarsden: engine javadoc is at 
http://db.apache.org/derby/integrate/index.html
    <jta>    expand the "Papers" tab, then "Derby Engine"
    <sunitha>    In this particular case, it is record id ( which is 
probably the pageid, recordnumber). I am not so sure. Have to look up 
the code. In case of table level lock, all we print is TableLock
    <kmarsden>    jta:thanks
    <kmarsden>    but short story is the (7,1) is not going to help me 
right?
    <sunitha>    so if you have multiple rows with same lockname, it 
means that all those rows ( XID's) have locks on that object.
    <kmarsden>    Oh I see. So it is important as an identifier but has 
no more meaning than that.
    <kmarsden>    correct?
    <sunitha>    yes.
    <kmarsden>    OK. locktable javadoc at 
http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/LockTable.html
    <sunitha>    thanks
    <kmarsden>    so if I have something like:
    <kmarsden>   
ERROR 40XL2: A lock could not be obtained within the time requested. The 
lockTable dump is:
2006-07-07 03:29:00.471 GMT
XID |TYPE |MODE|LOCKCOUNT|LOCKNAME |STATE|TABLETYPE / LOCKOBJ |INDEXNAME 
/ CONTAINER_ID / (MODE for LATCH only) |TABLENAME / CONGLOM_ID |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
*** The following row is the victim ***
1016836 |ROW |S |0 |(5,82) |WAIT |T |NULL |HOLDINGEJB |
*** The above row is the victim ***
1016997 |ROW |S |1 |(176,30) |GRANT|T |NULL |ORDEREJB |
....
    <kmarsden>    How do I tell what the trouble is?
    <sunitha>    So this transaction 1016836 is waiting on a row level 
lock with shared access on object (5,82) on HoldingEJB table.
    <sunitha>    Try to see why this was an issue. In your lock table, 
search for which other transactions are holding incompatible locks on 
(5,82) on holdingEJB. was there another transaction that was holding an 
exclusive lock on the same object
    <sunitha>    
http://db.apache.org/derby/docs/dev/devguide/rdevconcepts2462.html
    <sunitha>    or even if some other transaction is holding a X lock 
on the table.
    <kmarsden>    so I found
1016834 |ROW |X |0 |(5,82) |WAIT |T |NULL |HOLDINGEJB
    <kmarsden>    so now search back for that XID?
    <kmarsden>    2006-07-07 03:27:05.357 GMT 
Thread[DRDAConnThread_46,5,main] (XID = 1016834), (SESSIONID = 44), 
(DATABASE = tradeDB), (DRDAID = NF000001.A644-796010081895087835{44}), 
Executing prepared statement: DELETE FROM HOLDINGEJB WHERE HOLDINGID = ? 
:End prepared statement with 1 parameters begin parameter#1: 17018 :end 
parameter
    <kmarsden>    so it is this delete causing the trouble?
    <sunitha>    I would first try to find out why that lock was not 
granted.
    <sunitha>    the lock on (5,82)
    <sunitha>    The delete statement is hitting the timeout / deadlock 
issue. I am looking at the lock table and in there I see a timeout 
occured and the statement that hit it was the delete as you found out.
    <sunitha>    do you see a tablelock on holdingejb held by someother 
transaction.
    <kmarsden>    I am getting a little confused.
    <sunitha>    1016758 |ROW |S |1 |(5,82) |GRANT|T |NULL |HOLDINGEJB
    <kmarsden>    from what I look at the select timed out. no?

    <kmarsden>    06-07-07 03:29:01.011 GMT 
Thread[DRDAConnThread_33,5,main] (XID = 1016836), (SESSIONID = 31), 
(DATABASE = tradeDB), (DRDAID = NF000001.A636-867504725979599249{31}), 
Failed Statement is: select q1."PURCHASEPRICE", q1."HOLDINGID", 
q1."QUANTITY", q1."PURCHASEDATE", q1."ACCOUNT_ACCOUNTID", 
q1."QUOTE_SYMBOL" from HOLDINGEJB q1, ACCOUNTEJB q2, ACCOUNTPROFILEEJB 
q3 where ( q3."USERID" = ?) and ( q2."ACCOUNTID" = 
q1."ACCOUNT_ACCOUNTID") and ( q3."USERID" = q2."PROFILE_USERID") with 1 
parameters begin parameter #1: uid:217 :end parameter
ERROR 40XL2: A lock could not be obtained within the time requested. The 
lockTable dump is: ...
    <sunitha>    ok. Ithought you searched on the XID and that brought 
the delete statement ?
    <kmarsden>    yes. it did.
    <kmarsden>    first I saw the lock timeout on the select.
    <kmarsden>    then I searced for (5,82) and found
    <sunitha>    In the log, I searched for 1016836, and this shows up 
the select statement.
    <kmarsden>    1016834 |ROW |X |0 |(5,82) |WAIT |T |NULL |HOLDINGEJB
    <kmarsden>    Then I searced backward to find XACT 1016834 and found 
the delete.
    <X-Gen>    is this dump online somewhere ? id like to have a peek 
aswell please.
    <kmarsden>    but you wanted to see why that lock wasn't granted.
    <sunitha>    yes
    <sunitha>    1016758 |ROW |S |1 |(5,82) |GRANT|T |NULL |HOLDINGEJB
    <kmarsden>    I am sorry X-Gen.
    <kmarsden>    I can't send the whole thing until I get permission 
from the user.
    <kmarsden>    But will do that later if I can and hopefully whatever 
we put on the Wiki will have a good example to follow.
    <sunitha>    I tried to search for the transactions holidng the 
(5,82) lock and I see some shared lock as I posted here.. but then the 
interesting ones are the X or the U lock.
    <sunitha>    There is a U lock on 1016834 |ROW |U |1 |(5,82) 
|GRANT|T |NULL |HOLDINGEJB |
    <kmarsden>    ok. a select for update.
    <kmarsden>    2006-07-07 03:27:05.145 GMT 
Thread[DRDAConnThread_46,5,main] (XID = 1016834), (SESSIONID = 44), 
(DATABASE = tradeDB), (DRDAID = NF000001.A644-796010081895087835{44}), 
Executing prepared statement: select q1."ADDRESS", q1."PASSWORD", 
q1."USERID", q1."EMAIL", q1."CREDITCARD", q1."FULLNAME" from 
ACCOUNTPROFILEEJB q1 where ( q1."USERID" = ?) for update of "ADDRESS" 
:End prepared statement with 1 parameters begin parameter#1: uid:154 
:end parameter
    <kmarsden>    I have to go in 5 minutes to pick up the young man. So 
do we have a complete story here yet?

    <sunitha>    I dont think thats the one.
    <sunitha>    2006-07-07 03:27:05.357 GMT 
Thread[DRDAConnThread_46,5,main] (XID = 1016834), (SESSIONID = 44), 
(DATABASE = tradeDB), (DRDAID = NF000001.A644-796010081895087835{44}), 
Executing prepared statement: DELETE FROM HOLDINGEJB WHERE HOLDINGID = ? 
:End prepared statement with 1 parameters begin parameter#1: 17018 :end 
parameter
    <kmarsden>    oops.
    <sunitha>    If you search backward from the lock table, it is the 
delete statement.
    <sunitha>    the select was on some other table.
    <kmarsden>    Oh no I am confused. Can you give me an overview so I 
can get my bearings and go back over this.
    <sunitha>    delete from holdingejb where holdingid = ?. thats seems 
fine to me. it is only doing a row level lock and it is valid to have a 
X lock.
    <sunitha>    ok here are the steps I use. 1) find out if some 
transaction is holding locks it is not supposed to. like for ex, if a 
transaction is hodling X table level locks when it shouldnt be.
    * X-Gen    takes a gues. the key to the problem is (5,28) thats the 
resourse that is being faught over, someone is using it (maybe within a 
log running transaction), and your select (1016836) timed out waiting for it
    <sunitha>    correct.
    <kmarsden>    Sorry I have to run. We can pick up tomorrow and by 
then I should know if I can share the log.
    <X-Gen>    Failed Statement is: select * from SYSCS_DIAG.LOCK_TABLE
    <X-Gen>    ERROR 42X05: Table 'SYSCS_DIAG.LOCK_TABLE' does not exist.
    <X-Gen>    sunitha, any idea why i would get that ?

SESSION 2

    <kmarsden>    hello sunitha
    <sunitha>    hi kathey
    <kmarsden>    X-Gen also downloaded the derby.log
    <sunitha>    great
    <sunitha>    do you want to recap on yesterday's chat.. or we can 
just proceed.
    <sunitha>    X-Gen. we were looking at the lock on resource (5,82) 
ln, 90563
    <sunitha>    in the derby.log
    <kmarsden>    Yes, can you give a recap and then just walk us 
through it. Then maybe I can come up with a smaller example and trace it 
through for the wiki,
    <sunitha>    1) The lock table info on what the column headings mean 
can be found at 
http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/LockTable.html
    <sunitha>    2) In the derby.log, the victim transaction that timed 
out getting a lock is the first row in the lock table where it says .. 
"The followign row is the victim"
    <sunitha>    so in our case - ie *** The following row is the victim ***
    <sunitha>    1016836 |ROW |S |0 |(5,82) |WAIT |T |NULL |HOLDINGEJB |
    <sunitha>    Figure out why this transaction was not able to receive 
the S lock. To do so, search for (5,82) on holding ejb in the lock table
    <sunitha>    To figure out which statement is holding the lock. Look 
up the XID. Search backward from the locktable for the XID and you will 
find the associated statement.
    <sunitha>    So first. can you search for what other T ( short from 
for transaction) are holding the lock on (5,82)
    <X-Gen>    'holiding the lock on (5,28)' means their STATE is GRANT ?
    <sunitha>    yes
    <kmarsden>    Is this correct?
1016834 |ROW |U |1 |(5,82) |GRANT|T |NULL |HOLDINGEJB
    <sunitha>    we are looking at locks on (5,82). apart from which 
transactions are holding locks on (5,82) we also want to see which T are 
waiting on locks. and what type of locks.
    <sunitha>    yes that is correct
    <sunitha>    so in there, we know that 1016834 is holding a U lock ( 
ie an update lock) on (5,82).
    -->|    deepa (n=chatzill@pixpat.austin.ibm.com) has joined #derby
    <kmarsden>    so we want the full list then?
    <sunitha>    yes
    <kmarsden>    k
    <sunitha>    This is what I found.
    <sunitha>    From this, observations are:
    <sunitha>    -- XID (Tj=1016834) is holding a U lock, waiting on a X 
lock.
    <sunitha>    -- following T have S locks
    <sunitha>    1016758
    <sunitha>    1016781
    <sunitha>    1016791
    <sunitha>    1016793
    |<--    deepa has left freenode (Nick collision from services.)
    <sunitha>    1016806
    <sunitha>    1016826
    <sunitha>    1016833
    <sunitha>    1016844
    <sunitha>    1016918
    <sunitha>    1016927
    <sunitha>    1016932
    <sunitha>    (Ti=1016836 timesout on S lock.
    <sunitha>    oops, sorry. didnt know the formatting would come out 
so bad
    -->|    deepa_ (n=chatzill@pixpat.austin.ibm.com) has joined #derby
    <kmarsden>    http://people.apache.org/~kmarsden/locklist.txt
    <sunitha>    In short, tehre are some T that have S locks on (5,82) 
row, and there is one transaction 1016834 (lets call it Tj) , that is 
holding a U lock, and waiting on X lock. Lets call our victim T as Ti
    <sunitha>    thanks kathey
    <kmarsden>    Suitha I was wondering if you could back up a little 
bit and explain the lock differences
    <kmarsden>    or point me to a better reference than 
http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/LockTable.html
    <sunitha>    There is a S lock which is shared lock. X is a 
exclusive lock. only one T can have a X lock. The U lock is an update 
lock. Let me give you a link from dev guide.
    <sunitha>    
http://db.apache.org/derby/docs/dev/devguide/cdevconcepts30291.html
    <sunitha>    that talks about lots of info abt locking etc.
    <kmarsden>    great! thanks. ok go ahead then
    <sunitha>    
http://db.apache.org/derby/docs/dev/devguide/cdevconcepts36402.html
    <sunitha>    that one talks abt the locks types of lock and lock 
compatibility
    <sunitha>    I am going back to the derby.log example.
    <sunitha>    In short, tehre are some T that have S locks on (5,82) 
row, and there is one transaction 1016834 (lets call it Tj) , that is 
holding a U lock, and waiting on X lock. Lets call our victim T as Ti
    <sunitha>    before i go further into this case. mainly the 
questions we need to ask is - does it make sense that the victim didnt 
get the lock, if so why didnt it receive it. is it possible that the 
application is not handling any deadlock scenarios, or is the app not 
taking care to close/release locks by commit the transaction, what is 
the isolation level of the transaction etc.
    <sunitha>    For Tj, it is the following statement
    <sunitha>    2006-07-07 03:27:05.357 GMT 
Thread[DRDAConnThread_46,5,main] (XID = 1016834), (SESSIONID = 44), 
(DATABASE = tradeDB), (DRDAID = NF000001.A644-796010081895087835{44}), 
Executing prepared statement: DELETE FROM HOLDINGEJB WHERE HOLDINGID = ? 
:End prepared statement with 1 parameters begin parameter#1: 17018 :end 
parameter
    <X-Gen>    sunitha, its not _just_ that statement, its all the 
statments that belong to 1016834 before the last comit. or am i guessing 
wrong here ?
    <sunitha>    I think all the statements before the last commit are 
interesting.
    <sunitha>    X-Gen, do you have a list of all statements as part of 
Tj before last commit ?
    * X-Gen    copies and pasts
    <sunitha>    so looking at the last statement for Tj - for a delete 
statement. the U lock is obtained for probing if the row exists and then 
it is upgraded to a X lock. which is why we see that it got the U lock, 
but was not able to get a X lock. Tj is waiting on getting a X lock. 
because there are other transactions that are holding a S lock. X lock 
is not compatible with S lock.
    <kmarsden>    oh boy, you just lost me but maybe after I read those 
doc references maybe I will understand better.
    <kmarsden>    no. I reread and get it now I think.
    <sunitha>    X lock is a exclusive lock, only one transaction can 
have X lock. No other transactions are allowd to have any other lock on 
same resource. So if there is a S lock held by Tx, then Tj cannot get a 
X lock, till Tx releases the S lock
    <sunitha>    ok
    <sunitha>    X-Gen, did you copy paste the stmts ? i didnt see it
    <X-Gen>    2 seconds busy doing that now
    <X-Gen>    http://www.rafb.net/paste/
    <X-Gen>    oops
    <X-Gen>    http://www.rafb.net/paste/results/guKTv151.html i dont 
think i missed any of them
    <sunitha>    thx.
    <sunitha>    In there, there are two statements that deal with the 
holdingejb. one is the select for update , and the delete both on the 
same row.
    <sunitha>    So locks that Tj holds seem fine to me. The reason that 
Ti ( our victim T) doesnt get the S lock is because of livelock 
handling. The livelock case is as such. Tj is waiting on X lock , 
because there are some n number of T holding S lock. Now comes Ti which 
wants a S lock. This S lock is not compatible with the wiating X lock 
and hence we wont grant Ti the S lock, because otherwise Tj will starve 
forever trying to get a X lock.
    <sunitha>    Next - why are the other(other than Ti and Tj) 
transactions that hold S lock, still holding their locks. and what 
statements are those. Kathey - what isolation level are these T running at ?
    <kmarsden>    repeatable read
    <sunitha>    ok. so in RR (repeatable read), the locks will be 
released at the end of the transaction. Even shared locks on reads will 
be released at end of Transaction. 
http://db.apache.org/derby/docs/dev/devguide/rdevconcepts8424.html
    <sunitha>    The doc says
    <sunitha>    For TRANSACTION_READ_COMMITTED or 
TRANSACTION_REPEATABLE_READ isolation, Derby treats rows as cursors for 
SELECT statements. It locks rows only as the application steps through 
the rows in the result. The current row is locked. The row lock is 
released when the application goes to the next row.
    <sunitha>    I think there is a bug in the documentation. For RR , 
we release locks at end of transaction, not after the appliction goes to 
next row.
    <sunitha>    I'll file a jira.
    <X-Gen>    can i do a select for update and a delete in the same 
transaction ? (on the same row) ?
    <kmarsden>    Hooray, Sunitha found a bug and gets a point! 
Everybody gets a point when they find a bug.
    <sunitha>    yes.
    <sunitha>    :)
    <sunitha>    http://www.rafb.net/paste/results/H9F1k145.html
    <sunitha>    I just looked at two other T that have S locks. its the 
same select statement
    <sunitha>    we should look at all the other T that have S locks on 
(5,82) from our lock list. http://people.apache.org/~kmarsden/locklist.txt
    -->|    samcintyre 
(n=Andrew_M@adsl-71-146-145-174.dsl.pltn13.sbcglobal.net) has joined #derby
    <sunitha>    kathey - i dont have the db. but does that make sense 
semantically that a multiple users's can have same holding accounts.
    <kmarsden>    oh I don't know. I just got the log and not the db 
unfortunately.
    <kmarsden>    I'll ask
    <sunitha>    ok. just the first two transactions that I checked. the 
statement is same. the uid on first one is 160 and second one is 210
    <sunitha>    X-Gen, do you have a list of the other statements from 
our locklist ?
    <X-Gen>    nope, you have to get all the statements related to that 
XID before the previous comit ?
    <sunitha>    yes.
    * X-Gen    wants to test that select for update and delete on the 
same row.
    <X-Gen>    Grrrr, i cant see what locks are opened. id have to 
upgrade my Db
    <sunitha>    :( , you can try with teh 10.2 snapshot jars.
    <X-Gen>    haha, boom
    <X-Gen>    sunitha, if i select for update a row, then delete it in 
the same tx. i dont commit this tx. then in a new window i do a select 
on that row i get a Locking error
    <sunitha>    in the same transaction, you should be able to delete 
it . correct .
    <sunitha>    that seems correct to me. because the T in which you do 
select for update and then delete will get the X lock. this wont be 
released till the T commits. so when you try to do a select on that row 
from another transaction, it needs to get a S lock, but it wont since 
there is another transaction holding a X lock.
    <X-Gen>    ..and thats what happening here. or did i miss the ship
    <sunitha>    In the case we were looking at, the select for update 
and delete happened in one transaction (Tj) and that seemed fine.
    <sunitha>    Tj is XID = 1016834
    <X-Gen>    but Tj had not commited. so anyone else selecting would 
get errors
    <sunitha>    so here, Tj didnt get the X lock. It is waiting for the 
X lock.
    <sunitha>    since there are others as can be seen in the locklist 
that have S lock on this same resource.
    <sunitha>    and Ti came along and wanted to do a select and ended 
up getting a timeout because of the livelock handling. since Tj is 
waiting on the X lock.
    <sunitha>    did that make sense ?
    <sunitha>    brb
    <kmarsden>    So in terms of reducing the case would it look like this?
    <X-Gen>    kmarsden, sorry, i seam to have hi-jacked your session.
    <kmarsden>    set transaction isolation level rr
    <kmarsden>    X-Gen: not at all this is definitely a community 
session. I wanted to invite derby-user but I thought it might make 
Sunitha nervous.
    <kmarsden>    so I will send a transcript.
    <kmarsden>    Anyway I 1) set transaction isolation RR 2) In one 
transaction select the row. Don't commit 3) In another transaction 
select for update and try to delete the row.
    <kmarsden>    In this case we should see the WAIT on the delete like 
we are in the derby.log. correct?
    <sunitha>    yes. if 2 happens before 3.
    <kmarsden>    But the question is why didn't our original select 
release locks. Is that correct Sunitha?
    <sunitha>    yes. there are several of those.
    <X-Gen>    they would only release once their entier transaction commits
    <sunitha>    I looked at first four and they are the same select 
query, with different uid.. but all use the holidngejb.
    <sunitha>    yes
    <sunitha>    1016826
    <sunitha>    copy paste didnt work.. ignore the number in my prev 
comment
    <sunitha>    is the application doing a commit on the transaction 
correctly ? is it reqd for these select qry to be running at RR.
    <sunitha>    is there an application level deadlock happening , 
which maybe the reason taht the transaction is not being committed ?
    <kmarsden>    Perhaps. Would we be able to see a smoking gun from 
the logs? For instance would we see that the transactions that 
originally took the shared lock was waiting for another lock?
    <sunitha>    yes.
    <sunitha>    basically look at the transactions that are hodling the 
S lock and see what locks they are holding or waiting on.
    <X-Gen>    is this machine very very busy ? maybe it just needs the 
deadlock timeout extended because of the load ? it cant cope with 
processing so many transactions ?
    <X-Gen>    or am i waaaay of there ?
    <kmarsden>    I had asked the user to try that first but did not 
hear back that it made a difference, but I did not do the proper 
confirmation of what timeout they used.

    <X-Gen>    with the test that i ran justnow, it came back right away 
telling me about the deadlock because the one transaction had X lock 
granted. In your case, no-one has a X lock so it must be a 60sec time 
being hit
    <X-Gen>    or whatever time it was
    <X-Gen>    timeout rather
    * X-Gen    looks for more assumptions.
    <sunitha>    ok. i thought that they are using the default timeout. 
which is big ( 1minute).
    <X-Gen>    here is something else. check the times on line 90556 and 
the next line
    <kmarsden>    So we have:
1016758 |ROW |S |0 |(6,74) |WAIT |T |NULL |HOLDINGEJB |
    <kmarsden>    1016758 is on our list of transactions so might that 
be our trouble maker?
    <kmarsden>    You know we should be able to right a program to parse 
the logs and help figure out these dependencies.
    <kmarsden>    write #!:)
    <sunitha>    who is holding (6,74)
    * X-Gen    is off 2 sleep. i'll read the irc logs in the morning.
    <kmarsden>    ok. Thanks X-Gen
    <kmarsden>    so we need to get all the transactions that are 
holding (6,47). Should I put it on the web?
    <sunitha>    I dont think thats a problem. 1016758 is wiating on 
(6,74). 1016972 has the X lock on (6,74) and this T is not waiting on 
any other locks ( or any locks from 1016758) so not a deadlock case.
    <sunitha>    some questions to followup on 1) what is the timeout 
they are using 2) why are they using RR, 3) it may just be that it is a 
busy app with multiple transactions hitting, and maybe the application 
logic needs to take care of it. 4) the semantics of the select query.. 
is that right fro the app.
    <sunitha>    so kathey, did that make sense about the (6,74) case ? 
or were you trying to investigate some other angle to it.
    <kmarsden>    so working on this scenario.
    <kmarsden>    X1 does a select with RR and has a shared lock. but 
doesn't commit.
    <kmarsden>    X2 does the select for update and tries to do the 
delete but can't get the lock.
    <kmarsden>    The reason is X1 hasn't commited.
    <kmarsden>    The question is why. So I looked at our lock list to 
see what all of our transactions that had shared locks were up to.
    <kmarsden>    I see that 1016758 was busy waiting on a lock for 
(6,74) so assumed that was the reason it couldn't commit.
    <kmarsden>    and kept holding the lock.
    <sunitha>    that makes sense. I was looking to see if (6,74) case 
might be causing any kind of deadlock and it isnt.
    <sunitha>    What I would do is just go over the whole locklist that 
we have, and look at if there is any other case where a transaction 
could not have committed because it is waiting on something else, or if 
there is any case of a deadlock.
    <kmarsden>    ok.



Re: Lock debugging walktrhough on IRC

Posted by John Embretsen <Jo...@Sun.COM>.
Kathey Marsden wrote:
> John Embretsen wrote:
>>
>> Does anyone know if any kind of lock debugging tutorial or related 
>> information
>> was ever added to the wiki [1] (as Kathey mentioned last year) or any 
>> of the
>> manuals?
> I was out for quite a while last year so was not able to pursue this.    
> Unfortunately, I don't think I will be able to get to it anytime soon.

So I carved out a simple version of such a wiki page at 
http://wiki.apache.org/db-derby/LockDebugging, accessible from the front page 
via the "HintsAndTips" page.
Anyone can add hints & tips on debugging deadlocks/timeouts, or improve what's 
already there.


-- 
John


Re: Lock debugging walktrhough on IRC

Posted by Kathey Marsden <km...@sbcglobal.net>.
John Embretsen wrote:
>
> Does anyone know if any kind of lock debugging tutorial or related 
> information
> was ever added to the wiki [1] (as Kathey mentioned last year) or any 
> of the
> manuals?
I was out for quite a while last year so was not able to pursue this.    
Unfortunately, I don't think I will be able to get to it anytime soon. 

Kathey


Re: Lock debugging walktrhough on IRC

Posted by John Embretsen <Jo...@Sun.COM>.
Hi all,

In mid-july last year this thread was started with the following information:

Kathey Marsden wrote:
> Sunitha was walking the folks on IRC, mostly X-Gen and myself through 
> debugging a lock timeout.
> The plan is for me to take what I learn and make it into a tutorial Wiki 
> page so I won't go into great detail here as it will be more formally 
> summarized later. There were two IRC sessions and actually the actual 
> problem hasn't been figured out fully yet.   My impressions of going 
> through this is all of this backtracing can and should be automated.  
> Does anyone already have a tool?  I plan to pick this back up tomorrow 
> and think I will try dumping the logs into a derby database with the 
> ErrorLog VTI and the lock dump as well, then I should be able to run 
> queries without getting lost.   Any other tips are welcome.

The rest of the first mail in this thread is available here:
http://www.mail-archive.com/derby-user@db.apache.org/msg04584.html

Does anyone know if any kind of lock debugging tutorial or related information
was ever added to the wiki [1] (as Kathey mentioned last year) or any of the
manuals? I searched the wiki but was not able to find anything related to
locking in Derby.

Debugging locking situations seems to be a rather frequently occurring topic on
derby-user. As I'm trying to learn more about this topic, I have found useful
information by searching the manuals _and_ the mail archives. It would be good
to have some kind of wiki page gathering hints, tips and links to further
information (I think it would save users a bit of time and effort when trying to
debug their applications), so please let me know if this already exists.
Improving the manuals could be an alternative solution.

Thanks,

-- 
John

[1]: http://wiki.apache.org/db-derby/