You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2008/11/24 11:58:45 UTC

[jira] Updated: (DERBY-2991) Index split deadlock

     [ https://issues.apache.org/jira/browse/DERBY-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-2991:
--------------------------------------

    Attachment: d2991-preview-1a.stat
                d2991-preview-1a.diff

Attaching an experimental patch (d2991-preview-1a.diff) to show how
I'm planning to fix the deadlock.

The patch makes the B-tree scans save the scan position and release
the scan lock if they release the latch in the middle of the
scan. Although the patch doesn't remove the use of scan locks, it does
make sure that no transaction holds the scan lock on a page without
holding the latch on the same page, so the scan locks don't have any
effect anymore.

The code to save the position and reposition could be used more or
less in its current state. Some small adjustments needed to be made:

  * The existing savePosition methods couldn't be used directly since
    one of them assumed that the page was not latched when called, and
    the other one was meant to be called from other transactions that
    needed to tell the scan to save its position if it happened to be
    positioned on a given page. A new variant of savePosition was
    added. I believe that both of the old ones can be removed now,
    since the position will already have been saved and they won't
    have any work to do, but I need to check that.

  * savePosition/reposition needed to save/restore information about
    whether or not the row at the current position was locked (which
    is available from scan_position.current_rh when the scan is
    positioned). This information wasn't need earlier since the scan
    position would only be saved on transactions boundaries, in which
    case the locks would be released.

The final solution should of course also remove the scan locks and
include the earlier discussed optimization to prevent unnecessary
repositioning when no rows have been moved off the page.

I ran the attached repro (InsertSelectDeadlock.java) and didn't see
any timeouts when the patch was applied.

I have also run the full regression test suite. suites.All ran
cleanly. Derbyall had five failures in the store tests:

store/RowLockIso.sql
store/rlliso2multi.sql
store/rlliso3multi.sql
store/readlocks.sql
unit/T_b2i.unit

The first four of them failed because (a) lock table dumps didn't
contain scan locks any longer, and (b) some inserts didn't fail with a
timeout trying to obtain the scan lock. These sound like expected
changes in behaviour and should probably be fixed by updating the
canons.

The last failure (unit/T_b2i.unit) was caused by some debug code that
is only enabled when running that test. The debug code simulates that
the latch is lost, but it has not been updated to save the position
before it releases the latch, so an assert is triggered when we later
try to reposition. This should be fixed by changing the method
OpenBTree.test_errors so that it saves the position if needed.

It would be great if someone could take a look at the patch and verify
that I'm on the right track and that I'm not missing something
essential.

> Index split deadlock
> --------------------
>
>                 Key: DERBY-2991
>                 URL: https://issues.apache.org/jira/browse/DERBY-2991
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.3.1.4
>         Environment: Windows XP, Java 6
>            Reporter: Bogdan Calmac
>            Assignee: Knut Anders Hatlen
>         Attachments: d2991-preview-1a.diff, d2991-preview-1a.stat, derby.log, InsertSelectDeadlock.java, Repro2991.java, stacktraces_during_deadlock.txt
>
>
> After doing dome research on the mailing list, it appears that the index split deadlock is a known behaviour, so I will start by describing the theoretical problem first and then follow with the details of my test case.
> If you have concurrent select and insert transactions on the same table, the observed locking behaviour is as follows:
>  - the select transaction acquires an S lock on the root block of the index and then waits for an S lock on some uncommitted row of the insert transaction
>  - the insert transaction acquires X locks on the inserted records and if it needs to do an index split creates a sub-transaction that tries to acquire an X lock on the root block of the index
> In summary: INDEX LOCK followed by ROW LOCK + ROW LOCK followed by INDEX LOCK = deadlock
> In the case of my project this is an important issue (lack of concurrency after being forced to use table level locking) and I would like to contribute to the project and fix this issue (if possible). I was wondering if someone that knows the code can give me a few pointers on the implications of this issue:
>  - Is this a limitation of the top-down algorithm used?
>  - Would fixing it require to use a bottom up algorithm for better concurrency (which is certainly non trivial)?
>  - Trying to break the circular locking above, I would first question why does the select transaction need to acquire (and hold) a lock on the root block of the index. Would it be possible to ensure the consistency of the select without locking the index?
> -----
> The attached test (InsertSelectDeadlock.java) tries to simulate a typical data collection application, it consists of: 
>  - an insert thread that inserts records in batch 
>  - a select thread that 'processes' the records inserted by the other thread: 'select * from table where id > ?' 
> The derby log provides detail about the deadlock trace and stacktraces_during_deadlock.txt shows that the inser thread is doing an index split.
> The test was run on 10.2.2.0 and 10.3.1.4 with identical behaviour.
> Thanks,
> Bogdan Calmac.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.