Posted to derby-dev@db.apache.org by "Mike Matrigali (JIRA)" <ji...@apache.org> on 2011/06/18 19:21:47 UTC

[jira] [Commented] (DERBY-5284) A derby crash at exactly right time during a btree split can cause a corrupt db which can not be booted.

    [ https://issues.apache.org/jira/browse/DERBY-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051562#comment-13051562 ] 

Mike Matrigali commented on DERBY-5284:
---------------------------------------

I found this problem by code inspection and have not reproduced it myself.  This problem is another
code path of the same problem fixed by DERBY-5258.  I believe that either DERBY-5258 or this
issue is causing the problems that have been reported as DERBY-5281 and DERBY-5248.  See
those two issues for a detailed description of the order of log records and the crash timing necessary to
reproduce this problem.

At a high level of abstraction, the current code does:

get latch
purge row
release latch
close table
commit

If another transaction gets the latch and inserts rows between the release latch and the commit, and the system crashes before
the commit, then this problem can happen.  The fix is to not release the latch, and let commit release it.
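
The interleaving above can be illustrated with a small single-threaded model.  This is only a
sketch and is not Derby code: the Page class, its single-owner latch, the fixed row capacity and
the method undoSucceedsAfterCrash are all hypothetical names invented for illustration.  It
assumes that undo of a purge must put the purged row back on the same page, and it models the
crash simply as "the purger never commits".

```java
import java.util.ArrayList;
import java.util.List;

public class PurgeLatchSketch {

    /** Hypothetical toy page: fixed row capacity, guarded by a single-owner latch. */
    static final class Page {
        final int capacity;
        final List<String> rows = new ArrayList<>();
        String latchOwner; // null when the page is not latched

        Page(int capacity) {
            this.capacity = capacity;
        }

        /** Non-blocking latch request; fails if any transaction holds the latch. */
        boolean tryLatch(String txn) {
            if (latchOwner != null) {
                return false;
            }
            latchOwner = txn;
            return true;
        }

        void unlatch(String txn) {
            if (txn.equals(latchOwner)) {
                latchOwner = null;
            }
        }

        /** Inserts a row only if the page still has free space. */
        boolean insert(String row) {
            if (rows.size() >= capacity) {
                return false;
            }
            rows.add(row);
            return true;
        }
    }

    /**
     * Plays out: purge, (optional) early latch release, a competing insert,
     * then a crash before the purger commits. Returns whether undo of the
     * purge still finds room to restore the purged row.
     */
    static boolean undoSucceedsAfterCrash(boolean holdLatchUntilCommit) {
        Page page = new Page(2);
        page.insert("a");
        page.insert("b"); // page is now full

        // Purging transaction: get latch, purge row.
        page.tryLatch("purger");
        page.rows.remove("b");
        if (!holdLatchUntilCommit) {
            page.unlatch("purger"); // current behaviour: release before commit
        }

        // Another transaction tries to reuse the freed space and commits.
        if (page.tryLatch("inserter")) {
            page.insert("c");
            page.unlatch("inserter");
        }

        // Crash before the purger commits: recovery must undo the purge,
        // which in this model means re-inserting the purged row.
        return page.insert("b");
    }

    public static void main(String[] args) {
        System.out.println("latch released before commit: undo succeeds = "
                + undoSucceedsAfterCrash(false));
        System.out.println("latch held until commit:      undo succeeds = "
                + undoSucceedsAfterCrash(true));
    }
}
```

In this model, releasing the latch early lets the inserter take the freed space, so the undo has
nowhere to put the purged row.  Holding the latch until commit keeps the inserter off the page.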

> A derby crash at exactly right time during a btree split can cause a corrupt db which can not be booted.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5284
>                 URL: https://issues.apache.org/jira/browse/DERBY-5284
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.3.0, 10.6.1.0, 10.7.1.1, 10.8.1.2
>            Reporter: Mike Matrigali
>            Assignee: Mike Matrigali
>
> A Derby crash at exactly the wrong time during a btree split can cause a corrupt db which cannot be booted.
> A problem in the split code and exactly wrong timing of a crash can leave the database in a state
> where undo of purge operations corrupts index pages during redo and can cause recovery boot
> to never succeed, and thus the database never to be booted.  At a high level, what happens is that
> a purge happens on a page, and before it commits another transaction uses the space of the
> purge to do an insert and then commits; then the system crashes before the purging transaction
> gets a chance to commit.  During undo the purge expects there to be space to undo the purge,
> but there is not, and it corrupts the page in various ways depending on the size and placement
> of the inserts.  The error that is actually returned to the user varies from sane to insane, as the problem
> is actually noticed after the corruption occurs rather than during the undo.
>  
