You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by "Vinay (JIRA)" <ji...@apache.org> on 2013/06/26 19:48:20 UTC

[jira] [Commented] (BOOKKEEPER-636) Latest txn logs might be deleted in a race condition which is not recoverable if BK goes down before next txn log created.

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694114#comment-13694114 ] 

Vinay commented on BOOKKEEPER-636:
----------------------------------

Adding analysis to avoid very big description.


Analysis:
------------
1. EveryTime new journal txn log will be created with the nanoTime() as the id.
{code}          if (null == logFile) {
                    logId = MathUtils.now();
                    logFile = new JournalChannel(journalDirectory, logId);{code}

2. System.nanoTime() is the relative time with the system startup. It will start from 0 everytime system re-started. refer http://java.dzone.com/articles/how-arbitary-start


3. before reboot, last txn logId was very huge, but just after restart, latest txn id became *40974.txn*. 1c9d2f436.txn  was the last txn file.
 As part of first roll latest txn file was garbage collected.
{noformat}2013-06-25 14:38:14,730 - INFO  [SyncThread:Journal@341] - garbage collected journal 40974.txn{noformat}

4. After this one more restart gave the problem.


*Conclusion*:
We should not use {{System.nanoTime()}} for the storage and compare purposes.
                
> Latest txn logs might be deleted in a race condition which is not recoverable if BK goes down before next txn log created.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-636
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-636
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.1, 4.3.0
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Blocker
>
> With the following scenario latest transaction log can be deleted.
> 1. more than {{journalMaxBackups}} txn logs are there in journal dir.
> 2. BK machine was up for long time and the latest txn log id is some what huge number
> 3. Now reboot the machine.
> 4. after reboot BK restarted.
> 5. Now, Immediately after startup, One entry is added, due to which Synthread rolled the lastMark in ledger dirs before the lastLogId updated by Journal thread. (this lastMark was having the old logId which was before reboot). 
> 6. Now after roll, old journal txn logs were gc'ed. *Now latest created the txn log was deleted.*
> 7. After this Journal thread updated the lastLogMark, also some more rolls happened.
> 8. Now BK restarted again. But BK was not able to start because it was not able to find the latest txn log file in journal directory.
> {noformat}java.io.IOException: Recovery log 264564 is missing
>         at org.apache.bookkeeper.bookie.Journal.replay(Journal.java:424)
>         at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:547)
>         at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:603)
>         at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:111){noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira