Posted to dev@bookkeeper.apache.org by "Sijie Guo (JIRA)" <ji...@apache.org> on 2012/12/21 04:45:17 UTC

[jira] [Commented] (BOOKKEEPER-524) Bookie journal filesystem gets full after SyncThread is terminated with exception

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537656#comment-13537656 ] 

Sijie Guo commented on BOOKKEEPER-524:
--------------------------------------

Thanks Matteo. I think the NPE might be caused by a race condition between flushLedger and removeLedger.

During flushLedger, the cache first gets the list of first-entry IDs for the ledger, then flushes the ledger pages according to that list. If removeLedger runs between those two steps, it removes that ledger's pages from the mapping, which causes the NPE during the flush.
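
To make the interleaving concrete, here is a minimal sketch of the two-step flush with a null guard. It is not the actual LedgerCacheImpl code: the class PageCacheSketch, its map layout, and the method names snapshotFirstEntries and flushPages are all made up for illustration, and the real fix may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for a ledger page cache.
// This is NOT the real LedgerCacheImpl; names and layout are assumptions.
public class PageCacheSketch {
    // ledgerId -> (first entry id of a page -> page contents)
    private final Map<Long, Map<Long, byte[]>> pages = new ConcurrentHashMap<>();

    public void putPage(long ledgerId, long firstEntry, byte[] data) {
        pages.computeIfAbsent(ledgerId, id -> new ConcurrentHashMap<>())
             .put(firstEntry, data);
    }

    // May run concurrently with a flush and drops every page of the ledger.
    public void removeLedger(long ledgerId) {
        pages.remove(ledgerId);
    }

    // Step 1 of the flush: snapshot the first-entry ids of the ledger's pages.
    public List<Long> snapshotFirstEntries(long ledgerId) {
        Map<Long, byte[]> ledgerPages = pages.get(ledgerId);
        return ledgerPages == null
                ? new ArrayList<>()
                : new ArrayList<>(ledgerPages.keySet());
    }

    // Step 2 of the flush: look each page up again by its first-entry id.
    // The ledger may have been removed since step 1, so both lookups are
    // checked for null instead of being dereferenced blindly.
    public int flushPages(long ledgerId, List<Long> firstEntries) {
        int flushed = 0;
        for (long firstEntry : firstEntries) {
            Map<Long, byte[]> ledgerPages = pages.get(ledgerId);
            if (ledgerPages == null) {
                return flushed; // ledger removed mid-flush, nothing left to write
            }
            byte[] page = ledgerPages.get(firstEntry);
            if (page == null) {
                continue; // this page was removed mid-flush
            }
            flushed++; // a real implementation would write the page to disk here
        }
        return flushed;
    }
}
```

Calling snapshotFirstEntries, then removeLedger, then flushPages with the stale snapshot reproduces the ordering described above; without the null checks, the second step would dereference a missing mapping.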

I need to check the flush code to make sure no other NPE can happen there. Besides that, it would be better to catch Throwable in SyncThread: when SyncThread quits, the bookie should either turn read-only or shut down. Otherwise the exception is silenced until something bad happens (e.g. the journal disk fills up, in which case a bookie might take a long time to restart while it replays its journal).
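
A rough sketch of what catching Throwable in the sync loop could look like follows. This is not the actual Bookie.SyncThread code: SyncLoopSketch, the Flusher interface, and the onFatal callback are hypothetical names, and the callback only stands in for whatever "turn read-only or shut down" ends up meaning in the bookie.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a sync loop that never dies silently.
// This is NOT the real Bookie.SyncThread; all names here are assumptions.
public class SyncLoopSketch implements Runnable {
    // Stand-in for the ledger storage flush call.
    public interface Flusher {
        void flush() throws Exception;
    }

    private final Flusher flusher;
    private final Runnable onFatal; // e.g. switch to read-only or trigger shutdown
    private final long intervalMillis;
    private final AtomicBoolean running = new AtomicBoolean(true);

    public SyncLoopSketch(Flusher flusher, Runnable onFatal, long intervalMillis) {
        this.flusher = flusher;
        this.onFatal = onFatal;
        this.intervalMillis = intervalMillis;
    }

    public void stop() {
        running.set(false);
    }

    public boolean isRunning() {
        return running.get();
    }

    @Override
    public void run() {
        try {
            while (running.get()) {
                flusher.flush();
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException ie) {
            // Treated as a normal stop request; restore the interrupt flag.
            Thread.currentThread().interrupt();
        } catch (Throwable t) {
            // Any other failure, including an NPE, is reported and escalated
            // instead of ending the thread while the rest of the process runs on.
            System.err.println("Sync loop terminated with exception: " + t);
            onFatal.run();
        } finally {
            running.set(false);
        }
    }
}
```

The point of the callback is that the failure becomes visible right away, rather than showing up later as a full journal disk.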


                
> Bookie journal filesystem gets full after SyncThread is terminated with exception
> ---------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-524
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-524
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Matteo Merli
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: 0001-BOOKKEEPER-524-Bookie-journal-filesystem-gets-full-a.patch
>
>
> The SyncThread gets an NPE while the rest of the bookie is still running. This stops journal GC, and the filesystem fills up.
> Tue Dec 18 17:01:18 2012: Exception in thread "SyncThread" java.lang.NullPointerException
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.LedgerCacheImpl.getLedgerEntryPage(LedgerCacheImpl.java:153)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.LedgerCacheImpl.flushLedger(LedgerCacheImpl.java:421)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.LedgerCacheImpl.flushLedger(LedgerCacheImpl.java:363)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.flush(InterleavedLedgerStorage.java:148)
> Tue Dec 18 17:01:18 2012:       at org.apache.bookkeeper.bookie.Bookie$SyncThread.run(Bookie.java:221)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira