Posted to dev@bookkeeper.apache.org by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org> on 2012/03/08 15:45:58 UTC
[jira] [Updated] (BOOKKEEPER-182) Entry log file is overwritten when fail to read lastLogId.
[ https://issues.apache.org/jira/browse/BOOKKEEPER-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sijie Guo updated BOOKKEEPER-182:
---------------------------------
Attachment: BK-182.diff
Attached a patch that fixes this issue by scanning the ledger directory for the largest log id when reading lastLogId fails.
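
The scan described above could look roughly like the sketch below. This is only an illustration and not the contents of BK-182.diff: the class and method names are hypothetical, and it assumes entry log files are named with a hexadecimal id followed by ".log", as the directory listing in the issue (a.log, b.log, c.log) suggests.

```java
import java.io.File;

class LastLogIdScanner {
    /**
     * Returns the largest id found among names of the form "<hex>.log",
     * or -1 when no such name exists. (Hypothetical helper, not BookKeeper API.)
     */
    static long largestLogId(String[] fileNames) {
        long max = -1;
        if (fileNames == null) {
            return max; // File.list() returns null when the path is not a directory
        }
        for (String name : fileNames) {
            if (!name.endsWith(".log")) {
                continue;
            }
            String hex = name.substring(0, name.length() - ".log".length());
            try {
                max = Math.max(max, Long.parseLong(hex, 16));
            } catch (NumberFormatException e) {
                // Not an entry log name; ignore it.
            }
        }
        return max;
    }

    /** Convenience overload that lists a ledger directory on disk. */
    static long largestLogId(File ledgerDir) {
        return largestLogId(ledgerDir.list());
    }

    public static void main(String[] args) {
        // File names taken from the directory listing in the issue description.
        String[] names = {"0.log", "1.log", "2.log", "3.log", "4.log", "5.log",
                          "7.log", "9.log", "a.log", "b.log", "c.log"};
        System.out.println(Long.toHexString(largestLogId(names))); // prints c
    }
}
```

With the listing from the issue, the scan would resume after c.log instead of restarting at 0.log.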
> Entry log file is overwritten when fail to read lastLogId.
> ----------------------------------------------------------
>
> Key: BOOKKEEPER-182
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Attachments: BK-182.diff
>
>
> We found data corruption in the entry log files:
> 2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] - Error reading 229@114724
> java.io.IOException: problem found in 0@229 at position + 89030194 entry belongs to 6373236044838956613 not 114724
> at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
> at org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
> at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
> at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
> at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
> at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
> at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124)
> Then we investigated the failed ledger.
> First, we looked into the index file of ledger 114724:
> {code}
> entry 75 : (log:11, pos: 100526580)
> entry 76 : (log:11, pos: 101849530)
> entry 77 : (log:11, pos: 103176596)
> entry 78 : (log:11, pos: 104403977)
> entry 79 : (log:11, pos: 105756017)
> entry 80 : (log:11, pos: 106740803)
> entry 81 : (log:0, pos: 73365)
> entry 82 : (log:0, pos: 1366625)
> entry 83 : (log:0, pos: 2719276)
> entry 84 : (log:0, pos: 4065142)
> {code}
> After entry 80 (that is, from entry 81 onward), the data is written to entry log 0, whose id is smaller than 11. This means the data was written to an older entry log file.
> Then we listed the ledger directory:
> {code}
> 2147483550 Mar 5 11:30 /var/bookkeeper/ledger/0.log
> 94122988 Mar 5 11:33 /var/bookkeeper/ledger/1.log
> 1984247565 Mar 5 11:34 /var/bookkeeper/ledger/2.log
> 288376 Mar 5 11:34 /var/bookkeeper/ledger/3.log
> 747151813 Mar 6 03:17 /var/bookkeeper/ledger/4.log
> 410381287 Mar 6 07:43 /var/bookkeeper/ledger/5.log
> 2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
> 2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
> 1691783168 Mar 1 03:22 /var/bookkeeper/ledger/a.log
> 125556720 Mar 1 08:30 /var/bookkeeper/ledger/b.log
> 0 Mar 1 08:33 /var/bookkeeper/ledger/c.log
> {code}
> Entry log files 0 through 5 have been overwritten: their modification times (Mar 5 and Mar 6) are later than those of 7.log through c.log (Feb 27 to Mar 1).
> Looking into the code, we found that when the bookie server fails to read lastLogId, it sets lastLogId to -1 and then starts writing entry log files from 0. There is also no check for whether the entry log file already exists.
> It would be better to scan the ledger directories for the largest log id and continue from it, and to check whether the file already exists when creating a new entry log file.
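
The existence check suggested above can be sketched with `File.createNewFile()`, which atomically creates a file only if it does not already exist. The class and method names here are hypothetical and the hexadecimal "<id>.log" naming is an assumption based on the listing; this is not the actual EntryLogger code.

```java
import java.io.File;
import java.io.IOException;

class EntryLogAllocator {
    /**
     * Creates the next entry log file after lastLogId without ever reusing
     * an existing file. (Hypothetical helper, not BookKeeper API.)
     */
    static File createNextLog(File ledgerDir, long lastLogId) throws IOException {
        long id = lastLogId;
        while (true) {
            id++;
            File candidate = new File(ledgerDir, Long.toHexString(id) + ".log");
            // createNewFile() returns false when the file already exists,
            // so an existing entry log is skipped rather than overwritten.
            if (candidate.createNewFile()) {
                return candidate;
            }
        }
    }
}
```

Even if lastLogId is wrongly reset to -1, a call such as `createNextLog(dir, -1)` would skip every existing log and create the first unused id rather than truncating 0.log.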