You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2012/03/08 08:37:06 UTC

[jira] [Created] (BOOKKEEPER-182) Entry log file is overwritten when fail to read lastLogId.

Entry log file is overwritten when fail to read lastLogId.
----------------------------------------------------------

                 Key: BOOKKEEPER-182
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
             Project: Bookkeeper
          Issue Type: Bug
            Reporter: Sijie Guo
            Assignee: Sijie Guo


we found data corruption happened on entry log files.

2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] - Error reading 229@114724
java.io.IOException: problem found in 0@229 at position + 89030194 entry belongs to 6373236044838956613 not 114724
        at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
        at org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
        at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
        at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
        at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
        at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
        at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124

then we did some investigation on failed ledger:

first looked into ledger 114724's index file.

{code}
entry 75        :       (log:11, pos: 100526580)
entry 76        :       (log:11, pos: 101849530)
entry 77        :       (log:11, pos: 103176596)
entry 78        :       (log:11, pos: 104403977)
entry 79        :       (log:11, pos: 105756017)
entry 80        :       (log:11, pos: 106740803)
entry 81        :       (log:0, pos: 73365)
entry 82        :       (log:0, pos: 1366625)
entry 83        :       (log:0, pos: 2719276)
entry 84        :       (log:0, pos: 4065142)
{code}

from entry 80, the data is written in 0 entry log which is less than 11. (means data is written to an older entry log file)

then we looked into ledger directory as below

{code}
2147483550 Mar  5 11:30 /var/bookkeeper/ledger/0.log
  94122988 Mar  5 11:33 /var/bookkeeper/ledger/1.log
1984247565 Mar  5 11:34 /var/bookkeeper/ledger/2.log
    288376 Mar  5 11:34 /var/bookkeeper/ledger/3.log
 747151813 Mar  6 03:17 /var/bookkeeper/ledger/4.log
 410381287 Mar  6 07:43 /var/bookkeeper/ledger/5.log
2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
1691783168 Mar  1 03:22 /var/bookkeeper/ledger/a.log
 125556720 Mar  1 08:30 /var/bookkeeper/ledger/b.log
         0 Mar  1 08:33 /var/bookkeeper/ledger/c.log
{code}

the 0-5 entry log files are overwritten.

looked into the code, found that when bookie server failed to read lastLogId, it would set the lastLogId to -1. then start writing entry log files from 0. and also there is not checking about the existen of the entry log file.

it would better to scan the directories to found the biggest log id and start from it. and check whether the file exists or not when creating a new entry log file.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (BOOKKEEPER-182) Entry log file is overwritten when fail to read lastLogId.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-182:
---------------------------------

    Attachment: BK-182.diff

attach a patch to fix this issue by scanning ledger directory to get the biggest log id when read lastLogId failed.
                
> Entry log file is overwritten when fail to read lastLogId.
> ----------------------------------------------------------
>
>                 Key: BOOKKEEPER-182
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>         Attachments: BK-182.diff
>
>
> we found data corruption happened on entry log files.
> 2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] - Error reading 229@114724
> java.io.IOException: problem found in 0@229 at position + 89030194 entry belongs to 6373236044838956613 not 114724
>         at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
>         at org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
>         at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
>         at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
>         at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124
> then we did some investigation on failed ledger:
> first looked into ledger 114724's index file.
> {code}
> entry 75        :       (log:11, pos: 100526580)
> entry 76        :       (log:11, pos: 101849530)
> entry 77        :       (log:11, pos: 103176596)
> entry 78        :       (log:11, pos: 104403977)
> entry 79        :       (log:11, pos: 105756017)
> entry 80        :       (log:11, pos: 106740803)
> entry 81        :       (log:0, pos: 73365)
> entry 82        :       (log:0, pos: 1366625)
> entry 83        :       (log:0, pos: 2719276)
> entry 84        :       (log:0, pos: 4065142)
> {code}
> from entry 80, the data is written in 0 entry log which is less than 11. (means data is written to an older entry log file)
> then we looked into ledger directory as below
> {code}
> 2147483550 Mar  5 11:30 /var/bookkeeper/ledger/0.log
>   94122988 Mar  5 11:33 /var/bookkeeper/ledger/1.log
> 1984247565 Mar  5 11:34 /var/bookkeeper/ledger/2.log
>     288376 Mar  5 11:34 /var/bookkeeper/ledger/3.log
>  747151813 Mar  6 03:17 /var/bookkeeper/ledger/4.log
>  410381287 Mar  6 07:43 /var/bookkeeper/ledger/5.log
> 2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
> 2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
> 1691783168 Mar  1 03:22 /var/bookkeeper/ledger/a.log
>  125556720 Mar  1 08:30 /var/bookkeeper/ledger/b.log
>          0 Mar  1 08:33 /var/bookkeeper/ledger/c.log
> {code}
> the 0-5 entry log files are overwritten.
> looked into the code, found that when bookie server failed to read lastLogId, it would set the lastLogId to -1. then start writing entry log files from 0. and also there is not checking about the existen of the entry log file.
> it would better to scan the directories to found the biggest log id and start from it. and check whether the file exists or not when creating a new entry log file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-182) Entry log file is overwritten when fail to read lastLogId.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229168#comment-13229168 ] 

Sijie Guo commented on BOOKKEEPER-182:
--------------------------------------

yes. ur right, the directory is not right in the test. I will update the patch.

                
> Entry log file is overwritten when fail to read lastLogId.
> ----------------------------------------------------------
>
>                 Key: BOOKKEEPER-182
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-182.diff, BK-182.diff_v2
>
>
> we found data corruption happened on entry log files.
> 2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] - Error reading 229@114724
> java.io.IOException: problem found in 0@229 at position + 89030194 entry belongs to 6373236044838956613 not 114724
>         at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
>         at org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
>         at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
>         at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
>         at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124
> then we did some investigation on failed ledger:
> first looked into ledger 114724's index file.
> {code}
> entry 75        :       (log:11, pos: 100526580)
> entry 76        :       (log:11, pos: 101849530)
> entry 77        :       (log:11, pos: 103176596)
> entry 78        :       (log:11, pos: 104403977)
> entry 79        :       (log:11, pos: 105756017)
> entry 80        :       (log:11, pos: 106740803)
> entry 81        :       (log:0, pos: 73365)
> entry 82        :       (log:0, pos: 1366625)
> entry 83        :       (log:0, pos: 2719276)
> entry 84        :       (log:0, pos: 4065142)
> {code}
> from entry 80, the data is written in 0 entry log which is less than 11. (means data is written to an older entry log file)
> then we looked into ledger directory as below
> {code}
> 2147483550 Mar  5 11:30 /var/bookkeeper/ledger/0.log
>   94122988 Mar  5 11:33 /var/bookkeeper/ledger/1.log
> 1984247565 Mar  5 11:34 /var/bookkeeper/ledger/2.log
>     288376 Mar  5 11:34 /var/bookkeeper/ledger/3.log
>  747151813 Mar  6 03:17 /var/bookkeeper/ledger/4.log
>  410381287 Mar  6 07:43 /var/bookkeeper/ledger/5.log
> 2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
> 2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
> 1691783168 Mar  1 03:22 /var/bookkeeper/ledger/a.log
>  125556720 Mar  1 08:30 /var/bookkeeper/ledger/b.log
>          0 Mar  1 08:33 /var/bookkeeper/ledger/c.log
> {code}
> the 0-5 entry log files are overwritten.
> looked into the code, found that when bookie server failed to read lastLogId, it would set the lastLogId to -1. then start writing entry log files from 0. and also there is not checking about the existen of the entry log file.
> it would better to scan the directories to found the biggest log id and start from it. and check whether the file exists or not when creating a new entry log file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-182) Entry log file is overwritten when fail to read lastLogId.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231044#comment-13231044 ] 

Hudson commented on BOOKKEEPER-182:
-----------------------------------

Integrated in bookkeeper-trunk #412 (See [https://builds.apache.org/job/bookkeeper-trunk/412/])
    BOOKKEEPER-182: Entry log file is overwritten when fail to read lastLogId. (sijie via ivank) (Revision 1301395)

     Result = ABORTED
ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/EntryLogger.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/EntryLogTest.java

                
> Entry log file is overwritten when fail to read lastLogId.
> ----------------------------------------------------------
>
>                 Key: BOOKKEEPER-182
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.1.0
>
>         Attachments: BK-182.diff, BK-182.diff_v2
>
>
> we found data corruption happened on entry log files.
> 2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] - Error reading 229@114724
> java.io.IOException: problem found in 0@229 at position + 89030194 entry belongs to 6373236044838956613 not 114724
>         at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
>         at org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
>         at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
>         at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
>         at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124
> then we did some investigation on failed ledger:
> first looked into ledger 114724's index file.
> {code}
> entry 75        :       (log:11, pos: 100526580)
> entry 76        :       (log:11, pos: 101849530)
> entry 77        :       (log:11, pos: 103176596)
> entry 78        :       (log:11, pos: 104403977)
> entry 79        :       (log:11, pos: 105756017)
> entry 80        :       (log:11, pos: 106740803)
> entry 81        :       (log:0, pos: 73365)
> entry 82        :       (log:0, pos: 1366625)
> entry 83        :       (log:0, pos: 2719276)
> entry 84        :       (log:0, pos: 4065142)
> {code}
> from entry 80, the data is written in 0 entry log which is less than 11. (means data is written to an older entry log file)
> then we looked into ledger directory as below
> {code}
> 2147483550 Mar  5 11:30 /var/bookkeeper/ledger/0.log
>   94122988 Mar  5 11:33 /var/bookkeeper/ledger/1.log
> 1984247565 Mar  5 11:34 /var/bookkeeper/ledger/2.log
>     288376 Mar  5 11:34 /var/bookkeeper/ledger/3.log
>  747151813 Mar  6 03:17 /var/bookkeeper/ledger/4.log
>  410381287 Mar  6 07:43 /var/bookkeeper/ledger/5.log
> 2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
> 2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
> 1691783168 Mar  1 03:22 /var/bookkeeper/ledger/a.log
>  125556720 Mar  1 08:30 /var/bookkeeper/ledger/b.log
>          0 Mar  1 08:33 /var/bookkeeper/ledger/c.log
> {code}
> the 0-5 entry log files are overwritten.
> looked into the code, found that when bookie server failed to read lastLogId, it would set the lastLogId to -1. then start writing entry log files from 0. and also there is not checking about the existen of the entry log file.
> it would better to scan the directories to found the biggest log id and start from it. and check whether the file exists or not when creating a new entry log file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (BOOKKEEPER-182) Entry log file is overwritten when fail to read lastLogId.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-182:
---------------------------------

    Attachment: BK-182.diff_v2

new patch attached.
                
> Entry log file is overwritten when fail to read lastLogId.
> ----------------------------------------------------------
>
>                 Key: BOOKKEEPER-182
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-182
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-182.diff, BK-182.diff_v2
>
>
> we found data corruption happened on entry log files.
> 2012-03-06 07:26:14,947 - ERROR [NIOServerFactory-3181:BookieServer@413] - Error reading 229@114724
> java.io.IOException: problem found in 0@229 at position + 89030194 entry belongs to 6373236044838956613 not 114724
>         at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:347)
>         at org.apache.bookkeeper.bookie.LedgerDescriptor.readEntry(LedgerDescriptor.java:180)
>         at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1081)
>         at org.apache.bookkeeper.proto.BookieServer.processPacket(BookieServer.java:386)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.readRequest(NIOServerFactory.java:315)
>         at org.apache.bookkeeper.proto.NIOServerFactory$Cnxn.doIO(NIOServerFactory.java:213)
>         at org.apache.bookkeeper.proto.NIOServerFactory.run(NIOServerFactory.java:124
> then we did some investigation on failed ledger:
> first looked into ledger 114724's index file.
> {code}
> entry 75        :       (log:11, pos: 100526580)
> entry 76        :       (log:11, pos: 101849530)
> entry 77        :       (log:11, pos: 103176596)
> entry 78        :       (log:11, pos: 104403977)
> entry 79        :       (log:11, pos: 105756017)
> entry 80        :       (log:11, pos: 106740803)
> entry 81        :       (log:0, pos: 73365)
> entry 82        :       (log:0, pos: 1366625)
> entry 83        :       (log:0, pos: 2719276)
> entry 84        :       (log:0, pos: 4065142)
> {code}
> from entry 80, the data is written in 0 entry log which is less than 11. (means data is written to an older entry log file)
> then we looked into ledger directory as below
> {code}
> 2147483550 Mar  5 11:30 /var/bookkeeper/ledger/0.log
>   94122988 Mar  5 11:33 /var/bookkeeper/ledger/1.log
> 1984247565 Mar  5 11:34 /var/bookkeeper/ledger/2.log
>     288376 Mar  5 11:34 /var/bookkeeper/ledger/3.log
>  747151813 Mar  6 03:17 /var/bookkeeper/ledger/4.log
>  410381287 Mar  6 07:43 /var/bookkeeper/ledger/5.log
> 2147483363 Feb 27 19:59 /var/bookkeeper/ledger/7.log
> 2147483565 Feb 29 09:40 /var/bookkeeper/ledger/9.log
> 1691783168 Mar  1 03:22 /var/bookkeeper/ledger/a.log
>  125556720 Mar  1 08:30 /var/bookkeeper/ledger/b.log
>          0 Mar  1 08:33 /var/bookkeeper/ledger/c.log
> {code}
> the 0-5 entry log files are overwritten.
> looked into the code, found that when bookie server failed to read lastLogId, it would set the lastLogId to -1. then start writing entry log files from 0. and also there is not checking about the existen of the entry log file.
> it would better to scan the directories to found the biggest log id and start from it. and check whether the file exists or not when creating a new entry log file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira