You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by Venkateswara Rao Jujjuri <ju...@gmail.com> on 2019/10/12 17:18:30 UTC

Journal Corruption

We have observed Journal Corruption while replaying the journal log.
I could be missing something but I don't see we have clear indication of
where
journal file ends.

In the Journal.java:scanJournal() we have multiple breaks but mostly
we assume either the journal file ends in the right place or the length is
zero.

scanJournal() {
...
while(true) {

// entry start offset
long offset = recLog.fc.position();
// start reading entry
lenBuff.clear();
fullRead(recLog, lenBuff);
if (lenBuff.remaining() != 0) {
    break;
}
lenBuff.flip();
int len = lenBuff.getInt();
if (len == 0) {
    break;
}
}
====

So there is an implicit assumption here that the journal file ends

or the length is zero.

But... JournalChannel.java:
void preAllocIfNeeded(long size) throws IOException {
    if (bc.position() + size > nextPrealloc) {
        nextPrealloc += preAllocSize;
        zeros.clear();
        fc.write(zeros, nextPrealloc - journalAlignSize);
    }
}

We extend the file withe prealloc size at the journalAlignSize.

Based on our tunables (in our case preAllocSize:4MB journalAlignSize:512)

We could extend the file and the fc.write and we are writing only the
last journalAlignSize of zeros. i.e 4MB-512 can be garbage in the file.
When the journal is replayed there is a potential for garbage and journal
corruption indication.

Being said that we haven't see this so far, so the bug may not be

this obvious.. anyway just trying to see what I am missing.

Here is the stack:

20191010181408.832000 [INFO ] [main] [JournalChannel.java:154] -
Opening journal /sfs/sfsdata/journal/current/16cdf99de69.txn
Exception in thread "main" java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:275)
    at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:827)
    at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:1047)
    at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:1009)
    at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1070)
    at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:141)
    at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:58)
    at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:78)
    at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$2(LifecycleComponentStack.java:113)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:408)
    at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:113)
    at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:80)
    at org.apache.bookkeeper.server.Main.doMain(Main.java:228)
    at org.apache.bookkeeper.server.Main.main(Main.java:202)
    at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:296)



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Journal Corruption

Posted by Venkateswara Rao Jujjuri <ju...@gmail.com>.
Correct. We can't just ignore it. We needs to be a proper EndOfEntryLog.
How did it work until now? I wrote a simple test to reproduce this. This is
super surprising to me, who we haven't hit it so far?

Filling the file with negative numbers reproduces this problem very easily
by running BookieJournalTest.java: testV5Journal()
Just add the following 3 lines and run testV5Journal()

diff --git
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/JournalChannel.java
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/JournalChannel.java
index ab1ac7d0b..6fc68a0b9 100644
---
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/JournalChannel.java
+++
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/JournalChannel.java
@@ -162,6 +162,9 @@ class JournalChannel implements Closeable {
                 throw new IOException("File " + fn
                         + " suddenly appeared, is another bookie process
running?");
             }
+            byte[] filldata = new byte[4 * 1024 * 1024];
+            Arrays.fill(filldata, (byte)-1);
+            Files.write(Paths.get(fn.getPath()), filldata);

             randomAccessFile = new RandomAccessFile(fn, "rw");
             fc = randomAccessFile.getChannel();

On Sat, Oct 12, 2019 at 12:21 PM Enrico Olivelli <eo...@gmail.com>
wrote:

> JV,
> As the machine/disk/process may crash/halt unexpectedly during writes we
> can't do many assumptions, the only way is to accept an entry in case of a
> valid read and consider that we reached the end of the journal in case of a
> partial read or garbage.
>
> In your case it is possible that the reader reached an unwritten part of
> the preallocated file (garbage) or it is possible that the file has been
> corrupted.
>
> We can't reliably detect which of the two cases happened, this is bad.
> Even turning off preallocation won't help.
>
> We can fix the issue just by considering that we reached the end of the
> journal but if the file was corrupted we may lose important bits, as
> 'fence' bits
>
> Enrico
>
> Il sab 12 ott 2019, 19:18 Venkateswara Rao Jujjuri <ju...@gmail.com> ha
> scritto:
>
> > We have observed Journal Corruption while replaying the journal log.
> > I could be missing something but I don't see we have clear indication of
> > where
> > journal file ends.
> >
> > In the Journal.java:scanJournal() we have multiple breaks but mostly
> > we assume either the journal file ends in the right place or the length
> is
> > zero.
> >
> > scanJournal() {
> > ...
> > while(true) {
> >
> > // entry start offset
> > long offset = recLog.fc.position();
> > // start reading entry
> > lenBuff.clear();
> > fullRead(recLog, lenBuff);
> > if (lenBuff.remaining() != 0) {
> >     break;
> > }
> > lenBuff.flip();
> > int len = lenBuff.getInt();
> > if (len == 0) {
> >     break;
> > }
> > }
> > ====
> >
> > So there is an implicit assumption here that the journal file ends
> >
> > or the length is zero.
> >
> > But... JournalChannel.java:
> > void preAllocIfNeeded(long size) throws IOException {
> >     if (bc.position() + size > nextPrealloc) {
> >         nextPrealloc += preAllocSize;
> >         zeros.clear();
> >         fc.write(zeros, nextPrealloc - journalAlignSize);
> >     }
> > }
> >
> > We extend the file withe prealloc size at the journalAlignSize.
> >
> > Based on our tunables (in our case preAllocSize:4MB journalAlignSize:512)
> >
> > We could extend the file and the fc.write and we are writing only the
> > last journalAlignSize of zeros. i.e 4MB-512 can be garbage in the file.
> > When the journal is replayed there is a potential for garbage and journal
> > corruption indication.
> >
> > Being said that we haven't see this so far, so the bug may not be
> >
> > this obvious.. anyway just trying to see what I am missing.
> >
> > Here is the stack:
> >
> > 20191010181408.832000 [INFO ] [main] [JournalChannel.java:154] -
> > Opening journal /sfs/sfsdata/journal/current/16cdf99de69.txn
> > Exception in thread "main" java.lang.IllegalArgumentException
> >     at java.nio.Buffer.limit(Buffer.java:275)
> >     at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:827)
> >     at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:1047)
> >     at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:1009)
> >     at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1070)
> >     at
> > org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:141)
> >     at
> >
> org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:58)
> >     at
> >
> org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:78)
> >     at
> >
> org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$2(LifecycleComponentStack.java:113)
> >     at
> > com.google.common.collect.ImmutableList.forEach(ImmutableList.java:408)
> >     at
> >
> org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:113)
> >     at
> >
> org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:80)
> >     at org.apache.bookkeeper.server.Main.doMain(Main.java:228)
> >     at org.apache.bookkeeper.server.Main.main(Main.java:202)
> >     at
> org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:296)
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi

Re: Journal Corruption

Posted by Enrico Olivelli <eo...@gmail.com>.
JV,
As the machine/disk/process may crash/halt unexpectedly during writes we
can't do many assumptions, the only way is to accept an entry in case of a
valid read and consider that we reached the end of the journal in case of a
partial read or garbage.

In your case it is possible that the reader reached an unwritten part of
the preallocated file (garbage) or it is possible that the file has been
corrupted.

We can't reliably detect which of the two cases happened, this is bad.
Even turning off preallocation won't help.

We can fix the issue just by considering that we reached the end of the
journal but if the file was corrupted we may lose important bits, as
'fence' bits

Enrico

Il sab 12 ott 2019, 19:18 Venkateswara Rao Jujjuri <ju...@gmail.com> ha
scritto:

> We have observed Journal Corruption while replaying the journal log.
> I could be missing something but I don't see we have clear indication of
> where
> journal file ends.
>
> In the Journal.java:scanJournal() we have multiple breaks but mostly
> we assume either the journal file ends in the right place or the length is
> zero.
>
> scanJournal() {
> ...
> while(true) {
>
> // entry start offset
> long offset = recLog.fc.position();
> // start reading entry
> lenBuff.clear();
> fullRead(recLog, lenBuff);
> if (lenBuff.remaining() != 0) {
>     break;
> }
> lenBuff.flip();
> int len = lenBuff.getInt();
> if (len == 0) {
>     break;
> }
> }
> ====
>
> So there is an implicit assumption here that the journal file ends
>
> or the length is zero.
>
> But... JournalChannel.java:
> void preAllocIfNeeded(long size) throws IOException {
>     if (bc.position() + size > nextPrealloc) {
>         nextPrealloc += preAllocSize;
>         zeros.clear();
>         fc.write(zeros, nextPrealloc - journalAlignSize);
>     }
> }
>
> We extend the file withe prealloc size at the journalAlignSize.
>
> Based on our tunables (in our case preAllocSize:4MB journalAlignSize:512)
>
> We could extend the file and the fc.write and we are writing only the
> last journalAlignSize of zeros. i.e 4MB-512 can be garbage in the file.
> When the journal is replayed there is a potential for garbage and journal
> corruption indication.
>
> Being said that we haven't see this so far, so the bug may not be
>
> this obvious.. anyway just trying to see what I am missing.
>
> Here is the stack:
>
> 20191010181408.832000 [INFO ] [main] [JournalChannel.java:154] -
> Opening journal /sfs/sfsdata/journal/current/16cdf99de69.txn
> Exception in thread "main" java.lang.IllegalArgumentException
>     at java.nio.Buffer.limit(Buffer.java:275)
>     at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:827)
>     at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:1047)
>     at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:1009)
>     at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1070)
>     at
> org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:141)
>     at
> org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:58)
>     at
> org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:78)
>     at
> org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$2(LifecycleComponentStack.java:113)
>     at
> com.google.common.collect.ImmutableList.forEach(ImmutableList.java:408)
>     at
> org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:113)
>     at
> org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:80)
>     at org.apache.bookkeeper.server.Main.doMain(Main.java:228)
>     at org.apache.bookkeeper.server.Main.main(Main.java:202)
>     at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:296)
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>