You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/03/13 22:38:33 UTC

[GitHub] sijie opened a new issue #1255: Bookie should not advance the journal marker before creating the index file

sijie opened a new issue #1255: Bookie should not advance the journal marker before creating the index file
URL: https://github.com/apache/bookkeeper/issues/1255
 
 
   **Bug**
   
   *Problem*
   
   Currently Bookie journal 'new ledger' entry if a ledger doesn't exist at ledger storage. This 'new ledger' entry is journaled before adding the entry to ledger storage. so this would cause a problem on checkpointing.
   
   - journal 'new ledger' entry
   
   ```
       /**
        * Retrieve the ledger descriptor for the ledger which entry should be added to.
        * The LedgerDescriptor returned from this method should be eventually freed with
        * #putHandle().
        *
        * @throws BookieException if masterKey does not match the master key of the ledger
        */
       private LedgerDescriptor getLedgerForEntry(ByteBuf entry, final byte[] masterKey)
               throws IOException, BookieException {
           final long ledgerId = entry.getLong(entry.readerIndex());
   
           LedgerDescriptor l = handles.getHandle(ledgerId, masterKey);
           if (masterKeyCache.get(ledgerId) == null) {
               // Force the load into masterKey cache
               byte[] oldValue = masterKeyCache.putIfAbsent(ledgerId, masterKey);
               if (oldValue == null) {
                   // new handle, we should add the key to journal ensure we can rebuild
                   ByteBuffer bb = ByteBuffer.allocate(8 + 8 + 4 + masterKey.length);
                   bb.putLong(ledgerId);
                   bb.putLong(METAENTRY_ID_LEDGER_KEY);
                   bb.putInt(masterKey.length);
                   bb.put(masterKey);
                   bb.flip();
   
                   getJournal(ledgerId).logAddEntry(bb, false /* ackBeforeSync */, new NopWriteCallback(), null);
               }
           }
   
           return l;
       }
   ```
   
   - add entry to ledger storage
   ```
       /**
        * Add an entry to a ledger as specified by handle.
        */
       private void addEntryInternal(LedgerDescriptor handle, ByteBuf entry,
                                     boolean ackBeforeSync, WriteCallback cb, Object ctx)
               throws IOException, BookieException {
           long ledgerId = handle.getLedgerId();
           long entryId = handle.addEntry(entry);
   
           writeBytes.add(entry.readableBytes());
   
           if (LOG.isTraceEnabled()) {
               LOG.trace("Adding {}@{}", entryId, ledgerId);
           }
           getJournal(ledgerId).logAddEntry(entry, ackBeforeSync, cb, ctx);
       }
   
   ```
   
   The problematic sequence can be described as below:
   
   - thread t1 is adding the first entry of ledger L1
   - thread t2 is adding entries of ledger L2
   - t1 is adding a journal entry of 'new ledger L1'
   - t2 is adding entries after t1 adds the journal entry and before t1 adding the entry to ledger storage
   - after t2 added entries, checkpoint happens in the ledger storage. it would roll the journal mark, which will claim the 'new ledger L1' entry as persistent.
   - if the bookie restarts, it would fail with no such ledger exception.
   
   The problem can be produced using a unit test: https://github.com/sijie/bookkeeper/commit/5053a717cd578aeb88236d373553d7494501b801
   
   *Solution*
   
   The fix is simple - just make sure the 'new ledger' journal entry is added after an entry is added to ledger storage. so it make sure when checkpoint happen it will flush and create the ledger before moving the journal mark.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services