You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@bookkeeper.apache.org by "hangc0276 (via GitHub)" <gi...@apache.org> on 2023/04/13 11:53:22 UTC

[GitHub] [bookkeeper] hangc0276 opened a new pull request, #3917: Fix ledger replicated failed blocks bookie decommission process

hangc0276 opened a new pull request, #3917:
URL: https://github.com/apache/bookkeeper/pull/3917

   ### Motivation
   When I decommission one bookie (bk3), one ledger replicate failed and blocked decommission process.
   
   This is the auto-recovery log:
   ```
   2023-03-29T06:29:22,642+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.client.LedgerHandle - ReadEntries exception on ledgerId:904368 firstEntry:14 lastEntry:14 lastAddConfirmed:13
   2023-03-29T06:29:22,642+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Received error: -1 while trying to read entry: 14 of ledger: 904368 in ReplicationWorker
   2023-03-29T06:29:22,642+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Failed to read faulty entries, so giving up replicating ledgerFragment Fragment(LedgerID: 904368, FirstEntryID: 0[0], LastKnownEntryID: 14[14], Host: [betausc1-bk-10.betausc1-bk-headless.o-vaxkx.svc.cluster.local:3181], Closed: true)
   2023-03-29T06:29:22,644+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker failed to replicate Ledger : 904368 for 6 number of times, so deferring the ledger lock release by 300000 msecs
   ```
   The ledger's metadata:
   ```
   ledgerID: 904368
   2023-03-29T06:47:56,511+0000 [main] INFO  org.apache.bookkeeper.tools.cli.commands.
   client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=3, 
   ackQuorumSize=2, state=OPEN, digestType=CRC32C, password=base64:, 
   ensembles={0=[bk1:3181, bk2:3181, bk3:3181], 15=[bk1:3181, bk2:3181, bk4:3181]},...}
   ```
   
   The ledger (904368) has two ensembles, `ensembles={0=[bk1:3181, bk2:3181, bk3:3181], 15=[bk1:3181, bk2:3181, bk4:3181]}`. However, the replication worker got the ledger's LAC is 13, but it got the replication fragment entry range is [0, 14]. When reading entry 14, it failed.
   
   In checking the ledger to find the missing replicas fragment, we use the next ensemble's `key - 1` as the current fragment's lastKnownEntryId, and it doesn't consider the ledger's lastAddConfirm.
   https://github.com/apache/bookkeeper/blob/912896deb2e748389e15e74c37539b2ff36302c7/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java#L384-L396
   
   **One question is why the ensembles created a new ensemble starting with entryId = 15, but the ledger's lastAddConfirm is 13.**
   
   This question is related to two parts, one is how the new ensemble was created and the other is how the lastAddConfirm was generated.
   
   #### How the new ensemble was created
   The ensemble change is controlled on the bookie client side. 
   
   When one entry is ready to send to the bookie server, the bookie client will check whether need to do the ensemble change.
   https://github.com/apache/bookkeeper/blob/912896deb2e748389e15e74c37539b2ff36302c7/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L254
   
   For the above case, when writing entry 15, one bookie is lost, it will trigger the ensemble change and generate the new ensemble: 15=[bk1:3181, bk2:3181, bk4:3181]. However, entry 15 write failed, such as timeout or bookie rejected the write.
   
   For now, entry 14 is written succeed.
   
   #### How the lastAddConfirm was generated
   Due to the ledger being in the `OPEN` state, the ledger handle will send a readLAC request to get the ledger's lastAddConfirm. 
   
   For the above case, if bk1 holds the max entry 14, bk2 holds the max entry 13 and bk3 is lost, the LedgerHandle get lastAddConfirm will be 13, not 14.
   
   ### Changes
   When generating missing replica fragments' entry range, take the ledger's lastAddConfirm into consideration.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] eolivelli commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "eolivelli (via GitHub)" <gi...@apache.org>.
eolivelli commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1165418967


##########
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieWriteLedgerTest.java:
##########
@@ -1454,6 +1467,67 @@ public void testLedgerCreateByteBufRefCnt() throws Exception {
         bkc.deleteLedger(lh.ledgerId);
     }
 
+    @Test
+    public void testReadLacNotSameWithMetadata() throws Exception {
+       lh = bkc.createLedger(3, 3, 2, digestType, ledgerPassword);
+        for (int i = 0; i < 10; ++i) {
+            ByteBuffer entry = ByteBuffer.allocate(4);
+            entry.putInt(rng.nextInt(maxInt));
+            entry.position(0);
+            lh.addEntry(entry.array());
+        }
+
+        List<BookieId> ensemble = lh.getLedgerMetadata().getAllEnsembles().entrySet().iterator().next().getValue();
+        assertEquals(1, lh.getLedgerMetadata().getAllEnsembles().size());
+        killBookie(ensemble.get(1));
+
+
+        try {
+            lh.ensembleChangeLoop(ensemble, Collections.singletonMap(1, ensemble.get(1)));
+        } catch (Exception e) {

Review Comment:
   Of we expect an exception we should add a fail() statement inside the try block.
   Maybe we could also perform some assertion over this Exception (optionally)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1165427023


##########
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieWriteLedgerTest.java:
##########
@@ -1454,6 +1467,67 @@ public void testLedgerCreateByteBufRefCnt() throws Exception {
         bkc.deleteLedger(lh.ledgerId);
     }
 
+    @Test
+    public void testReadLacNotSameWithMetadata() throws Exception {
+       lh = bkc.createLedger(3, 3, 2, digestType, ledgerPassword);
+        for (int i = 0; i < 10; ++i) {
+            ByteBuffer entry = ByteBuffer.allocate(4);
+            entry.putInt(rng.nextInt(maxInt));
+            entry.position(0);
+            lh.addEntry(entry.array());
+        }
+
+        List<BookieId> ensemble = lh.getLedgerMetadata().getAllEnsembles().entrySet().iterator().next().getValue();
+        assertEquals(1, lh.getLedgerMetadata().getAllEnsembles().size());
+        killBookie(ensemble.get(1));
+
+
+        try {
+            lh.ensembleChangeLoop(ensemble, Collections.singletonMap(1, ensemble.get(1)));
+        } catch (Exception e) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1168080849


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerFragment.java:
##########
@@ -53,8 +53,13 @@ public LedgerFragment(LedgerHandle lh,
         this.schedule = lh.getDistributionSchedule();
         SortedMap<Long, ? extends List<BookieId>> ensembles = lh
                 .getLedgerMetadata().getAllEnsembles();
+        // Check if the ledger fragment is closed has two conditions
+        // 1. The ledger is closed
+        // 2. This fragment is not the last fragment and this ledger's lastAddConfirm >= ensembles.lastKey() - 1.
+        //    This case happens when the ledger's last ensemble is empty
         this.isLedgerClosed = lh.getLedgerMetadata().isClosed()
-                || !ensemble.equals(ensembles.get(ensembles.lastKey()));
+                || (!ensemble.equals(ensembles.get(ensembles.lastKey()))
+            && lh.getLastAddConfirmed() >= ensembles.lastKey() - 1);

Review Comment:
   Maybe not. Take the following case for example:
   0: [bk1, bk2, bk3]
   10: [bk1, bk2, bk4]
   LAC = 100
   Ledger is OPEN
   
   For the first ensemble: isLedgerClosed = (false || (true && 100 >= 9)) => true
   For the last ensemble: isLedgerClosed = (false || (false && 100 >= 9)) => false
   The result is expected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1168134657


##########
bookkeeper-server/src/test/java/org/apache/bookkeeper/replication/TestReplicationWorker.java:
##########
@@ -1158,6 +1158,33 @@ public void testRWShutDownInTheCaseOfZKOperationFailures() throws Exception {
         }
     }
 
+    @Test
+    public void testReplicateEmptyOpenStateLedger() throws Exception {
+        LedgerHandle lh = bkc.createLedger(3, 3, 2, BookKeeper.DigestType.CRC32, TESTPASSWD);
+        assertFalse(lh.getLedgerMetadata().isClosed());
+
+        List<BookieId> firstEnsemble = lh.getLedgerMetadata().getAllEnsembles().firstEntry().getValue();

Review Comment:
   `firstEnsemble` is unused



##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java:
##########
@@ -561,6 +563,14 @@ private boolean rereplicate(long ledgerIdToReplicate) throws InterruptedExceptio
      *
      * <p>Missing bookies in closed ledgers are fine, as we know the last confirmed add, so
      * we can tell which entries are supposed to exist and rereplicate them if necessary.
+     *
+     * <p>Another corner case is that there are multiple ensembles in the ledger and the last
+     * segment/ensemble is open, but nothing has been written to some quorums in the ensemble.
+     * For the v2 protocol, this ledger's lastAddConfirm entry is the last segment/ensemble's `key - 2`,
+     * not `key - 2`, the explanation please refer to: https://github.com/apache/bookkeeper/pull/3917.
+     * If we treat the penultimate segment/ensemble as closed state, we will can't replicate
+     * the last entry in the segment. So in this case, we should also check if the penultimate
+     * segment/ensemble has missing bookies.
      */
     private boolean isLastSegmentOpenAndMissingBookies(LedgerHandle lh) throws BKException {
         LedgerMetadata md = admin.getLedgerMetadata(lh);

Review Comment:
   `md` is unused.



##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java:
##########
@@ -570,6 +580,10 @@ private boolean isLastSegmentOpenAndMissingBookies(LedgerHandle lh) throws BKExc
 
         SortedMap<Long, ? extends List<BookieId>> ensembles = admin.getLedgerMetadata(lh).getAllEnsembles();
         List<BookieId> finalEnsemble = ensembles.get(ensembles.lastKey());
+        if (ensembles.size() > 1 && lh.getLastAddConfirmed() < ensembles.lastKey() - 1) {
+            finalEnsemble = new ArrayList<>(finalEnsemble);

Review Comment:
   `finalEnsemble = new ArrayList<>(finalEnsemble);` may be meaningless. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1168136018


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java:
##########
@@ -570,6 +580,10 @@ private boolean isLastSegmentOpenAndMissingBookies(LedgerHandle lh) throws BKExc
 
         SortedMap<Long, ? extends List<BookieId>> ensembles = admin.getLedgerMetadata(lh).getAllEnsembles();
         List<BookieId> finalEnsemble = ensembles.get(ensembles.lastKey());
+        if (ensembles.size() > 1 && lh.getLastAddConfirmed() < ensembles.lastKey() - 1) {
+            finalEnsemble = new ArrayList<>(finalEnsemble);

Review Comment:
   The `finalEnsemble` is immutable in the ledger metadata.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1168137286


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java:
##########
@@ -570,6 +580,10 @@ private boolean isLastSegmentOpenAndMissingBookies(LedgerHandle lh) throws BKExc
 
         SortedMap<Long, ? extends List<BookieId>> ensembles = admin.getLedgerMetadata(lh).getAllEnsembles();
         List<BookieId> finalEnsemble = ensembles.get(ensembles.lastKey());
+        if (ensembles.size() > 1 && lh.getLastAddConfirmed() < ensembles.lastKey() - 1) {
+            finalEnsemble = new ArrayList<>(finalEnsemble);

Review Comment:
   Oh, get it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#issuecomment-1507285235

   > How the lastAddConfirm was generated
   > Due to the ledger being in the OPEN state, the ledger handle will send a readLAC request to get the ledger's lastAddConfirm.
   > For the above case, if bk1 holds the max entry 14, bk2 holds the max entry 13 and bk3 holds the max entry 14 but it is lost, the LedgerHandle get lastAddConfirm will be 13, not 14.
   
   I have a question about this. 
   
   ```
   public void checkLedger(final LedgerHandle lh,
                               final GenericCallback<Set<LedgerFragment>> cb,
                               long percentageOfLedgerFragmentToBeVerified)
   ```
   
   There are two places to invoke LedgerChecker#checkLedger.
   
   1. https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/AuditorCheckAllLedgersTask.java#L196-L212
   
   2.https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L449-L451
   
   Both places use the openLedgerNoRecovery to open the  LedgerHandle.
   
   It will use lh.asyncReadLastConfirmed() to get the LAC.
   https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java#L222-L245
   
   And we use the V2 protocol, it invokes `asyncReadPiggybackLastConfirmed`.
   https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1388-L1395
   
   In asyncReadPiggybackLastConfirmed, it invokes ReadLastConfirmedOp#initiate() to get the LAC. 
   
   https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1422-L1428
   
   
   In the ReadLastConfirmedOp, it will read all bookies in the currentEnsemble. The currentEnsemble is `15=[bk1:3181, bk2:3181, bk4:3181]`.  Not the {0=[bk1:3181, bk2:3181, bk3:3181]. 
   So it will send readLac RPC to bk1, bk2, and bk4. Then get response from bk1, bk2, and bk4.
   When ReadLastConfirmedOp#readEntryComplete, it will compare the response from different bookies, then pick the max lac response to override maxRecoveredData. Then invoke the callback using the maxRecoveredData.
   
   So if the bk1 lac is 14, the ledgerHandle lac should be 14, not 13.
   
   
   https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java#L197-L154
   
   line_l08-line_112, it will pick the max lac to override maxRecoveredData.
   
   line_137-line_148, use the maxRecoveredData to invoke callback.
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#issuecomment-1509508971

   > I wonder of we risk to lose one entry in case of AQ < WQ.
   > 
   > I have to think more
   
   @eolivelli Sorry, I misunderstood the lastAddConfirm part. It doesn't have the risk of losing one entry. I updated the description, please help take a look, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1168127349


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerFragment.java:
##########
@@ -53,8 +53,13 @@ public LedgerFragment(LedgerHandle lh,
         this.schedule = lh.getDistributionSchedule();
         SortedMap<Long, ? extends List<BookieId>> ensembles = lh
                 .getLedgerMetadata().getAllEnsembles();
+        // Check if the ledger fragment is closed has two conditions
+        // 1. The ledger is closed
+        // 2. This fragment is not the last fragment and this ledger's lastAddConfirm >= ensembles.lastKey() - 1.
+        //    This case happens when the ledger's last ensemble is empty
         this.isLedgerClosed = lh.getLedgerMetadata().isClosed()
-                || !ensemble.equals(ensembles.get(ensembles.lastKey()));
+                || (!ensemble.equals(ensembles.get(ensembles.lastKey()))
+            && lh.getLastAddConfirmed() >= ensembles.lastKey() - 1);

Review Comment:
   Yes, you are right.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1167971853


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerFragment.java:
##########
@@ -53,8 +53,13 @@ public LedgerFragment(LedgerHandle lh,
         this.schedule = lh.getDistributionSchedule();
         SortedMap<Long, ? extends List<BookieId>> ensembles = lh
                 .getLedgerMetadata().getAllEnsembles();
+        // Check if the ledger fragment is closed has two conditions
+        // 1. The ledger is closed
+        // 2. This fragment is not the last fragment and this ledger's lastAddConfirm >= ensembles.lastKey() - 1.
+        //    This case happens when the ledger's last ensemble is empty
         this.isLedgerClosed = lh.getLedgerMetadata().isClosed()
-                || !ensemble.equals(ensembles.get(ensembles.lastKey()));
+                || (!ensemble.equals(ensembles.get(ensembles.lastKey()))
+            && lh.getLastAddConfirmed() >= ensembles.lastKey() - 1);

Review Comment:
   Here should be `lh.getLastAddConfirmed() >= ensembles.lastKey() - 2`.
   The last entry in the bookie may be entry 9, the getLastAddConfirmed is 9-1 = 8. The last ensemble key maybe 10. 8 >= 10 - 2.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#issuecomment-1518918757

   @merlimat @eolivelli @dlg99 Please help take a look, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] zymap merged pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "zymap (via GitHub)" <gi...@apache.org>.
zymap merged PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] hangc0276 commented on pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "hangc0276 (via GitHub)" <gi...@apache.org>.
hangc0276 commented on PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#issuecomment-1509508411

   > > How the lastAddConfirm was generated
   > > Due to the ledger being in the OPEN state, the ledger handle will send a readLAC request to get the ledger's lastAddConfirm.
   > > For the above case, if bk1 holds the max entry 14, bk2 holds the max entry 13 and bk3 holds the max entry 14 but it is lost, the LedgerHandle get lastAddConfirm will be 13, not 14.
   > 
   > I have a question about this.
   > 
   > ```
   > public void checkLedger(final LedgerHandle lh,
   >                             final GenericCallback<Set<LedgerFragment>> cb,
   >                             long percentageOfLedgerFragmentToBeVerified)
   > ```
   > 
   > There are two places to invoke LedgerChecker#checkLedger.
   > 
   > 1. https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/AuditorCheckAllLedgersTask.java#L196-L212
   > 
   > 2.
   > 
   > https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L449-L451
   > 
   > Both places use the openLedgerNoRecovery to open the LedgerHandle.
   > 
   > It will use lh.asyncReadLastConfirmed() to get the LAC.
   > 
   > https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java#L222-L245
   > 
   > And we use the V2 protocol, it invokes `asyncReadPiggybackLastConfirmed`.
   > 
   > https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1388-L1395
   > 
   > In asyncReadPiggybackLastConfirmed, it invokes ReadLastConfirmedOp#initiate() to get the LAC.
   > 
   > https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1422-L1428
   > 
   > In the ReadLastConfirmedOp, it will read all bookies in the currentEnsemble. The currentEnsemble is `15=[bk1:3181, bk2:3181, bk4:3181]`. Not the {0=[bk1:3181, bk2:3181, bk3:3181]. So it will send readLac RPC to bk1, bk2, and bk4. Then get response from bk1, bk2, and bk4. When ReadLastConfirmedOp#readEntryComplete, it will compare the response from different bookies, then pick the max lac response to override maxRecoveredData. Then invoke the callback using the maxRecoveredData.
   > 
   > So if the bk1 lac is 14, the ledgerHandle lac should be 14, not 13.
   > 
   > https://github.com/apache/bookkeeper/blob/35e9da9b55b5d44459d3421e8704be47afc6f914/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java#L97-L154
   > 
   > line_l08-line_112, it will pick the max lac to override maxRecoveredData.
   > 
   > line_137-line_148, use the maxRecoveredData to invoke callback.
   
   @horizonzy Yes, you are right. I updated the description, please help take a look, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on a diff in pull request #3917: Fix ledger replicated failed blocks bookie decommission process

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on code in PR #3917:
URL: https://github.com/apache/bookkeeper/pull/3917#discussion_r1167971853


##########
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerFragment.java:
##########
@@ -53,8 +53,13 @@ public LedgerFragment(LedgerHandle lh,
         this.schedule = lh.getDistributionSchedule();
         SortedMap<Long, ? extends List<BookieId>> ensembles = lh
                 .getLedgerMetadata().getAllEnsembles();
+        // Check if the ledger fragment is closed has two conditions
+        // 1. The ledger is closed
+        // 2. This fragment is not the last fragment and this ledger's lastAddConfirm >= ensembles.lastKey() - 1.
+        //    This case happens when the ledger's last ensemble is empty
         this.isLedgerClosed = lh.getLedgerMetadata().isClosed()
-                || !ensemble.equals(ensembles.get(ensembles.lastKey()));
+                || (!ensemble.equals(ensembles.get(ensembles.lastKey()))
+            && lh.getLastAddConfirmed() >= ensembles.lastKey() - 1);

Review Comment:
   Here should be `lh.getLastAddConfirmed() == ensembles.lastKey() - 2. 
   The lac maybe 100. The last ensemble key maybe 10.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org