Posted to commits@bookkeeper.apache.org by yo...@apache.org on 2021/10/25 01:41:30 UTC
[bookkeeper] branch master updated: Add error handling to readLedgerMetadata in over-replicated ledger GC (#2844)
This is an automated email from the ASF dual-hosted git repository.
yong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git
The following commit(s) were added to refs/heads/master by this push:
new bd5c50b Add error handling to readLedgerMetadata in over-replicated ledger GC (#2844)
bd5c50b is described below
commit bd5c50bf331c28e6a9db2b8d2b186b86342dbd6b
Author: shustsud <51...@users.noreply.github.com>
AuthorDate: Mon Oct 25 10:40:14 2021 +0900
Add error handling to readLedgerMetadata in over-replicated ledger GC (#2844)
### Motivation
For each ledger whose metadata is not in ZK, the following stack trace is logged:
```
15:30:17.925 [GarbageCollectorThread-11-1] ERROR o.a.b.b.ScanAndCompareGarbageCollector - Exception when iterating through the ledgers to check for over-replication
java.util.concurrent.ExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.removeOverReplicatedledgers(ScanAndCompareGarbageCollector.java:199)
at org.apache.bookkeeper.bookie.ScanAndCompareGarbageCollector.gc(ScanAndCompareGarbageCollector.java:120)
at org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:372)
at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:323)
at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:301)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists
at org.apache.bookkeeper.meta.AbstractZkLedgerManager$3.processResult(AbstractZkLedgerManager.java:397)
at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:575)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
```
This output is noisy, inflates the log files, and can eventually cause an OOM during log rotation.
So we should suppress the stack trace.
(This problem is due to [#2813](https://github.com/apache/bookkeeper/pull/2813).)
### Changes
Add error handling to readLedgerMetadata in over-replicated ledger GC in order to suppress the stacktrace.
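
As a rough illustration of the pattern (not the actual BookKeeper code), the sketch below wraps a failing `CompletableFuture.get()` in its own `try`/`catch` so that a missing ledger is skipped quietly instead of reaching an outer handler that logs the full stack trace. The class `PreCheckSketch`, the helper `readMetadata`, and the rule that even ledger IDs are "missing" are all hypothetical stand-ins for `ledgerManager.readLedgerMetadata(ledgerId)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

public class PreCheckSketch {
    // Hypothetical stand-in for ledgerManager.readLedgerMetadata(ledgerId):
    // even IDs simulate ledgers whose metadata is missing.
    static CompletableFuture<String> readMetadata(long ledgerId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (ledgerId % 2 == 0) {
            future.completeExceptionally(new IllegalStateException("No such ledger exists"));
        } else {
            future.complete("metadata-" + ledgerId);
        }
        return future;
    }

    // Returns how many ledgers passed the pre-check.
    static int countCandidates(List<Long> ledgerIds) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(ledgerIds.size());
        int candidates = 0;
        for (long ledgerId : ledgerIds) {
            try {
                // get() throws ExecutionException when the read failed.
                readMetadata(ledgerId).get();
            } catch (Throwable t) {
                // Skip this ledger without logging a stack trace,
                // but still release the latch so the scan can finish.
                latch.countDown();
                continue;
            }
            candidates++;
            latch.countDown();
        }
        latch.await();
        return candidates;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countCandidates(Arrays.asList(1L, 2L, 3L, 4L)));
    }
}
```

Catching `Throwable` mirrors the diff below. A narrower catch (for example, only `ExecutionException`) would also be possible, at the cost of letting other failures reach the outer handler.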
---
.../bookkeeper/bookie/ScanAndCompareGarbageCollector.java | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ScanAndCompareGarbageCollector.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ScanAndCompareGarbageCollector.java
index 4c778a1..faac259 100644
--- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ScanAndCompareGarbageCollector.java
+++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ScanAndCompareGarbageCollector.java
@@ -234,9 +234,19 @@ public class ScanAndCompareGarbageCollector implements GarbageCollector {
// check ledger ensembles before creating lock nodes.
// this is to reduce the number of lock node creations and deletions in ZK.
// the ensemble check is done again after the lock node is created.
- // also, check if the ledger is being replicated already by the replication worker
Versioned<LedgerMetadata> preCheckMetadata = ledgerManager.readLedgerMetadata(ledgerId).get();
- if (!isNotBookieIncludedInLedgerEnsembles(preCheckMetadata) || lum.isLedgerBeingReplicated(ledgerId)) {
+ if (!isNotBookieIncludedInLedgerEnsembles(preCheckMetadata)) {
+ latch.countDown();
+ continue;
+ }
+ } catch (Throwable t) {
+ latch.countDown();
+ continue;
+ }
+
+ try {
+ // check if the ledger is being replicated already by the replication worker
+ if (lum.isLedgerBeingReplicated(ledgerId)) {
latch.countDown();
continue;
}