You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2020/12/03 07:37:07 UTC

[GitHub] [bookkeeper] michaeljmarshall opened a new issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

michaeljmarshall opened a new issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506


   **BUG REPORT**
   
   ***Describe the bug***
   
   After a bookie gracefully shuts down and leaves the cluster, autorecovery processes try (very frequently) to connect to the shutdown bookie leading to a ton of unnecessary calls, unnecessary logging, and increased CPU utilization.
   
   ***To Reproduce***
   
   I am running bookkeeper as part of a pulsar cluster, so my process to stand up the bookkeeper nodes, zookeeper nodes, and autorecovery nodes is likely a little different. If necessary, I can try to reproduce this problem in a solo bookkeeper cluster. Please let me know what is needed.
   
   Steps to reproduce the behavior:
   1. Create pulsar cluster using the pulsar operator. (Each component runs in a container from docker image `apachepulsar/pulsar:2.6.2`, which ships with bookkeeper jars from version 4.10.0. The bookies are started with `bin/apply-config-from-env.py conf/bookkeeper.conf && bin/apply-config-from-env.py conf/pulsar_env.sh && exec bin/pulsar bookie` and the autorecovery containers are started with `bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/bookkeeper autorecovery`.)
   2. Start producing data to the bookies to ensure that there is something to be underreplicated.
   3. Gracefully shutdown one of the bookie nodes. (I validated that the bookie is no longer listed under `/ledgers/available` in zookeeper.)
   4. Watch the logs in the autorecovery containers, and see that the process first logs that the bookie is removed, then attempts to connect to it many times.
   5. Restart the autorecovery process, and it will continue to attempt to connect to the stopped bookie. (This implies its not just in some cache.)
   
   ***Expected behavior***
   
   I would expect that the autorecovery process would not try to connect to a bookie that it _knows_ has left the cluster.
   
   My theory is that the autorecovery process is trying to connect to nodes based on the ensemble or perhaps the write quorum or the ack quorum. I read through the autorecovery code a bit last night, but I haven't been able to find the exact place where the code is holding onto a reference to the stopped bookie.
   
   ***Logs***
   
   Here are logs from the bookie that is stopped. I grabbed the logs at the point when the bookie receives the SIGTERM. This shows that the bookie is stopping correctly, as far as I can tell.
   
   ```
   07:04:11.393 [component-shutdown-thread] INFO  org.apache.bookkeeper.common.component.ComponentStarter - Closing component bookie-server in shutdown hook.
   07:04:11.395 [component-shutdown-thread] INFO  org.apache.bookkeeper.proto.BookieServer - Shutting down BookieServer
   07:04:11.396 [component-shutdown-thread] INFO  org.apache.bookkeeper.proto.BookieNettyServer - Shutting down BookieNettyServer
   07:04:11.404 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x09d7bf13, L:/10.100.157.164:3181 ! R:/10.100.186.63:60358]
   07:04:11.407 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x132e871c, L:/10.100.157.164:3181 ! R:/10.100.186.63:60368]
   07:04:11.408 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x70d55455, L:/10.100.157.164:3181 ! R:/10.100.157.90:39336]
   07:04:11.409 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xea12a92f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52362]
   07:04:11.409 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3538803d, L:/10.100.157.164:3181 ! R:/10.100.186.63:60330]
   07:04:11.409 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x7af4a041, L:/10.100.157.164:3181 ! R:/10.100.186.63:60318]
   07:04:11.410 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x801ff1b1, L:/10.100.157.164:3181 ! R:/10.100.182.35:52400]
   07:04:11.409 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x09983fec, L:/10.100.157.164:3181 ! R:/10.100.182.35:52368]
   07:04:11.410 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1fe9039f, L:/10.100.157.164:3181 ! R:/10.100.186.63:60354]
   07:04:11.410 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x93f5ffda, L:/10.100.157.164:3181 ! R:/10.100.157.90:39296]
   07:04:11.410 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x7f6aff63, L:/10.100.157.164:3181 ! R:/10.100.157.90:39326]
   07:04:11.411 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x593fc65f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52422]
   07:04:11.411 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xd62ac189, L:/10.100.157.164:3181 ! R:/10.100.157.90:39286]
   07:04:11.411 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbd771f1f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52346]
   07:04:11.412 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb0a41ff1, L:/10.100.157.164:3181 ! R:/10.100.182.35:52388]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x93f6dea1, L:/10.100.157.164:3181 ! R:/10.100.182.35:52394]
   07:04:11.412 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x6ab2ea71, L:/10.100.157.164:3181 ! R:/10.100.157.90:39330]
   07:04:11.412 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xcf8a7a0c, L:/10.100.157.164:3181 ! R:/10.100.157.90:39304]
   07:04:11.412 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3eec8af8, L:/10.100.157.164:3181 ! R:/10.100.186.63:60316]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb7e7607f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52402]
   07:04:11.412 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x0e99368e, L:/10.100.157.164:3181 ! R:/10.100.157.90:39316]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x6c4260d6, L:/10.100.157.164:3181 ! R:/10.100.157.90:39310]
   07:04:11.412 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3290c225, L:/10.100.157.164:3181 ! R:/10.100.182.35:52382]
   07:04:11.412 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x0fd9fe17, L:/10.100.157.164:3181 ! R:/10.100.182.35:52354]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x29b9a9ad, L:/10.100.157.164:3181 ! R:/10.100.186.63:60338]
   07:04:11.412 [bookie-io-1-4] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xc82f01d0, L:/10.100.157.164:3181 ! R:/10.100.157.90:39328]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1033f5d3, L:/10.100.157.164:3181 ! R:/10.100.186.63:60356]
   07:04:11.412 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x4be78a41, L:/10.100.157.164:3181 ! R:/10.100.157.90:39334]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1a648110, L:/10.100.157.164:3181 ! R:/10.100.186.63:60364]
   07:04:11.412 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x55a74676, L:/10.100.157.164:3181 ! R:/10.100.186.63:60336]
   07:04:11.412 [bookie-io-1-3] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x6775d4e4, L:/10.100.157.164:3181 ! R:/10.100.182.35:52348]
   07:04:11.413 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xa4b14ea8, L:/10.100.157.164:3181 ! R:/10.100.157.90:39322]
   07:04:11.413 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1f49ed3e, L:/10.100.157.164:3181 ! R:/10.100.157.90:39288]
   07:04:11.413 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3d10cd4c, L:/10.100.157.164:3181 ! R:/10.100.157.90:39332]
   07:04:11.414 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x8cf54abe, L:/10.100.157.164:3181 ! R:/10.100.186.63:60308]
   07:04:11.414 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x52ef4540, L:/10.100.157.164:3181 ! R:/10.100.182.35:52356]
   07:04:11.415 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xf3cb9a8a, L:/10.100.157.164:3181 ! R:/10.100.186.63:60362]
   07:04:11.415 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbb142ac2, L:/10.100.157.164:3181 ! R:/10.100.186.63:60360]
   07:04:11.415 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbf544206, L:/10.100.157.164:3181 ! R:/10.100.157.90:39324]
   07:04:11.415 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbe0d8aa9, L:/10.100.157.164:3181 ! R:/10.100.157.90:39338]
   07:04:11.415 [bookie-io-1-2] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xf03a7c72, L:/10.100.157.164:3181 ! R:/10.100.182.35:52366]
   07:04:11.415 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb8e915aa, L:/10.100.157.164:3181 ! R:/10.100.186.63:60310]
   07:04:11.416 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbeb75e87, L:/10.100.157.164:3181 ! R:/10.100.186.63:60352]
   07:04:11.416 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb4eb45b4, L:/10.100.157.164:3181 ! R:/10.100.182.35:52390]
   07:04:11.416 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x7e4b9636, L:/10.100.157.164:3181 ! R:/10.100.186.63:60328]
   07:04:11.416 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xaebfcb28, L:/10.100.157.164:3181 ! R:/10.100.182.35:52428]
   07:04:11.416 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x180315cd, L:/10.100.157.164:3181 ! R:/10.100.182.35:52386]
   07:04:11.416 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x83ca43a0, L:/10.100.157.164:3181 ! R:/10.100.157.90:39298]
   07:04:11.424 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.Bookie - Shutting down Bookie-3181 with exitCode 0
   07:04:11.425 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.Bookie - Turning bookie to read only during shut down
   07:04:11.425 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.SyncThread - Shutting down SyncThread
   07:04:12.711 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log file : createNewLog = false, reachEntryLogLimit = true
   07:04:12.711 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerBase - Flushing entry logger 102 back to filesystem, pending for syncing entry loggers : [BufferedChannel{logId=102, logFile=data/bookkeeper/ledgers/current/66.log, ledgerIdAssigned=-1}].
   07:04:12.712 [pool-5-thread-1] INFO  org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file data/bookkeeper/ledgers/current/68.log for logId 104.
   07:04:14.042 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 102 to disk.
   07:04:14.053 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.Journal - garbage collected journal 176275d94bd.txn
   07:04:14.053 [SyncThread-7-1] INFO  org.apache.bookkeeper.bookie.SyncThread - Flush ledger storage at checkpoint CheckpointList{checkpoints=[LogMark: logFileId - 1606978213059 , logFileOffset - 318435328]}.
   07:04:14.054 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.Journal - Shutting down Journal
   07:04:14.054 [ForceWriteThread] INFO  org.apache.bookkeeper.bookie.Journal - ForceWrite thread interrupted
   07:04:14.054 [BookieJournal-3181] INFO  org.apache.bookkeeper.bookie.Journal - Journal exits when shutting down
   07:04:14.054 [BookieJournal-3181] INFO  org.apache.bookkeeper.bookie.Journal - Journal exited loop!
   07:04:14.056 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.Journal - Finished Shutting down Journal thread
   07:04:14.056 [Bookie-3181] INFO  org.apache.bookkeeper.bookie.Bookie - Journal thread(s) quit.
   07:04:14.063 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Shutting down GarbageCollectorThread
   07:04:14.063 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.EntryLogger - Stopping EntryLogger
   07:04:14.064 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.EntryLoggerAllocator - Stopped entry logger preallocator.
   07:04:14.244 [component-shutdown-thread] INFO  org.apache.bookkeeper.bookie.LedgerDirsMonitor - Shutting down LedgerDirsMonitor
   07:04:14.350 [component-shutdown-thread] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x3009ce550af0009 closed
   07:04:14.350 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x3009ce550af0009
   07:04:14.360 [component-shutdown-thread] INFO  org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@62e8f862{HTTP/1.1, (http/1.1)}{0.0.0.0:8000}
   07:04:14.363 [component-shutdown-thread] INFO  org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@4943defe{/,null,STOPPED}
   07:04:14.365 [component-shutdown-thread] INFO  org.apache.bookkeeper.common.component.ComponentStarter - Closed component bookie-server in shutdown hook successfully. Exiting.
   ```
   
   Next are the logs from the autorecovery process. I start these a bit before the bookie left the cluster. Note that the autorecory process clearly logs that bookie `ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181` is a failed bookie. Then it proceeds to attemp to open connections to that bookie. Based on the frequency of the logging (many times a second), it appears to me that the autorecovery process is trying to connect to the exited bookie many times. Anecdotally, I can see a pretty big spike in the CPU for the process.
   
   ```
   07:03:43.115 [AuditorBookie-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Starting auditBookies
   07:03:43.116 [AuditorElector-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - checkAllLedgers scheduling info.  checkAllLedgersLastExecutedCTime: 1606978101474 durationSinceLastExecutionInSecs: 921 initialDelay: 603879 interval: 604800
   07:03:43.117 [AuditorElector-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Periodic placementPolicy check disabled
   07:03:43.117 [AuditorElector-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Periodic replicas check disabled
   07:03:43.892 [AuditorBookie-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Completed auditBookies
   07:04:14.250 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:04:14.254 [AuditorBookie-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Delaying bookie audit by 120 secs for [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181]
   07:06:14.255 [AuditorBookie-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Starting auditBookies
   07:06:14.952 [AuditorBookie-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Following are the failed bookies: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181], and searching its ledgers for re-replication
   07:06:14.955 [AuditorBookie-10.100.155.22:3181] INFO  org.apache.bookkeeper.replication.Auditor - Following ledgers: [0, 1024, 1, 1025, 2, 1026, 1027, 4, 1029, 6, 7, 1031, 1032, 1033, 10, 1034, 12, 16, 1040, 17, 1041, 18, 19, 21, 1045, 22, 23, 1047, 1048, 25, 1049, 1052, 30, 1054, 1055, 32, 1056, 33, 1058, 35, 1059, 36, 1060, 37, 1061, 38, 39, 40, 1064, 42, 1066, 43, 1067, 44, 45, 1069, 1070, 1071, 48, 1072, 51, 1075, 52, 1076, 53, 54, 1078, 55, 1080, 57, 1081, 59, 60, 61, 1085, 1086, 63, 64, 1088, 65, 1089, 66, 1091, 1092, 69, 1094, 71, 1095, 72, 1096, 1097, 74, 1098, 75, 1099, 1100, 1101, 1102, 79, 1103, 80, 81, 1105, 82, 83, 1107, 1108, 1111, 88, 89, 1113, 90, 92, 1117, 94, 95, 1119, 96, 1120, 98, 99, 100, 101, 1125, 102, 1126, 103, 1127, 105, 106, 1130, 1131, 108, 1134, 111, 112, 1136, 1138, 1139, 116, 1141, 119, 1143, 1144, 121, 1145, 123, 1147, 125, 1149, 126, 127, 1152, 129, 131, 1155, 132, 1156, 133, 1158, 135, 1159, 1160, 137, 1161, 1162, 139, 140, 1164, 142, 1167, 144, 1
 45, 146, 147, 148, 1173, 150, 1174, 1176, 153, 1177, 1178, 1179, 1180, 160, 1185, 162, 1186, 163, 1187, 165, 1189, 1191, 168, 1192, 169, 170, 172, 173, 174, 175, 1199, 1200, 177, 1201, 1202, 1203, 181, 1205, 182, 1206, 1208, 185, 1210, 187, 188, 1212, 189, 191, 1215, 193, 194, 196, 197, 200, 202, 205, 206, 209, 211, 212, 213, 214, 217, 218, 225, 226, 229, 230, 231, 233, 234, 235, 238, 243, 244, 245, 246, 249, 1274, 251, 1275, 1276, 1277, 254, 1279, 256, 1280, 257, 1281, 1282, 259, 1283, 260, 262, 263, 264, 265, 266, 267, 271, 272, 274, 275, 276, 277, 279, 280, 281, 282, 283, 284, 285, 287, 288, 292, 293, 294, 295, 296, 298, 300, 301, 305, 308, 309, 310, 316, 317, 321, 322, 323, 324, 328, 329, 331, 337, 338, 342, 343, 345, 346, 348, 349, 350, 351, 354, 355, 357, 358, 360, 361, 362, 367, 369, 373, 382, 389, 395, 397, 398, 401, 403, 404, 405, 406, 407, 408, 410, 414, 417, 418, 420, 429, 430, 433, 434, 439, 440, 441, 443, 444, 446, 447, 449, 450, 451, 452, 453, 454, 456, 457, 458, 459, 
 460, 462, 465, 466, 469, 471, 472, 476, 477, 479, 480, 482, 484, 486, 487, 488, 489, 491, 493, 496, 498, 500, 501, 502, 503, 505, 507, 510, 511, 514, 515, 516, 517, 518, 519, 524, 528, 529, 530, 531, 537, 541, 544, 547, 550, 551, 553, 554, 555, 556, 557, 558, 561, 564, 571, 573, 574, 575, 576, 578, 579, 581, 583, 588, 589, 594, 598, 599, 600, 603, 604, 605, 606, 608, 610, 611, 614, 617, 618, 620, 621, 623, 625, 626, 631, 634, 635, 637, 640, 641, 643, 646, 648, 650, 651, 654, 655, 656, 659, 660, 664, 665, 669, 671, 672, 673, 675, 685, 690, 692, 693, 694, 698, 699, 700, 703, 704, 707, 708, 709, 710, 712, 716, 721, 723, 724, 725, 727, 730, 732, 733, 736, 739, 741, 742, 744, 749, 750, 757, 758, 762, 763, 765, 767, 769, 770, 772, 774, 775, 776, 779, 781, 782, 783, 785, 786, 789, 791, 792, 794, 796, 798, 801, 805, 806, 810, 812, 816, 817, 818, 819, 823, 824, 825, 827, 832, 839, 842, 847, 849, 850, 854, 855, 857, 863, 865, 866, 871, 874, 875, 876, 877, 878, 879, 881, 882, 885, 886, 887, 89
 1, 893, 897, 899, 901, 902, 903, 904, 908, 910, 911, 916, 918, 919, 920, 921, 923, 925, 927, 928, 931, 932, 935, 936, 938, 939, 942, 943, 945, 947, 948, 949, 951, 953, 954, 958, 960, 961, 962, 963, 966, 967, 968, 969, 970, 972, 977, 978, 980, 983, 990, 991, 993, 996, 997, 998, 1000, 1002, 1003, 1004, 1005, 1009, 1010, 1011, 1012, 1014, 1015, 1016, 1018, 1019, 1021, 1023] of bookie: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] are identified as underreplicated
   07:06:15.137 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x0546ea8b]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:15.142 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0x6cc73bd4, L:/10.100.155.22:48498 - R:ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.152.203:3181]
   07:06:15.148 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:15.151 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0x09e0a380, L:/10.100.155.22:58330 - R:ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.94:3181]
   07:06:15.151 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:15.175 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x6cc73bd4, L:/10.100.155.22:48498 - R:ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.152.203:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   07:06:15.175 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x09e0a380, L:/10.100.155.22:58330 - R:ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.94:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   07:06:15.301 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
   07:06:15.389 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x4fa0e74e]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:15.433 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xf9524733]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:15.435 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0x7fc214d5, L:/10.100.155.22:53236 - R:ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.155.35:3181]
   07:06:15.436 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:15.436 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x7fc214d5, L:/10.100.155.22:53236 - R:ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.155.35:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   07:06:15.454 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x7d93c45a]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:15.459 [PendingReplicationTimer] WARN  org.apache.bookkeeper.replication.ReplicationWorker - Missing bookie(s) from last segment. Opening Ledger0 for Recovery.
   07:06:15.479 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xef226200]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:15.513 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe898d0d0]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.081 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x491ad4b9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.086 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (0, 1249): Bookie handle is not available
   07:06:16.088 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0xe1ea305e, L:/10.100.155.22:53998 - R:ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.101:3181]
   07:06:16.088 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.088 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xe1ea305e, L:/10.100.155.22:53998 - R:ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.101:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   07:06:16.089 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xc7799dca]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.094 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181 in ensemble: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] is not adhering to placement policy and chose ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181. excludedBookies [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] and quarantinedBookies []
   07:06:16.095 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L0 E1251 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.128 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.ReadOnlyLedgerHandle - Closing recovered ledger 0 at entry 1250
   07:06:16.146 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x1e410fe0]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.146 [PendingReplicationTimer] WARN  org.apache.bookkeeper.replication.ReplicationWorker - Missing bookie(s) from last segment. Opening Ledger7 for Recovery.
   07:06:16.155 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x8c7cde07]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.206 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xd2a87f69]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.238 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe2748b57]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.238 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1251 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.247 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x608adf4d]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.248 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (7, 1251): Bookie handle is not available
   07:06:16.248 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181 in ensemble: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] is not adhering to placement policy and chose ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181. excludedBookies [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] and quarantinedBookies []
   07:06:16.263 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xdbdc2fa3]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.264 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1254 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.275 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xd171dca9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.276 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1257 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.283 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x1ca3ecb9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.289 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xfb0666d2]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.289 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1260 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.300 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x6a4c50f2]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.300 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1263 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.310 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x79264105]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.311 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1266 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:16.320 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.ReadOnlyLedgerHandle - Closing recovered ledger 7 at entry 1265
   07:06:16.331 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe4fbd53a]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.339 [PendingReplicationTimer] WARN  org.apache.bookkeeper.replication.ReplicationWorker - Missing bookie(s) from last segment. Opening Ledger40 for Recovery.
   07:06:16.348 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x6d718fd9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.355 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x94849170]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.360 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x2e280b3c]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.402 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x54b9fad3]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.411 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xbe23d8cd]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.470 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x407f04f9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.478 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe64ecb81]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.509 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x916c40ef]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.991 [ReplicationWorker] INFO  org.apache.bookkeeper.client.LedgerFragmentReplicator - Replicating fragment Fragment(LedgerID: 1031, FirstEntryID: 0[0], LastKnownEntryID: 78[78], Host: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181], Closed: true) in 1 sub fragments.
   07:06:16.995 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xf883ca8c]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:16.998 [bookkeeper-io-3-2] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x571a528b]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:17.000 [bookkeeper-io-3-1] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x5297eb4c]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
   07:06:17.001 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L40 E1256 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   07:06:17.002 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L1031 E1 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
   ```
   
   Note that the logs continue in the same fashion for quite some time. I'd guess that they stop once the underreplicated ledgers are rereplicated, but I haven't specifically tested to see.
   
   ***Additional context***
   
   Please let me know if any additional context is needed. Thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-738590120


   > IIRC there is no preliminary check on the availability of the bookie before readning.
   
   That would certainly explain he behavior I'm seeing. Further, I reproduced the problematic behavior today, and while looking at the pulsar brokers (also users of the BK client), I noticed that they were logging the same errors as well (and very frequently, too).
   
   > btw it would be better that the BK client uses the availability information to direct the first read trial to an available bookie, and we can improve it. Therefore in this specific case we know that the bookie is dead, and it is really useless to try to connect.
   
   This is what I am hoping we'll be able to do. I am new to the Bookkeeper project, but from my perspective, it seems essential that clients are optimized to handle a loss of up to (Qa - 1) bookies without much of a problem. It seems obvious that there might be initial connectivity issues in the case where a bookie leaves ungracefully, but even then, it should be deemed unreachable at some (configurable) time and then the clients should stop trying to connect to it.
   
   Additionally, I ran my above steps to reproduce the issue with pulsar 2.7.0, which shipped with Bookkeeper 4.12.0 jars, and I was able to reproduce the behavior. Although, in the 4.12.0 version, the error logging is more verbose, as it includes a stack trace for each logged error.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli closed issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
eolivelli closed issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-737729921


   The BookKeeper client is using the 'available bookie' information only in the write path, when it uses the PlacementPolicy to choose which bookie to write to.
   IIRC  there is no preliminary check on the availability of the bookie before readning.
   
   btw it would be better that the BK client uses the availability information to direct the first read trial to an available bookie, and we can improve it. Therefore in this specific case we know that the bookie is dead, and it is really useless to try to connect.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-758460890


   @michaeljmarshall your proposal makes sense to me.
   I wonder if we should add a configuration entry in order to allow users to fallback to the legacy behaviour
   
   cc @reddycharan @karanmehta93 @ivankelly @sijie 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-743436443


   Please try to set up a draft PR.
   If the patch is too bug then we will start a BP.
   
   I am not sure we need a configuration flag.
   
   I guess that it is enough to ask to the BookieWatcher if a bookie is available before trying to use it, so it is already something very straightforward and expected 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-743406920


   @eolivelli what do you think it'll take to update the bookkeeper client to prevent this behavior? Would this be large enough to be a proposal, or is it simply a feature? I'm interested in helping contribute a solution, but I'm new to the project, so I'll need some direction. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-743467889


   Thanks for the direction. I'll look into the `BookieWatcher` and try to get a PR up soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-744265121


   @eolivelli - after digging into the project a bit, I can see that there could be an improvement to the bookie client. However, I think the autorecovery process itself could also be improved by improving the logic in the following method:
   
   https://github.com/apache/bookkeeper/blob/a1af31e915ec74a5066b8f1dadb15fd8cfaf8d43/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java#L142-L168
   
   The `verifyLedgerFragment` method is called by the `checkLedger` method, which is then called by the `Auditor` and the `ReplicationWorker`. Both of these classes have access to the BK Admin client, and would be able to get and then pass in all of the currently available bookies, which could then be checked against the ledger's bookies to determine "known bad bookies" without attempting to read entries from a bunch of bookies. By calculating (some of) the bad bookies before trying to read entries, the method could skip bookies that are already known to be down. This would prevent thrash while trying to read entries to empirically build a collection of bad bookies.
   
   Does this seem like a reasonable path to go down at this point? I think there might be multiple PRs that could result from this issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-775741472


   Adding more state to ZK is a problem, and it will add much complexity (without thinking about backward compatibility issues)
   
   The BookieWatcher has its own cache (see BookieWatcherImpl), you can use it, this way you will save calls to ZK


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-775743109


   @eolivelli - thanks for the direction! That's helpful context.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-775732839


   @eolivelli - I took a look at this a bit, and I have some questions about design. 
   
   In order to make this code easiest to test, I think it makes sense to include a list of `badBookies` instead of `availableBookies`. In looking at the available methods in the bookkeeper admin, there is `getAvailableBookies` (which seems more like `getWritableBookies`), `getReadOnlyBookies`, and `getAllBookies`. I believe `getAllBookies` would include bookies that are unavailable but not yet decommissioned, but I haven't validated that yet. If it does include them, then the only way I currently see to `getUnavailableBookies` is to call all 3 methods and find the bookies that are in the "all" list but not in the other two.
   
   I think it might be reasonable to call all three methods to get the unavailable bookies, given that this code is not run that often, but I wanted to check before I looked into it more. Do you think it's okay to require three zk calls to get this state? The alternative is to add more state to zk, which I assume is a more cumbersome task.
   
   I can put up a tentative PR to show you what I mean, if that'd be helpful.
   
   Thanks for any insight you can give!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-754370446


   @eolivelli - what do you think about my comments on the above method? Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org