You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2020/12/03 07:37:07 UTC
[GitHub] [bookkeeper] michaeljmarshall opened a new issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
michaeljmarshall opened a new issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506
**BUG REPORT**
***Describe the bug***
After a bookie gracefully shuts down and leaves the cluster, autorecovery processes try (very frequently) to connect to the shutdown bookie leading to a ton of unnecessary calls, unnecessary logging, and increased CPU utilization.
***To Reproduce***
I am running bookkeeper as part of a pulsar cluster, so my process to stand up the bookkeeper nodes, zookeeper nodes, and autorecovery nodes is likely a little different. If necessary, I can try to reproduce this problem in a solo bookkeeper cluster. Please let me know what is needed.
Steps to reproduce the behavior:
1. Create pulsar cluster using the pulsar operator. (Each component runs in a container from docker image `apachepulsar/pulsar:2.6.2`, which ships with bookkeeper jars from version 4.10.0. The bookies are started with `bin/apply-config-from-env.py conf/bookkeeper.conf && bin/apply-config-from-env.py conf/pulsar_env.sh && exec bin/pulsar bookie` and the autorecovery containers are started with `bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/bookkeeper autorecovery`.)
2. Start producing data to the bookies to ensure that there is something to be underreplicated.
3. Gracefully shutdown one of the bookie nodes. (I validated that the bookie is no longer listed under `/ledgers/available` in zookeeper.)
4. Watch the logs in the autorecovery containers, and see that the process first logs that the bookie is removed, then attempts to connect to it many times.
5. Restart the autorecovery process, and it will continue to attempt to connect to the stopped bookie. (This implies its not just in some cache.)
***Expected behavior***
I would expect that the autorecovery process would not try to connect to a bookie that it _knows_ has left the cluster.
My theory is that the autorecovery process is trying to connect to nodes based on the ensemble or perhaps the write quorum or the ack quorum. I read through the autorecovery code a bit last night, but I haven't been able to find the exact place where the code is holding onto a reference to the stopped bookie.
***Logs***
Here are logs from the bookie that is stopped. I grabbed the logs at the point when the bookie receives the SIGTERM. This shows that the bookie is stopping correctly, as far as I can tell.
```
07:04:11.393 [component-shutdown-thread] INFO org.apache.bookkeeper.common.component.ComponentStarter - Closing component bookie-server in shutdown hook.
07:04:11.395 [component-shutdown-thread] INFO org.apache.bookkeeper.proto.BookieServer - Shutting down BookieServer
07:04:11.396 [component-shutdown-thread] INFO org.apache.bookkeeper.proto.BookieNettyServer - Shutting down BookieNettyServer
07:04:11.404 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x09d7bf13, L:/10.100.157.164:3181 ! R:/10.100.186.63:60358]
07:04:11.407 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x132e871c, L:/10.100.157.164:3181 ! R:/10.100.186.63:60368]
07:04:11.408 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x70d55455, L:/10.100.157.164:3181 ! R:/10.100.157.90:39336]
07:04:11.409 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xea12a92f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52362]
07:04:11.409 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3538803d, L:/10.100.157.164:3181 ! R:/10.100.186.63:60330]
07:04:11.409 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x7af4a041, L:/10.100.157.164:3181 ! R:/10.100.186.63:60318]
07:04:11.410 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x801ff1b1, L:/10.100.157.164:3181 ! R:/10.100.182.35:52400]
07:04:11.409 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x09983fec, L:/10.100.157.164:3181 ! R:/10.100.182.35:52368]
07:04:11.410 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1fe9039f, L:/10.100.157.164:3181 ! R:/10.100.186.63:60354]
07:04:11.410 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x93f5ffda, L:/10.100.157.164:3181 ! R:/10.100.157.90:39296]
07:04:11.410 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x7f6aff63, L:/10.100.157.164:3181 ! R:/10.100.157.90:39326]
07:04:11.411 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x593fc65f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52422]
07:04:11.411 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xd62ac189, L:/10.100.157.164:3181 ! R:/10.100.157.90:39286]
07:04:11.411 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbd771f1f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52346]
07:04:11.412 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb0a41ff1, L:/10.100.157.164:3181 ! R:/10.100.182.35:52388]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x93f6dea1, L:/10.100.157.164:3181 ! R:/10.100.182.35:52394]
07:04:11.412 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x6ab2ea71, L:/10.100.157.164:3181 ! R:/10.100.157.90:39330]
07:04:11.412 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xcf8a7a0c, L:/10.100.157.164:3181 ! R:/10.100.157.90:39304]
07:04:11.412 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3eec8af8, L:/10.100.157.164:3181 ! R:/10.100.186.63:60316]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb7e7607f, L:/10.100.157.164:3181 ! R:/10.100.182.35:52402]
07:04:11.412 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x0e99368e, L:/10.100.157.164:3181 ! R:/10.100.157.90:39316]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x6c4260d6, L:/10.100.157.164:3181 ! R:/10.100.157.90:39310]
07:04:11.412 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3290c225, L:/10.100.157.164:3181 ! R:/10.100.182.35:52382]
07:04:11.412 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x0fd9fe17, L:/10.100.157.164:3181 ! R:/10.100.182.35:52354]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x29b9a9ad, L:/10.100.157.164:3181 ! R:/10.100.186.63:60338]
07:04:11.412 [bookie-io-1-4] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xc82f01d0, L:/10.100.157.164:3181 ! R:/10.100.157.90:39328]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1033f5d3, L:/10.100.157.164:3181 ! R:/10.100.186.63:60356]
07:04:11.412 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x4be78a41, L:/10.100.157.164:3181 ! R:/10.100.157.90:39334]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1a648110, L:/10.100.157.164:3181 ! R:/10.100.186.63:60364]
07:04:11.412 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x55a74676, L:/10.100.157.164:3181 ! R:/10.100.186.63:60336]
07:04:11.412 [bookie-io-1-3] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x6775d4e4, L:/10.100.157.164:3181 ! R:/10.100.182.35:52348]
07:04:11.413 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xa4b14ea8, L:/10.100.157.164:3181 ! R:/10.100.157.90:39322]
07:04:11.413 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x1f49ed3e, L:/10.100.157.164:3181 ! R:/10.100.157.90:39288]
07:04:11.413 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x3d10cd4c, L:/10.100.157.164:3181 ! R:/10.100.157.90:39332]
07:04:11.414 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x8cf54abe, L:/10.100.157.164:3181 ! R:/10.100.186.63:60308]
07:04:11.414 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x52ef4540, L:/10.100.157.164:3181 ! R:/10.100.182.35:52356]
07:04:11.415 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xf3cb9a8a, L:/10.100.157.164:3181 ! R:/10.100.186.63:60362]
07:04:11.415 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbb142ac2, L:/10.100.157.164:3181 ! R:/10.100.186.63:60360]
07:04:11.415 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbf544206, L:/10.100.157.164:3181 ! R:/10.100.157.90:39324]
07:04:11.415 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbe0d8aa9, L:/10.100.157.164:3181 ! R:/10.100.157.90:39338]
07:04:11.415 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xf03a7c72, L:/10.100.157.164:3181 ! R:/10.100.182.35:52366]
07:04:11.415 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb8e915aa, L:/10.100.157.164:3181 ! R:/10.100.186.63:60310]
07:04:11.416 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xbeb75e87, L:/10.100.157.164:3181 ! R:/10.100.186.63:60352]
07:04:11.416 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xb4eb45b4, L:/10.100.157.164:3181 ! R:/10.100.182.35:52390]
07:04:11.416 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x7e4b9636, L:/10.100.157.164:3181 ! R:/10.100.186.63:60328]
07:04:11.416 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0xaebfcb28, L:/10.100.157.164:3181 ! R:/10.100.182.35:52428]
07:04:11.416 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x180315cd, L:/10.100.157.164:3181 ! R:/10.100.182.35:52386]
07:04:11.416 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x83ca43a0, L:/10.100.157.164:3181 ! R:/10.100.157.90:39298]
07:04:11.424 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.Bookie - Shutting down Bookie-3181 with exitCode 0
07:04:11.425 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.Bookie - Turning bookie to read only during shut down
07:04:11.425 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.SyncThread - Shutting down SyncThread
07:04:12.711 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log file : createNewLog = false, reachEntryLogLimit = true
07:04:12.711 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerBase - Flushing entry logger 102 back to filesystem, pending for syncing entry loggers : [BufferedChannel{logId=102, logFile=data/bookkeeper/ledgers/current/66.log, ledgerIdAssigned=-1}].
07:04:12.712 [pool-5-thread-1] INFO org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file data/bookkeeper/ledgers/current/68.log for logId 104.
07:04:14.042 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 102 to disk.
07:04:14.053 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.Journal - garbage collected journal 176275d94bd.txn
07:04:14.053 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.SyncThread - Flush ledger storage at checkpoint CheckpointList{checkpoints=[LogMark: logFileId - 1606978213059 , logFileOffset - 318435328]}.
07:04:14.054 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.Journal - Shutting down Journal
07:04:14.054 [ForceWriteThread] INFO org.apache.bookkeeper.bookie.Journal - ForceWrite thread interrupted
07:04:14.054 [BookieJournal-3181] INFO org.apache.bookkeeper.bookie.Journal - Journal exits when shutting down
07:04:14.054 [BookieJournal-3181] INFO org.apache.bookkeeper.bookie.Journal - Journal exited loop!
07:04:14.056 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.Journal - Finished Shutting down Journal thread
07:04:14.056 [Bookie-3181] INFO org.apache.bookkeeper.bookie.Bookie - Journal thread(s) quit.
07:04:14.063 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Shutting down GarbageCollectorThread
07:04:14.063 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.EntryLogger - Stopping EntryLogger
07:04:14.064 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.EntryLoggerAllocator - Stopped entry logger preallocator.
07:04:14.244 [component-shutdown-thread] INFO org.apache.bookkeeper.bookie.LedgerDirsMonitor - Shutting down LedgerDirsMonitor
07:04:14.350 [component-shutdown-thread] INFO org.apache.zookeeper.ZooKeeper - Session: 0x3009ce550af0009 closed
07:04:14.350 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x3009ce550af0009
07:04:14.360 [component-shutdown-thread] INFO org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@62e8f862{HTTP/1.1, (http/1.1)}{0.0.0.0:8000}
07:04:14.363 [component-shutdown-thread] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@4943defe{/,null,STOPPED}
07:04:14.365 [component-shutdown-thread] INFO org.apache.bookkeeper.common.component.ComponentStarter - Closed component bookie-server in shutdown hook successfully. Exiting.
```
Next are the logs from the autorecovery process. I start these a bit before the bookie left the cluster. Note that the autorecory process clearly logs that bookie `ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181` is a failed bookie. Then it proceeds to attemp to open connections to that bookie. Based on the frequency of the logging (many times a second), it appears to me that the autorecovery process is trying to connect to the exited bookie many times. Anecdotally, I can see a pretty big spike in the CPU for the process.
```
07:03:43.115 [AuditorBookie-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Starting auditBookies
07:03:43.116 [AuditorElector-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - checkAllLedgers scheduling info. checkAllLedgersLastExecutedCTime: 1606978101474 durationSinceLastExecutionInSecs: 921 initialDelay: 603879 interval: 604800
07:03:43.117 [AuditorElector-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Periodic placementPolicy check disabled
07:03:43.117 [AuditorElector-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Periodic replicas check disabled
07:03:43.892 [AuditorBookie-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Completed auditBookies
07:04:14.250 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:04:14.254 [AuditorBookie-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Delaying bookie audit by 120 secs for [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181]
07:06:14.255 [AuditorBookie-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Starting auditBookies
07:06:14.952 [AuditorBookie-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Following are the failed bookies: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181], and searching its ledgers for re-replication
07:06:14.955 [AuditorBookie-10.100.155.22:3181] INFO org.apache.bookkeeper.replication.Auditor - Following ledgers: [0, 1024, 1, 1025, 2, 1026, 1027, 4, 1029, 6, 7, 1031, 1032, 1033, 10, 1034, 12, 16, 1040, 17, 1041, 18, 19, 21, 1045, 22, 23, 1047, 1048, 25, 1049, 1052, 30, 1054, 1055, 32, 1056, 33, 1058, 35, 1059, 36, 1060, 37, 1061, 38, 39, 40, 1064, 42, 1066, 43, 1067, 44, 45, 1069, 1070, 1071, 48, 1072, 51, 1075, 52, 1076, 53, 54, 1078, 55, 1080, 57, 1081, 59, 60, 61, 1085, 1086, 63, 64, 1088, 65, 1089, 66, 1091, 1092, 69, 1094, 71, 1095, 72, 1096, 1097, 74, 1098, 75, 1099, 1100, 1101, 1102, 79, 1103, 80, 81, 1105, 82, 83, 1107, 1108, 1111, 88, 89, 1113, 90, 92, 1117, 94, 95, 1119, 96, 1120, 98, 99, 100, 101, 1125, 102, 1126, 103, 1127, 105, 106, 1130, 1131, 108, 1134, 111, 112, 1136, 1138, 1139, 116, 1141, 119, 1143, 1144, 121, 1145, 123, 1147, 125, 1149, 126, 127, 1152, 129, 131, 1155, 132, 1156, 133, 1158, 135, 1159, 1160, 137, 1161, 1162, 139, 140, 1164, 142, 1167, 144, 1
45, 146, 147, 148, 1173, 150, 1174, 1176, 153, 1177, 1178, 1179, 1180, 160, 1185, 162, 1186, 163, 1187, 165, 1189, 1191, 168, 1192, 169, 170, 172, 173, 174, 175, 1199, 1200, 177, 1201, 1202, 1203, 181, 1205, 182, 1206, 1208, 185, 1210, 187, 188, 1212, 189, 191, 1215, 193, 194, 196, 197, 200, 202, 205, 206, 209, 211, 212, 213, 214, 217, 218, 225, 226, 229, 230, 231, 233, 234, 235, 238, 243, 244, 245, 246, 249, 1274, 251, 1275, 1276, 1277, 254, 1279, 256, 1280, 257, 1281, 1282, 259, 1283, 260, 262, 263, 264, 265, 266, 267, 271, 272, 274, 275, 276, 277, 279, 280, 281, 282, 283, 284, 285, 287, 288, 292, 293, 294, 295, 296, 298, 300, 301, 305, 308, 309, 310, 316, 317, 321, 322, 323, 324, 328, 329, 331, 337, 338, 342, 343, 345, 346, 348, 349, 350, 351, 354, 355, 357, 358, 360, 361, 362, 367, 369, 373, 382, 389, 395, 397, 398, 401, 403, 404, 405, 406, 407, 408, 410, 414, 417, 418, 420, 429, 430, 433, 434, 439, 440, 441, 443, 444, 446, 447, 449, 450, 451, 452, 453, 454, 456, 457, 458, 459,
460, 462, 465, 466, 469, 471, 472, 476, 477, 479, 480, 482, 484, 486, 487, 488, 489, 491, 493, 496, 498, 500, 501, 502, 503, 505, 507, 510, 511, 514, 515, 516, 517, 518, 519, 524, 528, 529, 530, 531, 537, 541, 544, 547, 550, 551, 553, 554, 555, 556, 557, 558, 561, 564, 571, 573, 574, 575, 576, 578, 579, 581, 583, 588, 589, 594, 598, 599, 600, 603, 604, 605, 606, 608, 610, 611, 614, 617, 618, 620, 621, 623, 625, 626, 631, 634, 635, 637, 640, 641, 643, 646, 648, 650, 651, 654, 655, 656, 659, 660, 664, 665, 669, 671, 672, 673, 675, 685, 690, 692, 693, 694, 698, 699, 700, 703, 704, 707, 708, 709, 710, 712, 716, 721, 723, 724, 725, 727, 730, 732, 733, 736, 739, 741, 742, 744, 749, 750, 757, 758, 762, 763, 765, 767, 769, 770, 772, 774, 775, 776, 779, 781, 782, 783, 785, 786, 789, 791, 792, 794, 796, 798, 801, 805, 806, 810, 812, 816, 817, 818, 819, 823, 824, 825, 827, 832, 839, 842, 847, 849, 850, 854, 855, 857, 863, 865, 866, 871, 874, 875, 876, 877, 878, 879, 881, 882, 885, 886, 887, 89
1, 893, 897, 899, 901, 902, 903, 904, 908, 910, 911, 916, 918, 919, 920, 921, 923, 925, 927, 928, 931, 932, 935, 936, 938, 939, 942, 943, 945, 947, 948, 949, 951, 953, 954, 958, 960, 961, 962, 963, 966, 967, 968, 969, 970, 972, 977, 978, 980, 983, 990, 991, 993, 996, 997, 998, 1000, 1002, 1003, 1004, 1005, 1009, 1010, 1011, 1012, 1014, 1015, 1016, 1018, 1019, 1021, 1023] of bookie: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] are identified as underreplicated
07:06:15.137 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x0546ea8b]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:15.142 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0x6cc73bd4, L:/10.100.155.22:48498 - R:ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.152.203:3181]
07:06:15.148 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:15.151 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0x09e0a380, L:/10.100.155.22:58330 - R:ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.94:3181]
07:06:15.151 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:15.175 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x6cc73bd4, L:/10.100.155.22:48498 - R:ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.152.203:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
07:06:15.175 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x09e0a380, L:/10.100.155.22:58330 - R:ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.94:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
07:06:15.301 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
07:06:15.389 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x4fa0e74e]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:15.433 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xf9524733]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:15.435 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0x7fc214d5, L:/10.100.155.22:53236 - R:ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.155.35:3181]
07:06:15.436 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:15.436 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x7fc214d5, L:/10.100.155.22:53236 - R:ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.155.35:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
07:06:15.454 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x7d93c45a]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:15.459 [PendingReplicationTimer] WARN org.apache.bookkeeper.replication.ReplicationWorker - Missing bookie(s) from last segment. Opening Ledger0 for Recovery.
07:06:15.479 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xef226200]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:15.513 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe898d0d0]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.081 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x491ad4b9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.086 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (0, 1249): Bookie handle is not available
07:06:16.088 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0xe1ea305e, L:/10.100.155.22:53998 - R:ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.101:3181]
07:06:16.088 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.088 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xe1ea305e, L:/10.100.155.22:53998 - R:ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local/10.100.153.101:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
07:06:16.089 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xc7799dca]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.094 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181 in ensemble: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-2.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] is not adhering to placement policy and chose ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181. excludedBookies [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] and quarantinedBookies []
07:06:16.095 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L0 E1251 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.128 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.ReadOnlyLedgerHandle - Closing recovered ledger 0 at entry 1250
07:06:16.146 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x1e410fe0]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.146 [PendingReplicationTimer] WARN org.apache.bookkeeper.replication.ReplicationWorker - Missing bookie(s) from last segment. Opening Ledger7 for Recovery.
07:06:16.155 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x8c7cde07]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.206 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xd2a87f69]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.238 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe2748b57]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.238 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1251 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.247 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x608adf4d]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.248 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (7, 1251): Bookie handle is not available
07:06:16.248 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181 in ensemble: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-0.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, ole-roemer-bookie-statefulset-3.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] is not adhering to placement policy and chose ole-roemer-bookie-statefulset-4.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181. excludedBookies [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181] and quarantinedBookies []
07:06:16.263 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xdbdc2fa3]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.264 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1254 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.275 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xd171dca9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.276 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1257 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.283 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x1ca3ecb9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.289 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xfb0666d2]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.289 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1260 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.300 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x6a4c50f2]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.300 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1263 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.310 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x79264105]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.311 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L7 E1266 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:16.320 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.ReadOnlyLedgerHandle - Closing recovered ledger 7 at entry 1265
07:06:16.331 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe4fbd53a]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.339 [PendingReplicationTimer] WARN org.apache.bookkeeper.replication.ReplicationWorker - Missing bookie(s) from last segment. Opening Ledger40 for Recovery.
07:06:16.348 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x6d718fd9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.355 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x94849170]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.360 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x2e280b3c]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.402 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x54b9fad3]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.411 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xbe23d8cd]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.470 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x407f04f9]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.478 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xe64ecb81]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.509 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x916c40ef]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.991 [ReplicationWorker] INFO org.apache.bookkeeper.client.LedgerFragmentReplicator - Replicating fragment Fragment(LedgerID: 1031, FirstEntryID: 0[0], LastKnownEntryID: 78[78], Host: [ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181], Closed: true) in 1 sub fragments.
07:06:16.995 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0xf883ca8c]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:16.998 [bookkeeper-io-3-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x571a528b]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:17.000 [bookkeeper-io-3-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x5297eb4c]/ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181, current state CONNECTING : ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local
07:06:17.001 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L40 E1256 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
07:06:17.002 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L1031 E1 from bookie: ole-roemer-bookie-statefulset-1.ole-roemer-bookie.pulsar-testing.svc.cluster.local:3181
```
Note that the logs continue in the same fashion for quite some time. I'd guess that they stop once the underreplicated ledgers are rereplicated, but I haven't specifically tested to see.
***Additional context***
Please let me know if any additional context is needed. Thanks!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-738590120
> IIRC there is no preliminary check on the availability of the bookie before readning.
That would certainly explain he behavior I'm seeing. Further, I reproduced the problematic behavior today, and while looking at the pulsar brokers (also users of the BK client), I noticed that they were logging the same errors as well (and very frequently, too).
> btw it would be better that the BK client uses the availability information to direct the first read trial to an available bookie, and we can improve it. Therefore in this specific case we know that the bookie is dead, and it is really useless to try to connect.
This is what I am hoping we'll be able to do. I am new to the Bookkeeper project, but from my perspective, it seems essential that clients are optimized to handle a loss of up to (Qa - 1) bookies without much of a problem. It seems obvious that there might be initial connectivity issues in the case where a bookie leaves ungracefully, but even then, it should be deemed unreachable at some (configurable) time and then the clients should stop trying to connect to it.
Additionally, I ran my above steps to reproduce the issue with pulsar 2.7.0, which shipped with Bookkeeper 4.12.0 jars, and I was able to reproduce the behavior. Although, in the 4.12.0 version, the error logging is more verbose, as it includes a stack trace for each logged error.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] eolivelli closed issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
eolivelli closed issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-737729921
The BookKeeper client is using the 'available bookie' information only in the write path, when it uses the PlacementPolicy to choose which bookie to write to.
IIRC there is no preliminary check on the availability of the bookie before readning.
btw it would be better that the BK client uses the availability information to direct the first read trial to an available bookie, and we can improve it. Therefore in this specific case we know that the bookie is dead, and it is really useless to try to connect.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-758460890
@michaeljmarshall your proposal makes sense to me.
I wonder if we should add a configuration entry in order to allow users to fallback to the legacy behaviour
cc @reddycharan @karanmehta93 @ivankelly @sijie
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-743436443
Please try to set up a draft PR.
If the patch is too bug then we will start a BP.
I am not sure we need a configuration flag.
I guess that it is enough to ask to the BookieWatcher if a bookie is available before trying to use it, so it is already something very straightforward and expected
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-743406920
@eolivelli what do you think it'll take to update the bookkeeper client to prevent this behavior? Would this be large enough to be a proposal, or is it simply a feature? I'm interested in helping contribute a solution, but I'm new to the project, so I'll need some direction. Thanks!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-743467889
Thanks for the direction. I'll look into the `BookieWatcher` and try to get a PR up soon.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-744265121
@eolivelli - after digging into the project a bit, I can see that there could be an improvement to the bookie client. However, I think the autorecovery process itself could also be improved by improving the logic in the following method:
https://github.com/apache/bookkeeper/blob/a1af31e915ec74a5066b8f1dadb15fd8cfaf8d43/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java#L142-L168
The `verifyLedgerFragment` method is called by the `checkLedger` method, which is then called by the `Auditor` and the `ReplicationWorker`. Both of these classes have access to the BK Admin client, and would be able to get and then pass in all of the currently available bookies, which could then be checked against the ledger's bookies to determine "known bad bookies" without attempting to read entries from a bunch of bookies. By calculating (some of) the bad bookies before trying to read entries, the method could skip bookies that are already known to be down. This would prevent thrash while trying to read entries to empirically build a collection of bad bookies.
Does this seem like a reasonable path to go down at this point? I think there might be multiple PRs that could result from this issue.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] eolivelli commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-775741472
Adding more state to ZK is a problem, and it will add much complexity (without thinking about backward compatibility issues)
The BookieWatcher has its own cache (see BookieWatcherImpl), you can use it, this way you will save calls to ZK
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-775743109
@eolivelli - thanks for the direction! That's helpful context.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-775732839
@eolivelli - I took a look at this a bit, and I have some questions about design.
In order to make this code easiest to test, I think it makes sense to include a list of `badBookies` instead of `availableBookies`. In looking at the available methods in the bookkeeper admin, there is `getAvailableBookies` (which seems more like `getWritableBookies`), `getReadOnlyBookies`, and `getAllBookies`. I believe `getAllBookies` would include bookies that are unavailable but not yet decommissioned, but I haven't validated that yet. If it does include them, then the only way I currently see to `getUnavailableBookies` is to call all 3 methods and find the bookies that are in the "all" list but not in the other two.
I think it might be reasonable to call all three methods to get the unavailable bookies, given that this code is not run that often, but I wanted to check before I looked into it more. Do you think it's okay to require three zk calls to get this state? The alternative is to add more state to zk, which I assume is a more cumbersome task.
I can put up a tentative PR to show you what I mean, if that'd be helpful.
Thanks for any insight you can give!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [bookkeeper] michaeljmarshall commented on issue #2506: Autorecovery Process Tries to Connect to Bookie that Cleanly Exited Cluster
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #2506:
URL: https://github.com/apache/bookkeeper/issues/2506#issuecomment-754370446
@eolivelli - what do you think about my comments on the above method? Thanks!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org