You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/06 03:07:45 UTC
[GitHub] [pulsar] kwenZh opened a new issue, #15464: auto recovery error log : Corrupted frame received from bookie, under replicated can not covery
kwenZh opened a new issue, #15464:
URL: https://github.com/apache/pulsar/issues/15464
**Describe the bug**
there are a pulsar cluster in k8s
We decided to migrate bookie to a new bookie cluster,
1, we set auto recovery is enabled
2. deploy a new bookie statefulset normally
3. Reduced number of instances one by one .
4. i can see recovery is working
old bookie cluster has 5 pod (0-4), first i close bookie-4
but in latest , there are some ledgerId are not migrate successful,
bin/bookkeeper shell listunderreplicated
```
02:27:18.605 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
02:27:18.720 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 90129
02:27:18.721 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748446769
02:27:18.722 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 93326
02:27:18.723 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748452235
02:27:18.724 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 90190
02:27:18.724 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748446993
02:27:18.726 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 93508
02:27:18.726 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748452461
02:27:18.728 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 90955
02:27:18.728 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748448866
02:27:18.730 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 95981
02:27:18.730 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748457162
02:27:18.732 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 92099
02:27:18.732 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748450587
02:27:18.734 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 92461
02:27:18.734 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748451050
02:27:18.735 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 98373
02:27:18.736 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748426002
02:27:18.737 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 98613
02:27:18.737 [main] INFO org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - Ctime : 1651748426602
02:27:18.739 [main-SendThread(pulsar-cluster-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x2000018322a00a4.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x2000018322a00a4, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
02:27:18.854 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000018322a00a4
02:27:18.854 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000018322a00a4 closed
```
auto recovery log:
```
02:37:31.367 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:37:31.369 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:38:47.113 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:38:47.115 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:10.731 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:10.734 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:10.736 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xb94a56fe, L:/10.244.81.13:45666 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.736 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xb94a56fe, L:/10.244.81.13:45666 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.737 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x0030dc6a, L:/10.244.81.13:44790 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.737 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x0030dc6a, L:/10.244.81.13:44790 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.747 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.747 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.747 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xb94a56fe, L:/10.244.81.13:45666 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.747 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x0030dc6a, L:/10.244.81.13:44790 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.748 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xd8237aad, L:/10.244.81.13:45670 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.748 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x71fcf2a7, L:/10.244.81.13:44794 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.748 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xd8237aad, L:/10.244.81.13:45670 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.748 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x71fcf2a7, L:/10.244.81.13:44794 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.763 [bookkeeper-io-3-2] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.763 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xd8237aad, L:/10.244.81.13:45670 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.764 [bookkeeper-io-3-3] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.764 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x71fcf2a7, L:/10.244.81.13:44794 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.765 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x7021b4ff, L:/10.244.81.13:45674 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.765 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x7021b4ff, L:/10.244.81.13:45674 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.773 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.773 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x7021b4ff, L:/10.244.81.13:45674 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.773 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L90190 E0 from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
02:39:10.774 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x0c5a059e, L:/10.244.81.13:44798 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.774 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x0c5a059e, L:/10.244.81.13:44798 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.783 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.783 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x0c5a059e, L:/10.244.81.13:44798 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L90190 E0 from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L90190 E0-E0, Sent to [pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First unread entry is (-1, rc = null)
02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Received error: -8 while trying to read entry: 0 of ledger: 90190 in ReplicationWorker
02:39:10.784 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Failed to read faulty entries, so giving up replicating ledgerFragment Fragment(LedgerID: 90190, FirstEntryID: 0[0], LastKnownEntryID: 0[0], Host: [pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], Closed: true)
02:39:10.784 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xacb8bca9, L:/10.244.81.13:45680 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.784 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xacb8bca9, L:/10.244.81.13:45680 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.784 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x42c42991, L:/10.244.81.13:44804 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.784 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x42c42991, L:/10.244.81.13:44804 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.793 [bookkeeper-io-3-2] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.793 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xacb8bca9, L:/10.244.81.13:45680 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.799 [bookkeeper-io-3-3] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.799 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x42c42991, L:/10.244.81.13:44804 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.800 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xda6ef22c, L:/10.244.81.13:45684 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.800 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xda6ef22c, L:/10.244.81.13:45684 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.800 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x7eb38ed7, L:/10.244.81.13:44808 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.800 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x7eb38ed7, L:/10.244.81.13:44808 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.808 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.808 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x7eb38ed7, L:/10.244.81.13:44808 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.850 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.850 [bookkeeper-io-3-4] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xda6ef22c, L:/10.244.81.13:45684 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.850 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker failed to replicate Ledger : 90190 for 32 number of times, so deferring the ledger lock release by 300000 msecs
02:39:14.886 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:14.888 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:30.519 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:30.522 [main-EventThread] INFO org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
02:39:30.524 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xa9665a84, L:/10.244.81.13:45976 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:30.524 [bookkeeper-io-3-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xa9665a84, L:/10.244.81.13:45976 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:30.524 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x0628fbc9, L:/10.244.81.13:45100 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:30.524 [bookkeeper-io-3-3] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x0628fbc9, L:/10.244.81.13:45100 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:30.527 [ReplicationWorker] INFO org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
02:39:30.527 [ReplicationWorker] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 as endpoint resolution failed (probably bookie is down) err org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie does not exist or it is not running
02:39:30.527 [ReplicationWorker] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: null/pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, current state CONNECTING :
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie does not exist or it is not running
at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:63) ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.PerChannelBookieClient.connect(PerChannelBookieClient.java:531) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.PerChannelBookieClient.connectIfNeededAndDoOp(PerChannelBookieClient.java:657) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:121) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:116) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:509) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:495) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:489) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.PendingReadOp.sendReadTo(PendingReadOp.java:576) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:405) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.read(PendingReadOp.java:386) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.PendingReadOp.initiate(PendingReadOp.java:530) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.PendingReadOp.safeRun(PendingReadOp.java:540) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.LedgerHandle.readEntriesInternalAsync(LedgerHandle.java:896) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.LedgerHandle.asyncReadEntriesInternal(LedgerHandle.java:800) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.LedgerHandle.asyncReadEntries(LedgerHandle.java:694) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.replication.ReplicationWorker.tryReadingFaultyEntries(ReplicationWorker.java:326) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:378) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:277) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:238) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.66.Final.jar:4.1.66.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
at org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:248) ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:43) ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
... 22 more
02:39:30.527 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L92461 E2060 from bookie: pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
02:39:30.528 [ReplicationWorker] INFO org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
```
i notice the main error log:
ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie:
ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L95981 E210-E210, Sent to [pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First unread entry is (-1, rc = null)
the ledger id metadata
```
2:49:10.484 [main] INFO org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - ledgerID: 95981
02:49:10.490 [main] INFO org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=2, ackQuorumSize=2, state=CLOSED, length=67946172, lastEntryId=2037, digestType=CRC32C, password=base64:, ensembles={0=[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:Y29sbGVjdC9kZWZhdWx0L3BlcnNpc3RlbnQvaW9hX2xvZy1wYXJ0aXRpb24tMg==, application=base64:cHVsc2Fy}}
```
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
6. See error
**Expected behavior**
recovery log is normally
**Screenshots**
![image](https://user-images.githubusercontent.com/38367518/167060160-e59abcb3-61d7-42b6-a318-c32ad111b22a.png)
![image](https://user-images.githubusercontent.com/38367518/167060175-e8510807-eab2-4fb6-91d3-4ec0c95059fe.png)
**Desktop (please complete the following information):**
- OS: centos7
- pulsar: 2.8.1
**Additional context**
Add any other context about the problem here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] github-actions[bot] commented on issue #15464: auto recovery error log : Corrupted frame received from bookie, under replicated can not covery
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15464:
URL: https://github.com/apache/pulsar/issues/15464#issuecomment-1166399225
The issue had no activity for 30 days, mark with Stale label.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] ByrsH commented on issue #15464: auto recovery error log : Corrupted frame received from bookie, under replicated can not covery
Posted by GitBox <gi...@apache.org>.
ByrsH commented on issue #15464:
URL: https://github.com/apache/pulsar/issues/15464#issuecomment-1138420979
I had a similar problem,broker log :
```
[BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L173395 E0-E0, Sent to [pulsar-iot-bookie-1.pulsar-iot-bookie.pulsar-prod.svc.cluster.local:3181, pulsar-iot-bookie-0.pulsar-iot-bookie.pulsar-prod.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'No such ledger exists on Bookies'. First unread entry is (-1, rc = null)
```
**Describe** :
there are a pulsar cluster in k8s,pulsar version 2.8.2。The purpose is replace new disk.
**operating steps**:
1、run bin/bookkeeper shell bookieformat -deleteCookie -f
2、restart pod
3、wait bookie auto sync data to new disk
business service run normal, but when restart business service, Throws the following exception:
```
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [so.dian.clover.pulsar.PulsarProducer]: Factory method 'createPulsarProducer' threw exception; nested exception is org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: org.apache.pulsar.broker.service.schema.exceptions.SchemaException: No such ledger exists on Bookies - ledger=173395 - operation=Failed to read entry - entry=0 caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: No such ledger exists on Bookies - ledger=173395 - operation=Failed to read entry - entry=0
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:651)
... 41 common frames omitted
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org