You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/06 03:07:45 UTC

[GitHub] [pulsar] kwenZh opened a new issue, #15464: auto recovery error log : Corrupted frame received from bookie, under replicated can not covery

kwenZh opened a new issue, #15464:
URL: https://github.com/apache/pulsar/issues/15464

   **Describe the bug**
   there are a  pulsar cluster in k8s
   
   We decided to migrate bookie to a new  bookie cluster,  
   1, we set auto recovery is enabled
   2. deploy a new bookie statefulset normally
   3. Reduced number of instances one by one .
   4. i can see recovery is working
   
   
    old bookie cluster has 5 pod (0-4), first i close  bookie-4
   but in latest , there are some ledgerId are not  migrate successful, 
   
   bin/bookkeeper shell listunderreplicated
   ```
   
   02:27:18.605 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
   02:27:18.720 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 90129
   02:27:18.721 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748446769
   02:27:18.722 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 93326
   02:27:18.723 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748452235
   02:27:18.724 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 90190
   02:27:18.724 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748446993
   02:27:18.726 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 93508
   02:27:18.726 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748452461
   02:27:18.728 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 90955
   02:27:18.728 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748448866
   02:27:18.730 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 95981
   02:27:18.730 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748457162
   02:27:18.732 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 92099
   02:27:18.732 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748450587
   02:27:18.734 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 92461
   02:27:18.734 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748451050
   02:27:18.735 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 98373
   02:27:18.736 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748426002
   02:27:18.737 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 98613
   02:27:18.737 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Ctime : 1651748426602
   02:27:18.739 [main-SendThread(pulsar-cluster-zookeeper:2181)] WARN  org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x2000018322a00a4.
   org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x2000018322a00a4, likely server has closed socket
           at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
           at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
           at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   02:27:18.854 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000018322a00a4
   02:27:18.854 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x2000018322a00a4 closed
   
   ```
   
   auto recovery log:
   ```
   02:37:31.367 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:37:31.369 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:38:47.113 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:38:47.115 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:10.731 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:10.734 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:10.736 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xb94a56fe, L:/10.244.81.13:45666 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.736 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xb94a56fe, L:/10.244.81.13:45666 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.737 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x0030dc6a, L:/10.244.81.13:44790 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.737 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x0030dc6a, L:/10.244.81.13:44790 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.747 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.747 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.747 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xb94a56fe, L:/10.244.81.13:45666 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.747 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x0030dc6a, L:/10.244.81.13:44790 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.748 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xd8237aad, L:/10.244.81.13:45670 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.748 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x71fcf2a7, L:/10.244.81.13:44794 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.748 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xd8237aad, L:/10.244.81.13:45670 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.748 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x71fcf2a7, L:/10.244.81.13:44794 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.763 [bookkeeper-io-3-2] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.763 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xd8237aad, L:/10.244.81.13:45670 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.764 [bookkeeper-io-3-3] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.764 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x71fcf2a7, L:/10.244.81.13:44794 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.765 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x7021b4ff, L:/10.244.81.13:45674 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.765 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x7021b4ff, L:/10.244.81.13:45674 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.773 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.773 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x7021b4ff, L:/10.244.81.13:45674 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.773 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L90190 E0 from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
   02:39:10.774 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x0c5a059e, L:/10.244.81.13:44798 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.774 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x0c5a059e, L:/10.244.81.13:44798 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.783 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.783 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x0c5a059e, L:/10.244.81.13:44798 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L90190 E0 from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
   02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L90190 E0-E0, Sent to [pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First unread entry is (-1, rc = null)
   02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Received error: -8 while trying to read entry: 0 of ledger: 90190 in ReplicationWorker
   02:39:10.784 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Failed to read faulty entries, so giving up replicating ledgerFragment Fragment(LedgerID: 90190, FirstEntryID: 0[0], LastKnownEntryID: 0[0], Host: [pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], Closed: true)
   02:39:10.784 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xacb8bca9, L:/10.244.81.13:45680 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.784 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xacb8bca9, L:/10.244.81.13:45680 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.784 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x42c42991, L:/10.244.81.13:44804 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.784 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x42c42991, L:/10.244.81.13:44804 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.793 [bookkeeper-io-3-2] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.793 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xacb8bca9, L:/10.244.81.13:45680 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.799 [bookkeeper-io-3-3] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.799 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x42c42991, L:/10.244.81.13:44804 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.800 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xda6ef22c, L:/10.244.81.13:45684 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.800 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xda6ef22c, L:/10.244.81.13:45684 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.800 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x7eb38ed7, L:/10.244.81.13:44808 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.800 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x7eb38ed7, L:/10.244.81.13:44808 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.808 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.808 [bookkeeper-io-3-1] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x7eb38ed7, L:/10.244.81.13:44808 ! R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.850 [bookkeeper-io-3-4] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.850 [bookkeeper-io-3-4] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xda6ef22c, L:/10.244.81.13:45684 ! R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.850 [ReplicationWorker] ERROR org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker failed to replicate Ledger : 90190 for 32 number of times, so deferring the ledger lock release by 300000 msecs
   02:39:14.886 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:14.888 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:30.519 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:30.522 [main-EventThread] INFO  org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ledgers/underreplication/locks
   02:39:30.524 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0xa9665a84, L:/10.244.81.13:45976 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:30.524 [bookkeeper-io-3-2] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xa9665a84, L:/10.244.81.13:45976 - R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:30.524 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 [id: 0x0628fbc9, L:/10.244.81.13:45100 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:30.524 [bookkeeper-io-3-3] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0x0628fbc9, L:/10.244.81.13:45100 - R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:30.527 [ReplicationWorker] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
   02:39:30.527 [ReplicationWorker] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 as endpoint resolution failed (probably bookie is down) err org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie does not exist or it is not running
   02:39:30.527 [ReplicationWorker] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: null/pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, current state CONNECTING :
   org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie does not exist or it is not running
           at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:63) ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.PerChannelBookieClient.connect(PerChannelBookieClient.java:531) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.PerChannelBookieClient.connectIfNeededAndDoOp(PerChannelBookieClient.java:657) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:121) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:116) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:509) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:495) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:489) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.PendingReadOp.sendReadTo(PendingReadOp.java:576) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:405) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.read(PendingReadOp.java:386) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.PendingReadOp.initiate(PendingReadOp.java:530) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.PendingReadOp.safeRun(PendingReadOp.java:540) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.LedgerHandle.readEntriesInternalAsync(LedgerHandle.java:896) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.LedgerHandle.asyncReadEntriesInternal(LedgerHandle.java:800) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.LedgerHandle.asyncReadEntries(LedgerHandle.java:694) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.replication.ReplicationWorker.tryReadingFaultyEntries(ReplicationWorker.java:326) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:378) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:277) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:238) [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.66.Final.jar:4.1.66.Final]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   Caused by: org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
           at org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:248) ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:43) ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           ... 22 more
   02:39:30.527 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not available while reading L92461 E2060 from bookie: pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
   02:39:30.528 [ReplicationWorker] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
   
   ```
   
   i notice the main error log:  
    ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received from bookie: 
    ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L95981 E210-E210, Sent to [pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First unread entry is (-1, rc = null)
   
   
   the ledger id  metadata
   ```
   2:49:10.484 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - ledgerID: 95981
   02:49:10.490 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=2, ackQuorumSize=2, state=CLOSED, length=67946172, lastEntryId=2037, digestType=CRC32C, password=base64:, ensembles={0=[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:Y29sbGVjdC9kZWZhdWx0L3BlcnNpc3RlbnQvaW9hX2xvZy1wYXJ0aXRpb24tMg==, application=base64:cHVsc2Fy}}
   
   ```
   
   
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Go to '...'
   2. Click on '....'
   3. Scroll down to '....'
   6. See error
   
   **Expected behavior**
   recovery log is normally
   
   **Screenshots**
   
   ![image](https://user-images.githubusercontent.com/38367518/167060160-e59abcb3-61d7-42b6-a318-c32ad111b22a.png)
   
   ![image](https://user-images.githubusercontent.com/38367518/167060175-e8510807-eab2-4fb6-91d3-4ec0c95059fe.png)
   
   
   
   **Desktop (please complete the following information):**
    - OS: centos7
    - pulsar: 2.8.1
   
   **Additional context**
   Add any other context about the problem here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #15464: auto recovery error log : Corrupted frame received from bookie, under replicated can not covery

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15464:
URL: https://github.com/apache/pulsar/issues/15464#issuecomment-1166399225

   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] ByrsH commented on issue #15464: auto recovery error log : Corrupted frame received from bookie, under replicated can not covery

Posted by GitBox <gi...@apache.org>.
ByrsH commented on issue #15464:
URL: https://github.com/apache/pulsar/issues/15464#issuecomment-1138420979

   I had a similar problem,broker log :
   ```
   [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L173395 E0-E0, Sent to [pulsar-iot-bookie-1.pulsar-iot-bookie.pulsar-prod.svc.cluster.local:3181, pulsar-iot-bookie-0.pulsar-iot-bookie.pulsar-prod.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'No such ledger exists on Bookies'. First unread entry is (-1, rc = null)
   ```
   
   **Describe** :
   there are a pulsar cluster in k8s,pulsar version 2.8.2。The purpose is replace new disk.
   
   **operating steps**:
   1、run bin/bookkeeper shell bookieformat -deleteCookie -f
   2、restart pod 
   3、wait bookie auto sync data to new disk
   
   business service run normal, but when restart business service,  Throws the following exception:
   ```
   Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [so.dian.clover.pulsar.PulsarProducer]: Factory method 'createPulsarProducer' threw exception; nested exception is org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: org.apache.pulsar.broker.service.schema.exceptions.SchemaException: No such ledger exists on Bookies -  ledger=173395 - operation=Failed to read entry - entry=0 caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: No such ledger exists on Bookies -  ledger=173395 - operation=Failed to read entry - entry=0
           at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
           at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:651)
           ... 41 common frames omitted
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org