Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/03/07 04:54:53 UTC

[GitHub] [pulsar] wolfstudy opened a new issue #14573: Inconsistent Bookie ledgers and under replicate metadata in Bookie

wolfstudy opened a new issue #14573:
URL: https://github.com/apache/pulsar/issues/14573


   **Describe the bug**
   
   
   Checking the Bookie node logs, we can see the following error message:
   
   ```
   20:15:20.948 [BookKeeperClientWorker-OrderedExecutor-24-0] ERROR org.apache.bookkeeper.replication.ReplicationWorker - Received error: -7 while trying to read entry: 19 of ledger: 6522968 in ReplicationWorker
   ```
   
   At this point, we use the CLI command provided by Bookie to check the status of the current ledger:
   
   ```
   bin/bookkeeper shell ledger -m 1213
   ```
   
   Output as follows:
   
   <img width="1377" alt="image" src="https://user-images.githubusercontent.com/20965307/156969787-842f3aa5-e8c4-464d-83c3-0a290b6c9a0f.png">
   
   We can see that the Bookie cluster holds no data for this ledger.
   
   
   Next, check the under-replicated ledgers:
   
   ```
   bin/bookkeeper shell listunderreplicated
   ```
   
   Output as follows:
   
   <img width="1531" alt="wecom-temp-214b6834b2b7930266f9eba438bec694" src="https://user-images.githubusercontent.com/20965307/156969795-275776c6-709c-4fca-a961-d38ef5c77377.png">
   
   
   We can see that this ledger no longer exists in the Bookie cluster, but it can still be found in the underreplicated list. ReplicationWorker therefore keeps trying to copy the ledger to other Bookie nodes, and because the ledger no longer exists, every replication attempt fails.
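
The inconsistency described here amounts to a set difference: ledger IDs that appear in the underreplicated list but have no ledger metadata. The following is a minimal sketch of that check, not BookKeeper code. The function name `find_orphaned_underreplicated` and the sample IDs are hypothetical, and it assumes the two ID collections have already been gathered by some other means (for example, by hand from the output of the commands above).

```python
def find_orphaned_underreplicated(existing_ledgers, underreplicated):
    """Return the under-replicated ledger IDs that have no ledger metadata, sorted."""
    return sorted(set(underreplicated) - set(existing_ledgers))


# Hypothetical sample data: ledger 1213 is marked under-replicated
# but no longer exists in the cluster.
existing = {1001, 1002, 1003}
underreplicated = [1002, 1213]

orphans = find_orphaned_underreplicated(existing, underreplicated)
print(orphans)  # prints [1213]
```

Any ID returned by this check is one that replication can never complete for, which matches the repeated failure seen in the log.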
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wolfstudy commented on issue #14573: Inconsistent Bookie ledgers and under replicate metadata in Bookie

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on issue #14573:
URL: https://github.com/apache/pulsar/issues/14573#issuecomment-1060194372


   A possible guess is that the current ledger deletion action does not check whether the ledger is still present in the under-replicated metadata. After the ledger is deleted in the Bookie, the under-replicated metadata may therefore still retain information about this ledger, causing data inconsistency between the two.





[GitHub] [pulsar] wolfstudy edited a comment on issue #14573: Inconsistent Bookie ledgers and under replicate metadata in Bookie

Posted by GitBox <gi...@apache.org>.
wolfstudy edited a comment on issue #14573:
URL: https://github.com/apache/pulsar/issues/14573#issuecomment-1060194372


   A possible guess is that the current ledger deletion action does not check whether the ledger is still present in the under-replicated metadata. After the ledger is deleted in the Bookie, the under-replicated metadata may therefore still retain information about this ledger, causing data inconsistency between the two.
   
   There are two ideas for this problem:
   
   1. When we delete a ledger from the Bookie cluster, we should also check the underreplicated list and make sure it no longer contains the ledger being deleted, so that the two stay consistent.
   
   2. Assuming the inconsistency does occur, should ReplicationWorker abandon replication of the faulty ledger after it has failed several times?
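
The two ideas can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the BookKeeper implementation: `ToyCluster`, `ToyReplicationWorker`, and `max_failures` are hypothetical names, and the only failure it models is a ledger whose metadata is missing.

```python
from collections import defaultdict


class ToyCluster:
    """Hypothetical stand-in for ledger metadata plus the underreplicated list."""

    def __init__(self):
        self.ledgers = set()
        self.underreplicated = set()

    def delete_ledger(self, ledger_id):
        # Idea 1: remove the under-replication marker together with the ledger,
        # so the two pieces of metadata cannot drift apart.
        self.ledgers.discard(ledger_id)
        self.underreplicated.discard(ledger_id)


class ToyReplicationWorker:
    """Hypothetical worker that gives up after max_failures failed attempts."""

    def __init__(self, cluster, max_failures=3):
        self.cluster = cluster
        self.max_failures = max_failures
        self.failures = defaultdict(int)

    def replicate(self, ledger_id):
        """Attempt one replication; return 'ok', 'retry', or 'abandoned'."""
        if ledger_id in self.cluster.ledgers:
            # The ledger exists, so replication succeeds in this toy model.
            self.cluster.underreplicated.discard(ledger_id)
            self.failures.pop(ledger_id, None)
            return "ok"
        self.failures[ledger_id] += 1
        if self.failures[ledger_id] >= self.max_failures:
            # Idea 2: stop retrying a ledger that keeps failing.
            self.cluster.underreplicated.discard(ledger_id)
            del self.failures[ledger_id]
            return "abandoned"
        return "retry"


# An orphaned entry: marked under-replicated, but the ledger is gone.
cluster = ToyCluster()
cluster.underreplicated.add(1213)
worker = ToyReplicationWorker(cluster, max_failures=3)
results = [worker.replicate(1213) for _ in range(3)]
print(results)  # prints ['retry', 'retry', 'abandoned']
```

In a real cluster, read failures can also be transient, so abandoning a ledger would presumably need to confirm that its metadata is really gone first. The sketch skips that check.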

