Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/08/14 21:43:53 UTC

[GitHub] reddycharan opened a new issue #1602: Cluster Metadata Checker

URL: https://github.com/apache/bookkeeper/issues/1602
 
 
   
   
   **FEATURE REQUEST**
   
   1. Please describe the feature you are requesting.
   
   This is the master ticket for tracking BP-34.
   
   The intention of this new checker is to validate the following:
   
   - ledger placement policy: The ensemble of each segment in a ledger should adhere to the LedgerPlacementPolicy.
   - durability contract: Every entry has WQ (write quorum) replicas, and entries are replicated according to RoundRobinDistributionSchedule.
   - progress in handling under-replication: No ledger stays marked as underreplicated for longer than an acceptable time.
   - availability of the bookies in the ensembles of ledgers: If the Auditor fails to get a response from a bookie, then either that bookie should no longer be registered with the metadata server and the Auditor should be aware of its unavailability, or, if the failure is transient, subsequent calls to that bookie should succeed.
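   
   To make the durability contract concrete, here is a minimal sketch of the round-robin mapping from an entry to the bookies expected to hold it: replica i of an entry goes to ensemble index (entryId + i) mod ensembleSize. The class and method names are hypothetical and chosen for illustration; this is not the actual RoundRobinDistributionSchedule code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical class, not part of BookKeeper.
public class WriteSetSketch {

    /** Returns the ensemble indices expected to hold a replica of the given entry. */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorumSize) {
        List<Integer> indices = new ArrayList<>(writeQuorumSize);
        for (int i = 0; i < writeQuorumSize; i++) {
            // Replica i of the entry lands on the bookie at (entryId + i) mod ensembleSize.
            indices.add((int) ((entryId + i) % ensembleSize));
        }
        return indices;
    }

    public static void main(String[] args) {
        // Assumed example values: an ensemble of 5 bookies and WQ = 3.
        for (long entryId = 0; entryId < 5; entryId++) {
            System.out.println("entry " + entryId + " -> bookie indices " + writeSet(entryId, 5, 3));
        }
    }
}
```

   With these assumed values, entry 4 wraps around the ensemble and is expected on the bookies at indices 4, 0 and 1.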
   
   Roles and Responsibilities of the cluster metadata checker
   
   - Police the durability contract and report violations. Its job is to make sure that the metadata server (ZooKeeper) and the storage servers (bookies) are in sync. Simply put, check whether the bookies agree with the metadata held by the metadata server and, if not, raise an alert.
   - It is not the scrutiny's job to fix any inconsistency it finds; instead, it makes noise about it. If the scrutiny fails, it means there is a potential hole (bug) in the service's ability to meet the durability contract. The scrutiny exposes that hole with enough information to help identify the issue and fix it.
   - The metadata scrutiny needs to be lightweight, especially on the bookies, and must run regularly, giving confidence that the cluster is in a good state.
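   
   The report-only behaviour described above could look roughly like the following sketch. It assumes a ledger with a single segment whose entries run from 0 to lastEntryId, and it assumes each bookie has already reported the set of entry ids it holds. The class, method and parameter names are hypothetical; nothing here is taken from the actual Auditor code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical class, not part of BookKeeper.
public class DurabilityCheckSketch {

    /**
     * Compares what the metadata implies (round-robin placement over the ensemble)
     * with what each bookie reports, and returns a description of every missing
     * replica. It only reports; it never attempts a repair.
     */
    static List<String> findViolations(long lastEntryId,
                                       List<String> ensemble,
                                       int writeQuorumSize,
                                       Map<String, Set<Long>> reportedEntries) {
        List<String> violations = new ArrayList<>();
        for (long entryId = 0; entryId <= lastEntryId; entryId++) {
            for (int i = 0; i < writeQuorumSize; i++) {
                String bookie = ensemble.get((int) ((entryId + i) % ensemble.size()));
                // A bookie that reported nothing is treated as holding no entries.
                Set<Long> held = reportedEntries.getOrDefault(bookie, Collections.emptySet());
                if (!held.contains(entryId)) {
                    violations.add("entry " + entryId + " missing on " + bookie);
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Assumed example: 3 bookies, WQ = 2, entries 0..2; b2 has lost entry 1.
        List<String> ensemble = List.of("b0", "b1", "b2");
        Map<String, Set<Long>> reported = Map.of(
                "b0", Set.of(0L, 2L),
                "b1", Set.of(0L, 1L),
                "b2", Set.of(2L));
        findViolations(2, ensemble, 2, reported).forEach(System.out::println);
    }
}
```

   Returning the violations as data rather than acting on them keeps the checker cheap and leaves the decision about alerting and repair to whoever consumes the report.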
   
   2. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?
   
   Should-have: it gives confidence in the data stored in the cluster.
   
   3. Provide any additional detail on your proposed use case for this feature.
   
   For more information, see BP-34.
   
   4. If there are sub-tasks, use -[] for each sub-task and create a corresponding issue to map to it:
   
   [sub-task1]: develop the underlying API in LedgerCache to get the list of entries of a ledger that this bookie contains
   [sub-task2]: create the BookKeeper protocol request/response for GetListOfEntriesOfALedgerRequest and the related classes for request/response handling
   [sub-task3]: add the Cluster Metadata Checker to the Auditor, along with the needed metrics and logs
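   
   Since the checker has to stay lightweight on the bookies, one possible way to keep a "list of entries" response small is to collapse consecutive entry ids into inclusive ranges. The sketch below is only an assumption about how that could be done; the issue does not specify the encoding, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical class, not part of BookKeeper.
public class EntryRangesSketch {

    /**
     * Collapses strictly increasing entry ids into inclusive [start, end] ranges,
     * so that a long run of consecutive entries costs two numbers instead of many.
     */
    static List<long[]> toRanges(long[] sortedEntryIds) {
        List<long[]> ranges = new ArrayList<>();
        if (sortedEntryIds.length == 0) {
            return ranges;
        }
        long start = sortedEntryIds[0];
        long prev = start;
        for (int i = 1; i < sortedEntryIds.length; i++) {
            long id = sortedEntryIds[i];
            if (id != prev + 1) {
                // A gap closes the current range and opens a new one.
                ranges.add(new long[] {start, prev});
                start = id;
            }
            prev = id;
        }
        ranges.add(new long[] {start, prev});
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : toRanges(new long[] {0, 1, 2, 5, 6, 9})) {
            System.out.println("[" + r[0] + ", " + r[1] + "]");
        }
    }
}
```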
   
   
