Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2022/06/29 13:54:46 UTC

[GitHub] [bookkeeper] horizonzy opened a new issue, #3377: BP-54: Repaired the ledger fragment which ensemble not adhere placement policy.

horizonzy opened a new issue, #3377:
URL: https://github.com/apache/bookkeeper/issues/3377

   ### Motivation
   
   Consider the following user case concerning data availability.
   
   1. They have two racks and a rack-aware placement policy that ensures writes span both racks.
   2. They have some data on a topic with long retention.
   3. They ran a disaster recovery (DR) test, during which they shut down one rack.
   4. While the DR test was running, auto-recovery ran. Only one rack was active, and by default auto-recovery is rack-aware only on a best-effort basis, so it re-replicated the data to the expected number of replicas using only the active rack.
   5. They stopped the DR test and everything appeared fine, but the ledger now resided on only one rack.
   6. They ran another DR test, this time moving data to the other zone, but now data was missing because it all resided on only one rack.
   
   BookKeeper should provide a feature to handle this case.
   
   #### Auditor placement policy check logic
   
   Currently, BookKeeper supports the config `auditorPeriodicPlacementPolicyCheckInterval`, which enables a check of whether each ledger segment's ensemble adheres to the placement policy. If `auditorPeriodicPlacementPolicyCheckInterval` > 0, the `Auditor` runs this check as a scheduled task. The default value is 0, which means the placement policy is not checked.
   
   This feature was introduced by [BP-34](https://bookkeeper.apache.org/bps/BP-34-cluster-metadata-checker/).
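
   A minimal sketch of the scheduling rule described above follows. It assumes the interval is expressed in seconds. The class `PlacementCheckScheduler` and its methods are hypothetical illustrations built on `java.util.concurrent.ScheduledExecutorService`, not the actual `Auditor` code.

   ```java
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   // Hypothetical sketch of the scheduling rule; not the real Auditor code.
   final class PlacementCheckScheduler {
       private final ScheduledExecutorService executor =
               Executors.newSingleThreadScheduledExecutor();

       // Schedules the check only when the interval (assumed to be seconds) is > 0.
       // Returns false without scheduling anything for the default value 0.
       boolean maybeSchedule(long checkIntervalSeconds, Runnable placementPolicyCheck) {
           if (checkIntervalSeconds <= 0) {
               return false;
           }
           executor.scheduleAtFixedRate(placementPolicyCheck,
                   checkIntervalSeconds, checkIntervalSeconds, TimeUnit.SECONDS);
           return true;
       }

       // Stops the periodic check and releases the scheduler thread.
       void shutdown() {
           executor.shutdownNow();
       }
   }
   ```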
   
   #### Drawbacks
   
   The BP-34 implementation detects ledger fragments whose ensembles do not adhere to the placement policy, but it only records them in `LoggerState`. It does not repair the data so that it adheres to the placement policy.
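
   The following is a simplified model of this detect-only behavior. It assumes, as a simplification rather than the real `RackawareEnsemblePlacementPolicy` logic, that an ensemble adheres when every write quorum spans at least `minRacksPerWriteQuorum` distinct racks. A write quorum here is every window of `writeQuorumSize` consecutive bookies, with wrap-around. Non-adhering fragments are only collected in a list, which stands in for `LoggerState`, and nothing is repaired. All names are hypothetical.

   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   // Hypothetical, simplified model of a BP-34-style check: detect and record only.
   final class DetectOnlyPlacementCheck {

       // ensembleRacks holds the rack of each bookie, in ensemble order.
       static boolean adheres(List<String> ensembleRacks, int writeQuorumSize,
                              int minRacksPerWriteQuorum) {
           int n = ensembleRacks.size();
           for (int start = 0; start < n; start++) {
               // Collect the racks of the write quorum starting at this bookie.
               Set<String> racks = new HashSet<>();
               for (int i = 0; i < writeQuorumSize; i++) {
                   racks.add(ensembleRacks.get((start + i) % n));
               }
               if (racks.size() < minRacksPerWriteQuorum) {
                   return false;
               }
           }
           return true;
       }

       // Returns the indexes of non-adhering fragments; nothing is repaired.
       static List<Integer> findNonAdhering(List<List<String>> fragments,
                                            int writeQuorumSize, int minRacks) {
           List<Integer> nonAdhering = new ArrayList<>();
           for (int f = 0; f < fragments.size(); f++) {
               if (!adheres(fragments.get(f), writeQuorumSize, minRacks)) {
                   nonAdhering.add(f);
               }
           }
           return nonAdhering;
       }
   }
   ```

   An ensemble that auto-recovery placed entirely on `/rack1` during the DR test fails this check. An ensemble alternating between two racks passes it.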
   
   
   
   ### Proposal
   
   To address this, we introduce a new config, `repairedPlacementPolicyNotAdheringBookieEnabled`, to handle this case.
   
   In the `Auditor`, if the user sets `auditorPeriodicPlacementPolicyCheckInterval` > 0, the scheduled task checks whether each ledger fragment's ensemble adheres to the placement policy. If it does not adhere and `repairedPlacementPolicyNotAdheringBookieEnabled` is true, the `Auditor` marks the ledger as underreplicated.
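
   The `Auditor`-side decision can be sketched as a pure function. The class, enum, and method names are hypothetical, and the interval is again assumed to be in seconds. `RECORD_ONLY` stands for the existing BP-34 behavior.

   ```java
   // Hypothetical sketch of the Auditor-side decision proposed in BP-54.
   final class AuditorPlacementDecision {

       enum Action { CHECK_DISABLED, NONE, RECORD_ONLY, MARK_UNDERREPLICATED }

       static Action decide(long placementCheckIntervalSeconds,
                            boolean ensembleAdheres,
                            boolean repairedPlacementPolicyNotAdheringBookieEnabled) {
           if (placementCheckIntervalSeconds <= 0) {
               return Action.CHECK_DISABLED;      // default: the check is not scheduled
           }
           if (ensembleAdheres) {
               return Action.NONE;
           }
           return repairedPlacementPolicyNotAdheringBookieEnabled
                   ? Action.MARK_UNDERREPLICATED  // proposed BP-54 behavior
                   : Action.RECORD_ONLY;          // existing BP-34 behavior
       }
   }
   ```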
   
   Currently, the `ReplicationWorker` takes an underreplicated ledger, checks the ledger's data integrity, and then moves data to live bookies. If `repairedPlacementPolicyNotAdheringBookieEnabled` is true, it will also check whether each ledger fragment's ensemble adheres to the placement policy. A ledger fragment may be missing data and violate the placement policy at the same time.

   In that case, we ignore the placement policy problem for now. The worker replicates the data to live bookies and updates the ensemble information, because data integrity is more important. If the ensemble still does not adhere to the placement policy, the `Auditor` will mark the ledger again, and the `ReplicationWorker` will then repair the placement policy violation.
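
   The repair priority described above could look like the following. This is a hypothetical sketch with names invented for illustration; it is not the real `ReplicationWorker` API.

   ```java
   // Hypothetical sketch of the ReplicationWorker repair priority.
   final class ReplicationWorkerPriority {

       enum Repair { REREPLICATE_LOST_DATA, FIX_PLACEMENT, NOTHING }

       static Repair chooseRepair(boolean fragmentHasMissingReplicas,
                                  boolean ensembleAdheres,
                                  boolean repairedPlacementPolicyNotAdheringBookieEnabled) {
           // Data integrity first: even if placement is also wrong, only re-replicate now.
           if (fragmentHasMissingReplicas) {
               return Repair.REREPLICATE_LOST_DATA;
           }
           // Placement-only problem: repair it only when the feature is enabled.
           if (!ensembleAdheres && repairedPlacementPolicyNotAdheringBookieEnabled) {
               return Repair.FIX_PLACEMENT;
           }
           return Repair.NOTHING;
       }
   }
   ```

   On the first pass, a fragment that is both under-replicated and misplaced only has its data re-replicated. If the `Auditor` marks the ledger again, a later pass returns `FIX_PLACEMENT`.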
   
   If the ledger fragment only violates the placement policy, the `ReplicationWorker` will select a bookie from another rack to replace an old bookie that shares a rack with other bookies in the ensemble. If no bookie on another rack is available, the worker will not repair the fragment and will record the lack of available bookies in `LoggerState`.
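
   The following is a simplified sketch of the replacement choice. It assumes bookies are identified by plain strings and that racks come from a caller-supplied map. Following the description above, it replaces the first bookie whose rack is already used by another ensemble member. It then picks a live bookie from a rack the ensemble does not use yet. An empty result means no bookie on another rack exists, which the caller would record instead of repairing. All names are hypothetical and not the BookKeeper placement policy API.

   ```java
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Optional;
   import java.util.Set;

   // Hypothetical sketch; not the BookKeeper placement policy API.
   final class RackReplacementSketch {

       // Returns the first ensemble member whose rack is already used by an earlier member.
       static Optional<String> pickBookieToReplace(List<String> ensemble,
                                                   Map<String, String> rackOf) {
           Set<String> seenRacks = new HashSet<>();
           for (String bookie : ensemble) {
               if (!seenRacks.add(rackOf.get(bookie))) {
                   return Optional.of(bookie);
               }
           }
           return Optional.empty();
       }

       // Picks a live bookie on a rack the ensemble does not use yet.
       // Optional.empty() means there is no bookie on another rack.
       static Optional<String> pickReplacement(List<String> ensemble,
                                               List<String> liveBookies,
                                               Map<String, String> rackOf) {
           Set<String> usedRacks = new HashSet<>();
           for (String bookie : ensemble) {
               usedRacks.add(rackOf.get(bookie));
           }
           for (String candidate : liveBookies) {
               if (!ensemble.contains(candidate) && !usedRacks.contains(rackOf.get(candidate))) {
                   return Optional.of(candidate);
               }
           }
           return Optional.empty();
       }
   }
   ```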
   
   ### Changes
   
   1. Add a new config, `repairedPlacementPolicyNotAdheringBookieEnabled`, to control whether ensembles that do not adhere to the placement policy are repaired.
   2. In the `Auditor` placement policy check, mark a ledger as underreplicated if its ensemble does not adhere to the placement policy and `repairedPlacementPolicyNotAdheringBookieEnabled` is true.
   3. In the `ReplicationWorker` re-replication process, repair ledger fragments so that they adhere to the placement policy.
   4. Document this feature.
   
   ### Compatibility, Deprecation, and Migration Plan
   
   `repairedPlacementPolicyNotAdheringBookieEnabled` defaults to false, so upgrading to the new release does not change any existing behavior.
   
   
   ### Test Plan
   
   We will add tests for the following modules:
   
   1. `Auditor`: test that a ledger is marked underreplicated when one of its fragments does not adhere to the placement policy.
   2. `ReplicationWorker`: test that a fragment that does not adhere to the placement policy is repaired so that it adheres.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [bookkeeper] horizonzy commented on issue #3377: BP-54: Repaired the ledger fragment which ensemble not adhere placement policy.

Posted by GitBox <gi...@apache.org>.
horizonzy commented on issue #3377:
URL: https://github.com/apache/bookkeeper/issues/3377#issuecomment-1199262415

   @eolivelli ping




[GitHub] [bookkeeper] hangc0276 commented on issue #3377: BP-54: Repaired the ledger fragment which ensemble not adhere placement policy.

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on issue #3377:
URL: https://github.com/apache/bookkeeper/issues/3377#issuecomment-1203419019

   @eolivelli Would you please help review this PR again? Thanks. We hope this feature can be included in 4.16.0.




Re: [I] BP-54: Repaired the ledger fragment which ensemble not adhere placement policy. [bookkeeper]

Posted by "shoothzj (via GitHub)" <gi...@apache.org>.
shoothzj commented on issue #3377:
URL: https://github.com/apache/bookkeeper/issues/3377#issuecomment-2097143862

   closed by #3359 




[GitHub] [bookkeeper] horizonzy commented on issue #3377: BP-54: Repaired the ledger fragment which ensemble not adhere placement policy.

Posted by GitBox <gi...@apache.org>.
horizonzy commented on issue #3377:
URL: https://github.com/apache/bookkeeper/issues/3377#issuecomment-1170014244

   The proposal PR:
   https://github.com/apache/bookkeeper/pull/3359




[GitHub] [bookkeeper] horizonzy commented on issue #3377: BP-54: Repaired the ledger fragment which ensemble not adhere placement policy.

Posted by GitBox <gi...@apache.org>.
horizonzy commented on issue #3377:
URL: https://github.com/apache/bookkeeper/issues/3377#issuecomment-1201162900

   @eolivelli I have updated the BP. Could you take another look? Thanks.




Re: [I] BP-54: Repaired the ledger fragment which ensemble not adhere placement policy. [bookkeeper]

Posted by "shoothzj (via GitHub)" <gi...@apache.org>.
shoothzj closed issue #3377: BP-54: Repaired the ledger fragment which ensemble not adhere placement policy.
URL: https://github.com/apache/bookkeeper/issues/3377

