Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/06/08 16:55:46 UTC

[GitHub] [bookkeeper] dlg99 opened a new pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

dlg99 opened a new pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730


   Descriptions of the changes in this PR:
   
   The proposed change adds a pluggable way to modify the payload sent to the ledger. Specific use cases include GDPR compliance, encryption, and compression. While these can be implemented at the application level, having this as a BK client-level extension makes such functionality available to any application built on top of BookKeeper.
   
   ### Motivation
   
   See `site/bps/BP-45-LedgerPayloadInterceptor.md` for details.
   
   ### Changes
   
   Added an interceptor that can modify the ledger's custom metadata, the payload before add, and the payload after read.
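
To make the three hook points concrete, here is a minimal sketch of what such an interceptor could look like. The interface name, method names, and signatures below are assumptions for illustration only and are not the API defined in BP-45; the XOR transformation is a toy stand-in for real encryption or compression.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class InterceptorSketch {
    /** Hypothetical interface: names and signatures are assumptions, not BP-45's API. */
    interface PayloadInterceptor {
        Map<String, byte[]> onCreateLedger(Map<String, byte[]> customMetadata);
        byte[] beforeAdd(byte[] payload);
        byte[] afterRead(byte[] payload);
    }

    /** Toy reversible transformation: XOR every byte with a fixed mask. */
    static class XorInterceptor implements PayloadInterceptor {
        private static final byte MASK = 0x5A;

        @Override
        public Map<String, byte[]> onCreateLedger(Map<String, byte[]> customMetadata) {
            // Copy the metadata and tag the ledger with the transformation used.
            Map<String, byte[]> copy = new HashMap<>(customMetadata);
            copy.put("interceptor", "xor".getBytes(StandardCharsets.UTF_8));
            return copy;
        }

        @Override
        public byte[] beforeAdd(byte[] payload) {
            return xor(payload);
        }

        @Override
        public byte[] afterRead(byte[] payload) {
            return xor(payload);
        }

        private static byte[] xor(byte[] in) {
            byte[] out = new byte[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = (byte) (in[i] ^ MASK);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        PayloadInterceptor interceptor = new XorInterceptor();
        byte[] original = "entry-0".getBytes(StandardCharsets.UTF_8);
        byte[] stored = interceptor.beforeAdd(original);   // what would be written
        byte[] restored = interceptor.afterRead(stored);   // what the reader sees
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

The sketch keeps the write-side and read-side hooks symmetric, so any transformation applied before add can be undone after read.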
   
   Master Issue: #<master-issue-number>
   
   > ---
   > In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
   > checks for pull requests. A pull request can only be merged when it passes precommit checks.
   >
   > ---
   > Be sure to do all of the following to help us incorporate your contribution
   > quickly and easily:
   >
   > If this PR is a BookKeeper Proposal (BP):
   >
   > - [ ] Make sure the PR title is formatted like:
   >     `<BP-#>: Description of bookkeeper proposal`
   >     `e.g. BP-1: 64 bits ledger is support`
   > - [ ] Attach the master issue link in the description of this PR.
   > - [ ] Attach the google doc link if the BP is written in Google Doc.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] dlg99 commented on pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730#issuecomment-857140165


   @sijie The full BP goes through the value add; I'll be happy to expand on specific points if I described them too briefly. I am also open to suggestions on how to achieve the described use cases better in a different manner, specifically how to handle the GDPR "right to be forgotten" case in Pulsar when short-lived and long-lived data is mixed in the same bookie cluster.
   
   The plan is to have this functionality disabled by default. When enabled, any payload modifications should happen before digest computation.
   
   I understand the sentiment regarding immutability, but whatever modifies the payload should do so
   1. reversibly (typically; I cannot predict all use cases)
   2. as planned by the author and administrator of the BK cluster.
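
The ordering described above (transform first, then compute the digest, and reverse the steps on read) can be sketched as follows. This is an illustration under stated assumptions, not the BookKeeper client code: `StoredEntry`, `write`, and `read` are hypothetical names, Deflate stands in for the interceptor, and CRC32 stands in for the ledger digest.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DigestOrderSketch {
    /** Hypothetical holder for what is stored: transformed bytes plus their digest. */
    static final class StoredEntry {
        final byte[] bytes;
        final long crc;
        StoredEntry(byte[] bytes, long crc) { this.bytes = bytes; this.crc = crc; }
    }

    static byte[] compress(byte[] in) {
        Deflater deflater = new Deflater();
        deflater.setInput(in);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] in) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(in);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt stream", e);
        } finally {
            inflater.end();
        }
    }

    static long crc(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue();
    }

    /** Write path: the transformation runs first, so the digest covers the stored bytes. */
    static StoredEntry write(byte[] payload) {
        byte[] transformed = compress(payload);
        return new StoredEntry(transformed, crc(transformed));
    }

    /** Read path: verify the digest on the stored bytes, then reverse the transformation. */
    static byte[] read(StoredEntry entry) {
        if (crc(entry.bytes) != entry.crc) {
            throw new IllegalStateException("digest mismatch");
        }
        return decompress(entry.bytes);
    }

    public static void main(String[] args) {
        byte[] payload = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".getBytes(StandardCharsets.UTF_8);
        StoredEntry entry = write(payload);
        System.out.println("round trip ok: " + Arrays.equals(payload, read(entry)));
    }
}
```

Because the digest is computed over the transformed bytes, integrity can be checked without first reversing the transformation.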
   
   HDDs are supposed to save whatever the user saves to them, yet there are use cases for [self-encrypting drives](https://en.wikipedia.org/wiki/Hardware-based_full_disk_encryption#Hard_disk_drive_FDE). This case is similar IMO.
   
   





[GitHub] [bookkeeper] dlg99 closed pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

Posted by GitBox <gi...@apache.org>.
dlg99 closed pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730


   


-- 
To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org



[GitHub] [bookkeeper] fpj commented on pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

Posted by GitBox <gi...@apache.org>.
fpj commented on pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730#issuecomment-859393745


   I read the proposal, and it seems reasonable at a high level to enable pre- and post-processing for ledgers and entries. From the proposal, I don't understand whether interceptor implementations are loaded at runtime or have to be merged into the code base. If it is the latter, then the scope of utilization is going to be more limited.
   
   There is also an implicit assumption that the client reading the data knows what interceptor to use. This might be the case for a number of applications, but if an application has some form of generic ledger reader, then it would need to find out what interceptors to use. If the interceptor code is merged into the code base, then we don't have this problem, but the mechanism is less flexible.
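
One possible way to address both points, sketched here purely as an assumption and not as something the proposal specifies, is for the writer to record the interceptor's class name in the ledger's custom metadata and for a generic reader to load that class from its classpath at runtime. The metadata key, the interface, and the helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class InterceptorLookupSketch {
    /** Hypothetical metadata key; not defined by BookKeeper or BP-45. */
    static final String KEY = "payload-interceptor-class";

    /** Hypothetical read-side hook. */
    interface PayloadInterceptor {
        byte[] afterRead(byte[] payload);
    }

    /** Stand-in plugin that returns the payload reversed. */
    static class ReversingInterceptor implements PayloadInterceptor {
        @Override
        public byte[] afterRead(byte[] payload) {
            byte[] out = new byte[payload.length];
            for (int i = 0; i < payload.length; i++) {
                out[i] = payload[payload.length - 1 - i];
            }
            return out;
        }
    }

    /** Writer side: record which interceptor produced the ledger's entries. */
    static Map<String, byte[]> describe(Class<? extends PayloadInterceptor> type) {
        Map<String, byte[]> metadata = new HashMap<>();
        metadata.put(KEY, type.getName().getBytes(StandardCharsets.UTF_8));
        return metadata;
    }

    /** Reader side: load the recorded class at runtime, or pass data through unchanged. */
    static PayloadInterceptor load(Map<String, byte[]> metadata) {
        byte[] name = metadata.get(KEY);
        if (name == null) {
            return payload -> payload; // no interceptor recorded for this ledger
        }
        try {
            Class<?> type = Class.forName(new String(name, StandardCharsets.UTF_8));
            return type.asSubclass(PayloadInterceptor.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("interceptor not available on this client", e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> metadata = describe(ReversingInterceptor.class);
        PayloadInterceptor interceptor = load(metadata);
        System.out.println(Arrays.toString(interceptor.afterRead(new byte[] {1, 2, 3})));
    }
}
```

A real client would likely restrict loadable classes to an allow-list rather than trusting a class name read from metadata.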





[GitHub] [bookkeeper] dlg99 commented on pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730#issuecomment-858105250


   @jvrao @ankit-j do you have any feedback? Would you find this useful for your use cases? See `BP-45-LedgerPayloadInterceptor.md` for details.





[GitHub] [bookkeeper] eolivelli commented on pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730#issuecomment-857830934


   I believe that the main value of this BP is to enable the administrators of the BookKeeper cluster to deal with these transformations.
   I am talking especially about encryption.
   
   If we implement this, for instance, in Pulsar, it will be a problem for all of the external tools (like the bookkeeper shell, BKVM...) to deal with the contents of the ledger.
   
   If the plugin is deployed together with the BK client (like some Encryption plugin that we can provide out of the box), then every tool will be able to interact directly with BK.
   
   Therefore, if we implement this in BK, many other applications that would like to add encryption (like BlobIt https://github.com/diennea/blobit) will be able to leverage this work by simply applying the right plugin. (cc @nicoloboschi @diegosalvi )





[GitHub] [bookkeeper] lhotari commented on a change in pull request #2730: BP-45: a pluggable way to modify payload sent to the ledger

Posted by GitBox <gi...@apache.org>.
lhotari commented on a change in pull request #2730:
URL: https://github.com/apache/bookkeeper/pull/2730#discussion_r648968043



##########
File path: site/bps/BP-45-LedgerPayloadInterceptor.md
##########
@@ -0,0 +1,183 @@
+---
+title: "BP-45: Add interceptor interface allowing modifications of the payload"
+issue: https://github.com/apache/bookkeeper/issues/2731
+state: Under Discussion
+release: "4.15.0"
+---
+
+### Motivation
+
+The proposed change targets addition of a pluggable way to modify payload sent to the ledger. 
+Specific use cases include GDPR compliance, encryption, and compression. 
+While these can be implemented at the application level, having this as a BK client level 
+extension unlocks such functionality to any application built on top of Bookkeeper.
+
+Let’s dig deeper into specific use cases: 
+
+#### GDPR
+
+Ledger's data on the bookie servers is stored in the immutable EntryLog files. The files, 
+in the most common case, mix the data from multiple ledgers. There is no guarantee of immediate 
+erasure from the disk upon the ledger deletion, the entry logs are "compacted" with some delay. 
+The amount of the delay is not guaranteed and, in an extreme case, can be infinite if the data 
+from a deleted small short-lived ledger gets mixed in the entry log with data from the long-lived 
+ledgers. 
+
+Such behavior of compaction is a trade-off between performance and disk space utilization. 
+
+The data is also stored in the journal file and, in an extreme case 
+(low TTL for deletion and low traffic volume), can be recoverable from the journal after the 
+ledger deletion.
+
+Modern global businesses are affected by privacy laws [1] and obligations, the most notable 
+of which is "Art. 17 GDPR: Right to be forgotten" [2]. Privacy policies require setting deadlines 
+after which the data cannot be used.
+
+"Forgotten" encryption keys are an acceptable alternative to the actual deletion of the data. 

Review comment:
       Isn't this commonly called [crypto-shredding](https://en.wikipedia.org/wiki/Crypto-shredding)?
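
For readers unfamiliar with the term, the idea behind crypto-shredding can be sketched as follows: each ledger's entries are encrypted with a per-ledger key, and "deleting" the data means discarding that key. This is a minimal sketch under stated assumptions, not the proposal's implementation: the class and method names are hypothetical, and the in-memory map stands in for a real key management service.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class CryptoShreddingSketch {
    private static final int IV_LEN = 12;       // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128;

    // Stand-in for a real key store; one key per ledger id.
    private final Map<Long, SecretKey> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Encrypts with the ledger's key (created on first use); output is IV followed by ciphertext. */
    byte[] encrypt(long ledgerId, byte[] plain) {
        try {
            SecretKey key = keys.get(ledgerId);
            if (key == null) {
                KeyGenerator generator = KeyGenerator.getInstance("AES");
                generator.init(128);
                key = generator.generateKey();
                keys.put(ledgerId, key);
            }
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, IV_LEN + encrypted.length);
            System.arraycopy(encrypted, 0, out, IV_LEN, encrypted.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts an entry; fails once the ledger's key has been forgotten. */
    byte[] decrypt(long ledgerId, byte[] stored) {
        SecretKey key = keys.get(ledgerId);
        if (key == null) {
            throw new IllegalStateException("key forgotten; data is unreadable");
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
            return cipher.doFinal(stored, IV_LEN, stored.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** "Shreds" the ledger: the ciphertext may remain on disk but can no longer be decrypted. */
    void forget(long ledgerId) {
        keys.remove(ledgerId);
    }

    public static void main(String[] args) {
        CryptoShreddingSketch store = new CryptoShreddingSketch();
        byte[] stored = store.encrypt(7L, "user-record".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.decrypt(7L, stored), StandardCharsets.UTF_8));
        store.forget(7L);
    }
}
```

With this approach the bytes left behind in entry logs or the journal are only ciphertext, so delayed compaction no longer determines when the data stops being recoverable.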



