Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/08/20 12:10:45 UTC

[GitHub] ivankelly commented on a change in pull request #1611: BP-35: 128 bits support

URL: https://github.com/apache/bookkeeper/pull/1611#discussion_r211235841
 
 

 ##########
 File path: site/bps/BP-35-128-bits-support.md
 ##########
 @@ -0,0 +1,399 @@
+---
+title: "BP-35: 128 bits support"
+issue: https://github.com/apache/bookkeeper/603
+state: "Under Discussion"
+design doc: https://docs.google.com/document/d/1cu54dNSV2ZrdWCi40LcyX8NxXGRCW0609T_ewmK9BWM
+release: "4.9.0"
+---
+
+### Motivation
+
+BookKeeper coordinates with a metadata store to generate a cluster-wide `ledgerId`.
+Currently this is a signed `64 bit` number (effectively 63 bits). This method works
+well because we have a centralized metadata store for coordinating the id generation.
+However, this method may not scale as the cluster size and the number of ledgers grow.
+
+[Universally unique identifier - Wikipedia](https://en.wikipedia.org/wiki/Globally_unique_identifier)
+is a preferred way to generate decentralized globally unique IDs and it takes `128 bits`.
+This method scales well because it doesn't need centralized coordination.
+
+This BP proposes the changes for increasing ledger id from `63 bits` to `128 bits`.
+
+### 128 bits
+
+Since neither Java nor
+[Protobuf](https://github.com/google/protobuf/issues/2180) natively supports `128 bits` numbers, we have to break the `128 bits`
+id into two `64 bits` numbers:
+
+- ledger-id-msb: the most significant 64 bits, bit 64 - 127
+- ledger-id-lsb: the least significant 64 bits, bit 0 - 63
+
+For backward compatibility, the `ledger-id-lsb` is the current `64 bits` ledger-id.
+The `ledger-id-msb` will be added as a new field in both API and protocol. 
+
+I propose naming `ledger-id-msb` `ledger-scope-id`. The current 64-bit `ledgerId` and
+the newly introduced 64-bit `ledgerScopeId` together form the new `128 bits` ledger id.
+
+The default `ledgerScopeId` is `0`. That means any ledgers created prior to this change are allocated
+under scope `0`. Hence it maintains backward compatibility during upgrade. 
+
+The combination of `ledgerScopeId` and `ledgerId` forms the `128 bits` ledger id. We can introduce a
+hex representation of this `128 bits` ledger id, `ledgerQualifiedName`. This `ledgerQualifiedName` can
+be useful for CLI tooling, the REST API and troubleshooting. Internally, the API can convert
+`ledgerQualifiedName` to `ledgerScopeId` and `ledgerId`.
+
+### Public Interfaces
+
+#### API Change
+
+The API will be introducing `ledgerScopeId` across the interfaces. This field will be optional and default to `0`. 
+
+##### Handle
+
+Introduce a new method `getScopeId` for representing the scope id (the most significant 64 bits of the `128 bits` ledger id).
+
+```java
+public interface Handle extends AutoCloseable {
+
+  ...
+
+  /**
+   * Return the ledger scope id. The most significant 64 bits of 128 bits.
+   */
+  long getScopeId();
+
+  /**
+   * Return the ledger id. The least significant 64 bits of 128 bits.
+   */ 
+  long getId();
+
+  ...
+
+}
+```
+
+##### Create LedgerAdv
+
+Introduce a new method `withLedgerScopeId` in `CreateAdvBuilder` for providing `scopeId`
+(the most significant 64 bits for 128 bits ledger id) on creating a ledger.
+
+```java
+public interface CreateAdvBuilder extends OpBuilder<WriteHandle> {
+  ...
+
+  /**
+   * Set the scope id for the newly created ledger.
+   * If no explicit scopeId is passed, the new ledger
+   * will be created under scope `0`.
+   */
+  CreateAdvBuilder withLedgerScopeId(long scopeId);	
+
+  ...
+}
+```
+
+##### Open Ledger
+
+Introduce a new method `withLedgerScopeId` in `OpenBuilder` for providing `scopeId`
+(the most significant 64 bits for 128 bits ledger id) on opening a ledger.
+
+```java
+public interface OpenBuilder extends OpBuilder<ReadHandle> {
+  ...
+  /**
+   * Set the scope id of the ledger to open.
+   */
+  OpenBuilder withLedgerScopeId(long scopeId);
+  ...
+}
+```
+
+##### Delete Ledger
+
+Introduce a new method `withLedgerScopeId` in `DeleteBuilder` for providing `scopeId`
+(the most significant 64 bits for 128 bits ledger id) on deleting a ledger.
+
+```java
+public interface DeleteBuilder extends OpBuilder<Void> {
+  ...
+  /**
+   * Set the scope id of the ledger to delete.
+   */
+  DeleteBuilder withLedgerScopeId(long scopeId);
+  ...
+}
+```
+
+#### CLI
+
+All BookKeeper CLI tools will be updated with additional option `--ledger-scope-id`.
+Optionally we can add option `--ledger-qualified-name` (the hex representation of 128 bits).
+Internally all the CLI tools will convert ledger qualified name to `ledgerId` and `ledgerScopeId`.
+
+#### REST
+
+1. All ledger related endpoints will be adding a new parameter `ledger_scope_id`. 
+2. `ListLedgerService`  only supports listing ledgers under a given ledger scope id.
+   If `ledger_scope_id` is missing, it will be listing ledgers under scope `0`.
+
+#### Wire Protocol
+
+> There is no plan to support 128 bits in the v2 protocol, due to the limitations of the v2 protocol.
+> Any operation in the v2 protocol with a scope id not equal to 0 will fail immediately with an
+> `ILLEGAL_OP` exception.
+
+All the request and response messages will be adding an optional field `optional int64 ledgerScopeId`.
+
+#### Entry Format
+
+Currently all the entries written to bookies are encoded in a certain format, including `metadata`,
+`digest code` and `payload`. The entry format is not *versioned*.
+
+In order to support adding another field `ledgerScopeId` in the `metadata` section, we are introducing
+`version` in the entry format.
+
+##### Entry Format V1
+
+```json
+Entry Format V1
+===============
+--- header ---
+Bytes (0 - 7)                   : Ledger ID
+Bytes (8 - 15)                  : Entry ID
+Bytes (16 - 23)                 : LastAddConfirmed
+Bytes (24 - 31)                 : Length
+--- digest ---
+Bytes (32 - (32 + x - 1))       : Digest Code (e.g. CRC32)
+--- payload ---
+Bytes ((32 + x) - )             : Payload
+```
+
+> `x` is the length of digest code.
+
+>  Prior to introducing `ledgerScopeId`, ledgerId is assumed to be a positive value.
+
+##### Entry Format V2
+
+```json
+Entry Format V2
+===============
+--- header ---
+Bytes (0 - 7)                   : Metadata Flags
+Bytes (8 - 15)                  : Ledger Scope ID
+Bytes (16 - 23)                 : Ledger ID
+Bytes (24 - 31)                 : Entry ID
+Bytes (32 - 39)                 : LastAddConfirmed
+Bytes (40 - 47)                 : Length
+--- digest ---
+Bytes (48 - (48 + x - 1))       : Digest Code (e.g. CRC32)
+--- payload ---
+Bytes ((48 + x) - )             : Payload
+``` 
+
+> `x` is the length of digest code.
+
+###### Metadata Flags
+
+```json
+Metadata: 1 Bytes (Long)
+------------------------
+0x 0 0
+   |__| 
+     |
+ version
+
+----
+Bit 0 - 3: digest type (e.g. CRC32, CRC32C and such)
+Bit 4 - 7: version, the most significant bit of this byte will be always set to 1.
+it will be used for differentiating entry format v1 and v2.
+
+```
+
+We are setting the most significant bit to be `1`. So the first byte in entry v2 will
+be a negative value, which can be used for differentiating entry format v1 and v2.
+The version will be encoded into the first byte. The version will be used for describing
+the entry format.
+
+##### Decoding Entry
+
+The pseudo code for decoding an entry is as follows:
+
+```java
+
+ByteBuf entry = ...;
+
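+// Peek at the first byte without moving the reader index.
+byte metadataFlags = entry.getByte(entry.readerIndex());
+
+if (metadataFlags >= 0) { // most significant bit is 0: the entry is encoded in v1 format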
+	// decoding the entry in v1 format
+	...
+} else {
+	// decoding the entry in v2 format
+}
+
+```
+
+#### Bookie Storage
+
+##### Journal
+
+A new method should be added in journal `WriteCallback` to handle `ledgerScopeId`.
+
+```java
+public interface WriteCallback {
 
 Review comment:
   Change the callback type for writing the Journal completely. There's nothing to be gained by using WriteCallback, and using it means that any change to the callback in the journal write path forces changes in a bunch of unrelated places.
   
   That said, all the client side internal callbacks will also have to change to include scope.
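
One possible shape for such a journal-only callback is sketched below. This is only an illustration of the suggestion above: `JournalWriteCallback`, `journalWriteComplete` and its parameter list are hypothetical names chosen for this sketch, not existing BookKeeper API, and `RecordingJournalCallback` is a toy implementation used to show the call.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical callback used only by the journal write path.
 * It is not an existing BookKeeper interface; the name and the
 * parameter list are assumptions made for this sketch.
 */
interface JournalWriteCallback {
    void journalWriteComplete(int rc, long ledgerScopeId, long ledgerId, long entryId, Object ctx);
}

/** Toy implementation that records each completion as "rc:scope:ledger:entry". */
class RecordingJournalCallback implements JournalWriteCallback {
    final List<String> completed = new ArrayList<>();

    @Override
    public void journalWriteComplete(int rc, long ledgerScopeId, long ledgerId, long entryId, Object ctx) {
        completed.add(rc + ":" + ledgerScopeId + ":" + ledgerId + ":" + entryId);
    }
}
```

With a type like this, adding `ledgerScopeId` (or any later field) to the journal path would only touch journal code and its direct callers, rather than every implementor of the shared `WriteCallback`.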
