Posted to commits@pulsar.apache.org by "poorbarcode (via GitHub)" <gi...@apache.org> on 2023/08/21 18:46:36 UTC

[GitHub] [pulsar] poorbarcode commented on a diff in pull request #20923: [improve] [pip] PIP-290 Provide a way to implement WSS E2E encryption and not need to expose the private key to the WebSocket Proxy

poorbarcode commented on code in PR #20923:
URL: https://github.com/apache/pulsar/pull/20923#discussion_r1300518754


##########
pip/pip-290.md:
##########
@@ -0,0 +1,207 @@
+# Background knowledge
+
+### 1. Web Socket Proxy Server
+[Web Socket Proxy Server](https://pulsar.apache.org/docs/3.0.x/client-libraries-websocket/#run-the-websocket-service) provides a simple way to interact with Pulsar over the `WSS` protocol.
+- When a [WSS producer](https://pulsar.apache.org/docs/3.0.x/client-libraries-websocket/#nodejs-producer) is registered, the Web Socket Proxy Server creates a corresponding internal producer (one-to-one) that actually sends messages to the broker.
+- When a [WSS consumer](https://pulsar.apache.org/docs/3.0.x/client-libraries-websocket/#nodejs-consumer) is registered, the Web Socket Proxy Server creates a corresponding internal consumer (one-to-one) that actually receives messages from the broker and forwards them to the WSS consumer.
+
+### 2. When a user wants to encrypt the message payload, there are two solutions:
+- **Solution 1**: the WSS producer encrypts the message payload before sending, and the WSS consumer decrypts it after receiving. If the user wants to use different encryption keys for different messages, they can set a [property](https://github.com/apache/pulsar/blob/master/pulsar-websocket/src/main/java/org/apache/pulsar/websocket/data/ProducerMessage.java#L38) on each message to indicate which key encrypted it. The benefit of this solution is that the user does not need to expose the private key to the Web Socket Proxy Server. Its shortcoming is that if the user also has consumers built on the Java client, those consumers cannot decrypt the messages automatically, although Java clients normally [decrypt messages automatically](https://pulsar.apache.org/docs/3.0.x/security-encryption/#how-it-works-in-pulsar).
+- **Solution 2**: release `2.11` added a [feature](https://github.com/apache/pulsar/pull/16234) that lets users set encryption keys for the internal producers and consumers of the Web Socket Proxy Server. The benefit is that the WSS producer and WSS consumer do not need to handle encryption and decryption themselves. The drawback is that the user must upload both the public key and the private key to the Web Socket Proxy Server (in other words, expose the keys to it); there is a workaround for this shortcoming, but it is not recommended<sup>[1]</sup>.
+
+### 3. How the producer processes the message payload when sending
+- If batching is enabled, the producer combines several message payloads into one batched payload;
+- If compression is enabled, the producer compresses the (batched) payload;
+- After these two steps, the producer encrypts the resulting payload.
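The producer-side order above can be sketched with only the JDK. This is a minimal illustration, not Pulsar's `MessageCrypto`: the class `ClientSideProducerPipeline` is hypothetical, batching is left out (publishing batched messages is listed as out of scope), and the algorithms are assumptions: ZLIB via `java.util.zip.Deflater` for compression, a fresh AES-GCM data key with a 12-byte IV for the payload, and RSA-OAEP to wrap that data key. Check them against what your Pulsar version actually uses before relying on interoperability with Java or C++ consumers.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.zip.Deflater;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical client-side producer pipeline: (batching omitted) -> compress -> encrypt.
// The algorithms are assumptions, not necessarily what Pulsar's MessageCrypto uses.
final class ClientSideProducerPipeline {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Ciphertext plus the values that end up in the encryption metadata.
    record Prepared(byte[] encryptedPayload, byte[] encryptedDataKey,
                    byte[] iv, int uncompressedSize) {}

    // Step 2: ZLIB-compress the payload.
    static byte[] zlibCompress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Step 3: encrypt the compressed payload with a fresh AES-GCM data key,
    // then wrap the data key with the consumer's RSA public key.
    static Prepared prepare(byte[] payload, PublicKey rsaPublicKey)
            throws GeneralSecurityException {
        byte[] compressed = zlibCompress(payload);

        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128, RANDOM);
        SecretKey dataKey = keyGen.generateKey();

        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        byte[] encryptedPayload = aes.doFinal(compressed);

        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-1AndMGF1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, rsaPublicKey);
        byte[] encryptedDataKey = rsa.doFinal(dataKey.getEncoded());

        return new Prepared(encryptedPayload, encryptedDataKey, iv, payload.length);
    }
}
```

Under the same assumptions, `iv` would correspond (Base64-encoded) to `param`, `encryptedDataKey` to `keys[<key name>].keyValue`, and `uncompressedSize` to `uncompressedMessageSize` in the Encrypt Context.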
+
+
+### 4. Encrypt context
+
+The structure of the Encrypt Context:
+```jsonc
+{
+  "batchSize": 2, // Number of single messages in the batch; null means the message is not batched.
+  "compressionType": "NONE", // The compression type.
+  "uncompressedMessageSize": 0, // The size of the uncompressed payload.
+  "keys": {
+    "client-rsa.pem": {  // Key name.
+      "keyValue": "asdvfdw==", // Key value.
+      "metadata": {} // Extra properties of the key.
+    }
+  },
+  "param": "Tfu1PxVm6S9D3+Hk" // The IV used to encrypt this message.
+}
+```
+All the fields of Encrypt Context are used to parse the encrypted message payload. 
+- `keys` and `param` are used to decrypt the encrypted message payload. 
+- `compressionType` and `uncompressedMessageSize` are used to uncompress the compressed message payload.
+- `batchSize` is used to extract the batched message payload.
+
+Another attribute, `encryptionAlgo`, identifies the encryption algorithm in use. Because it is optional, it is not included in the Encrypt Context.
+
+When the internal consumer of the Web Socket Proxy Server receives a message whose metadata indicates that it is encrypted, the consumer adds the Encrypt Context to the response sent to the WSS consumer.
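A WSS consumer that receives a still-encrypted payload together with the Encrypt Context has to undo the producer steps in reverse order. The sketch below assumes AES-GCM for the payload and RSA-OAEP for the data key (assumptions, not something this PIP specifies), treats `keyValue` and `param` as Base64 strings, handles only `NONE` and `ZLIB` compression, and stops before batch extraction. `EncryptContext` and `ClientSideConsumerDecoder` are hypothetical names; `keyValues` flattens `keys[<key name>].keyValue` and drops `metadata`.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.util.Base64;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical Java view of the Encrypt Context; keyValues maps key name -> Base64 keyValue.
record EncryptContext(Integer batchSize, String compressionType, int uncompressedMessageSize,
                      Map<String, String> keyValues, String param) {}

final class ClientSideConsumerDecoder {
    // Reverse of the producer order: decrypt -> uncompress -> (unbatch).
    // AES-GCM and RSA-OAEP are assumed, not specified by the PIP.
    static byte[] decode(byte[] encryptedPayload, EncryptContext ctx,
                         String keyName, PrivateKey rsaPrivateKey)
            throws GeneralSecurityException, DataFormatException {
        String wrappedKey = ctx.keyValues().get(keyName);
        if (wrappedKey == null) {
            throw new IllegalArgumentException("no data key for " + keyName);
        }
        // 1. Unwrap the data key (keys) and decrypt with the IV (param).
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-1AndMGF1Padding");
        rsa.init(Cipher.DECRYPT_MODE, rsaPrivateKey);
        byte[] dataKey = rsa.doFinal(Base64.getDecoder().decode(wrappedKey));
        byte[] iv = Base64.getDecoder().decode(ctx.param());
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(dataKey, "AES"),
                new GCMParameterSpec(128, iv));
        byte[] compressed = aes.doFinal(encryptedPayload);

        // 2. Uncompress (compressionType, uncompressedMessageSize).
        byte[] plain = switch (ctx.compressionType()) {
            case "NONE" -> compressed;
            case "ZLIB" -> inflate(compressed, ctx.uncompressedMessageSize());
            default -> throw new UnsupportedOperationException(
                    "compression not handled in this sketch: " + ctx.compressionType());
        };

        // 3. Batch extraction (batchSize) needs Pulsar's batch format; not shown here.
        if (ctx.batchSize() != null) {
            throw new UnsupportedOperationException("batched payloads not handled in this sketch");
        }
        return plain;
    }

    static byte[] inflate(byte[] input, int uncompressedSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            byte[] out = new byte[uncompressedSize];
            int n = 0;
            while (n < uncompressedSize && !inflater.finished()) {
                int read = inflater.inflate(out, n, uncompressedSize - n);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                n += read;
            }
            if (n != uncompressedSize) {
                throw new DataFormatException("expected " + uncompressedSize + " bytes, got " + n);
            }
            return out;
        } finally {
            inflater.end();
        }
    }
}
```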
+
+### 5. Components referenced in the Design sections
+- `CryptoKeyReader`: an interface that users implement to read the public key and private key.
+- `MessageCrypto`: an interface that encrypts and decrypts the message payload, and adds encryption information to (or extracts it from) the message metadata.
+
+# Motivation
+
+Therefore, there is currently no way to enable encryption under the WSS protocol while meeting both of the following conditions:
+- The WSS producer and WSS consumer encrypt and decrypt messages themselves and do not share private keys with the Web Socket Proxy Server.
+- Other clients (such as Java and C++) can automatically decrypt the messages sent by the WSS producer.
+
+# Goals
+Provide a way for the Web Socket Proxy Server to simply pass encryption information through to the clients, so that the WSS producer and WSS consumer encrypt and decrypt messages themselves.
+
+Since the producer processes the message payload in the order `compression --> encryption`, users need to handle compression themselves if needed.
+
+Since the consumer processes the message payload in the order `decryption --> decompression --> extracting the batched messages`, users need to handle decompression and batch extraction themselves if needed.
+
+Note: I want to cherry-pick this feature into `branch-2.11`.
+
+
+## Out of Scope
+This proposal does not intend to support the following three features:
+- Support publishing "Null value messages" for WSS producers.
+- Support publishing "Chunked messages" for WSS producers.
+- Support publishing "Batched messages" for WSS producers.
+
+
+# High-Level Design
+**For WSS producers**: if a producer is registered with a non-empty `encryptionKeyValues`, the Web Socket Proxy Server marks it as a Client-Side Encryption Producer and disables server-side batching, server-side compression, and server-side encryption for it.
+
+**For WSS consumers**: users can set the parameter `cryptoFailureAction` to `CONSUME` to receive the still-encrypted message payload directly (this is already supported).
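For example, a WSS consumer that decrypts on its own might connect with `cryptoFailureAction=CONSUME` in its query string. This sketch assumes the consumer endpoint path from the Pulsar WebSocket documentation (`/ws/v2/consumer/persistent/{tenant}/{namespace}/{topic}/{subscription}`); the host, port, tenant, namespace, topic and subscription are placeholders, `WssConsumerUrl` is a hypothetical helper, and path segments are not URL-escaped, so only simple names are supported. Use `wss://` instead of `ws://` when the proxy has TLS enabled.

```java
import java.net.URI;

// Hypothetical helper: builds a WSS consumer URL that asks the proxy to pass
// payloads it cannot decrypt through to the client unchanged.
final class WssConsumerUrl {
    static URI build(String proxyHostPort, String tenant, String namespace,
                     String topic, String subscription) {
        // Segments are inserted as-is; names containing '/', '?', or spaces need escaping.
        String path = String.format("/ws/v2/consumer/persistent/%s/%s/%s/%s",
                tenant, namespace, topic, subscription);
        return URI.create("ws://" + proxyHostPort + path + "?cryptoFailureAction=CONSUME");
    }
}
```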
+
+# Detailed Design
+**For the producers marked as Client-Side Encryption Producer**: 
+
+- forcefully set the component `CryptoKeyReader` to `DummyCryptoKeyReaderImpl`.
+  - `DummyCryptoKeyReaderImpl`: provides no public or private key and simply returns `null`.
+- forcefully set the component `MessageCrypto` to `WSSDummyMessageCryptoImpl` to skip server-side encryption of the message payload.
+  - `WSSDummyMessageCryptoImpl`: only sets the encryption info in the message metadata and skips payload encryption.
+- forcefully set `enableBatching` to `false` to skip server-side batching, and log a warning if users set `enableBatching`, `batchingMaxMessages`, `maxPendingMessages`, or `batchingMaxPublishDelay`.
+- forcefully set the `CompressionType` to `NONE` to skip server-side compression, and log a warning if users set `compressionType`.
+- forcefully set the param `enableChunking` to `false` (the default value is `false`) to prevent unexpected problems if the default changes in the future.
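The producer overrides listed above can be modeled in a small, self-contained sketch. `WssProducerSettings` and `applyClientSideEncryptionOverrides` are hypothetical stand-ins, not the proxy's real classes; the `CryptoKeyReader` and `MessageCrypto` swaps are left out because they need Pulsar's client classes, and the warnings are returned as strings instead of being logged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the producer settings the proxy derives from
// the WSS connect request. A null field means "not set by the user".
final class WssProducerSettings {
    Boolean enableBatching;
    Integer batchingMaxMessages;
    Integer maxPendingMessages;
    Integer batchingMaxPublishDelay;
    String compressionType;
    boolean enableChunking;

    // Forces the Client-Side Encryption Producer settings and returns the
    // warnings that would be logged for user-supplied values.
    List<String> applyClientSideEncryptionOverrides() {
        List<String> warnings = new ArrayList<>();
        if (enableBatching != null || batchingMaxMessages != null
                || maxPendingMessages != null || batchingMaxPublishDelay != null) {
            warnings.add("batching params are ignored for client-side encryption producers");
        }
        if (compressionType != null) {
            warnings.add("compressionType is ignored for client-side encryption producers");
        }
        enableBatching = false;   // no server-side batching
        compressionType = "NONE"; // no server-side compression
        enableChunking = false;   // pinned even though false is the default
        return warnings;
    }
}
```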
+
+**For the client-side encryption consumers**: 
+
+- To avoid excessive warning logs: when the consumer's `cryptoFailureAction` is set to `CONSUME` and it receives an encrypted message that it cannot decrypt, log at `INFO` level instead of the original `WARN` level.
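A minimal sketch of that log-level rule. It uses `java.util.logging.Level` as a stand-in for whatever logging API the proxy actually uses, and the helper name `UndecryptableMessageLogLevel` is hypothetical.

```java
import java.util.logging.Level;

// Hypothetical helper: with cryptoFailureAction=CONSUME an undecryptable message
// is expected (the client decrypts it), so it is logged at INFO instead of WARNING.
final class UndecryptableMessageLogLevel {
    static Level forAction(String cryptoFailureAction) {
        return "CONSUME".equals(cryptoFailureAction) ? Level.INFO : Level.WARNING;
    }
}
```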
+
+
+### Public API
+
+#### [Endpoint: producer connect](https://pulsar.apache.org/docs/3.1.x/client-libraries-websocket/#producer-endpoint)
+Add query params below: 
+| param name | description|
+| --- | --- |
+| `encryptionKeyValues` | Base64 encoded and URL encoded secret key |
+| `encryptionKeyMetadata` | Base64 encoded and URL encoded and JSON formatted key-value metadata list of encryption key |

Review Comment:
   > Why not add the key metadata to the encryptionKeyValues JSON structure? So that it will align with the returned data structure to consumers.
   
   I added a new mode for the parameter `encryptionKeys`: if a producer is registered with a JSON parameter `encryptionKeys` and `encryptionKeys[{key_name}].keyValue` is not empty, the Web Socket Proxy Server will mark this producer as a Client-Side Encryption Producer and disable server-side batching, server-side compression, and server-side encryption.
   
   > And could you please also provide an example of what is the original data looks like? without base64 and URL encoding.
   
   Done.
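To make the new `encryptionKeys` mode concrete, here is a hedged sketch of how a client might build that query parameter. It assumes the JSON shape mirrors the `keys` map of the Encrypt Context (`{"<key name>": {"keyValue": "<Base64>", "metadata": {...}}}`), that `keyValue` carries the Base64-encoded encrypted data key, and that the whole JSON string is URL-encoded once; none of these details is confirmed in this thread. `EncryptionKeysParam` is a hypothetical helper with deliberately minimal JSON escaping.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for the assumed `encryptionKeys` query parameter.
final class EncryptionKeysParam {
    // Minimal JSON string quoting: escapes backslashes and quotes only
    // (control characters are not handled in this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds {"<keyName>":{"keyValue":"<Base64>","metadata":{...}}} and URL-encodes it.
    static String build(String keyName, byte[] encryptedDataKey, Map<String, String> metadata) {
        String meta = metadata.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        String json = "{" + quote(keyName) + ":{\"keyValue\":"
                + quote(Base64.getEncoder().encodeToString(encryptedDataKey))
                + ",\"metadata\":" + meta + "}}";
        return "encryptionKeys=" + URLEncoder.encode(json, StandardCharsets.UTF_8);
    }
}
```

URL-encoding matters here because Base64 output can contain `+`, `/` and `=`, which would otherwise be misread inside a query string.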



-- 
This is an automated message from the Apache Git Service.