You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/14 02:43:00 UTC

[GitHub] [pulsar] shibd commented on pull request #17125: [improve][client c++] Support KeyValue Schema.

shibd commented on PR #17125:
URL: https://github.com/apache/pulsar/pull/17125#issuecomment-1246161102

   > I just thought again for the design of `KeyValue`. Currently, for a given `Message` object,
   > 
   > * the key is stored in the `MessageMetadata` object (generated by ProtoBuf)
   > * the value is stored in `buffer`, whose type is `SharedBuffer`
   > 
   > If the encoding type is `SEPARATED`, we can avoid the copy operation by just return the key and value to users.
   > 
   > Otherwise, i.e. the encoding type is `INLINE`, the data copy cannot be avoided, because both key and value are stored in `buffer`, we must split it into two byte arrays (`std::string` objects). But in this case, we can still expose the raw pointer of the value part.
   > 
   > For example, if the buffer is
   > 
   > ```
   > 0x00 0x00 0x00 0x01 0x61 0x00 0x00 0x00 0x01 0x62
   > ```
   > 
   > The key is "a" (ASCII is 61), the value is "b" (ASCII is 62). Assuming the pointer to the first byte is `p`, we can return `p + 9` as the pointer to the value and 1 as the length of the value.
   > 
   > Regarding the key, usually we assume the key is not too large, so the data copy of the key is accepted.
   > 
   > In addition, we don't need to expose the key value encoding type in `KeyValue` because we already store it in the schema.
   > 
   > ```c++
   > enum class KeyValueEncodingType
   > {
   >     SEPARATED,
   > 
   >     INLINE
   > };
   > 
   > class KeyValueImpl;
   > 
   > class PULSAR_PUBLIC KeyValue {
   >    public:
   >     // Use move constructor to avoid data copy
   >     KeyValue(std::string&& key, std::string&& value);
   >     KeyValue(const Message& message);
   > 
   >     const std::string& getKey() const;
   > 
   >     const void* getData() const;
   > 
   >     size_t getLength() const;
   > 
   >     std::string getDataAsString() const;
   > 
   >    private:
   >     std::shared_ptr<KeyValueImpl> impl_;
   > };
   > ```
   > 
   > The `KeyValueImpl` class should store the following two fields.
   > 
   > ```c++
   >     std::string key_;
   >     SharedBuffer valueBuffer_;
   > ```
   > 
   > We should encode the `key_` and `valueBuffer_` to the actual buffer in `ProducerImpl::sendAsync` because we can get the encoding type from the schema.
   
   @BewareMyPower Thanks for your review. I took your suggestion to reduce the copying of data by modifying the API.
   
   > In addition, we don't need to expose the key value encoding type in KeyValue because we already store it in the schema.
   
   I keep encoding type in KeyValue. Because if will remove it, will affect the API.
   
   For example, when a user produces a message.
   
   ```
       // originally
       Message msg = MessageBuilder().setContent(keyValue).build();   // originally
   
       // after modification
       Message msg = producer.newMessageBuilder().setContent(keyValue).build();  
       // or
       Message msg = MessageBuilder(schema).setContent(keyValue).build();  
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org