You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/24 12:56:51 UTC

[GitHub] [pulsar-client-cpp] tongsucn opened a new pull request, #66: Add new interfaces to improve data consuming performance. (#65)

tongsucn opened a new pull request, #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66

   Fixes #65
   
   ### Motivation
   
   Improve data consuming performance by adding new interfaces on `class Message` and `class SharedBuffer`.
   
   ### Modifications
   
   Adding two functions:
   
   `lib/SharedBuffer.h`
   
   ```cpp
   bool SharedBuffer::pop(std::string &target)
   ```
   
   `lib/Message.cc`
   
   ```cpp
   void Message::moveDataIntoString(std::string &data)
   ```
   
   ### Verifying this change
   
   Below TBD
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [ ] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] BewareMyPower commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1004270683


##########
lib/SharedBuffer.h:
##########
@@ -203,6 +203,13 @@ class SharedBuffer {
         writeIdx_ = 0;
     }
 
+    bool pop(std::string& target) {
+        if (data_ == nullptr) return false;

Review Comment:
   `data_` could never be a `nullptr`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] BewareMyPower commented on pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#issuecomment-1301691454

   @tongsucn Good point. I think it's determined by how do you use that.
   
   If you only want a view of the Message's value, you can create an instance `std::string_view` (or a similar class). But you should be careful that the lifetime is the same as `Message`, i.e. you must keep the `Message` objects in memory to make `std::string_view` valid.
   
   If you want some extra in place operations, I think you can expose the non-const `data()` override.
   
   ```c++
   const void* getData() const;
   void* getData();  // new interface
   ```
   
   If your process function in application just accepts a `std::string`, I'm afraid you have to change it to `std::string_view`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1007586517


##########
lib/SharedBuffer.h:
##########
@@ -203,6 +203,13 @@ class SharedBuffer {
         writeIdx_ = 0;
     }
 
+    bool pop(std::string& target) {
+        if (data_ == nullptr) return false;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] shibd commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
shibd commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1009432159


##########
tests/MessageTest.cc:
##########
@@ -53,6 +53,15 @@ TEST(MessageTest, testAllocatedContents) {
     Message msg = msgBuilder.build();
     ASSERT_FALSE(strncmp("content", (char*)msg.getData(), msg.getLength()));
     ASSERT_EQ(content, (char*)msg.getData());
+
+    bool throwed = false;
+    try {
+        msg.releaseData();
+    } catch (std::runtime_error& ex) {

Review Comment:
   You can use `ASSERT_THROW` instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] BewareMyPower commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1009494922


##########
tests/MessageTest.cc:
##########
@@ -53,6 +53,15 @@ TEST(MessageTest, testAllocatedContents) {
     Message msg = msgBuilder.build();
     ASSERT_FALSE(strncmp("content", (char*)msg.getData(), msg.getLength()));
     ASSERT_EQ(content, (char*)msg.getData());
+
+    bool throwed = false;
+    try {
+        msg.releaseData();
+    } catch (std::runtime_error& ex) {

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] BewareMyPower commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1004321666


##########
lib/SharedBuffer.h:
##########
@@ -203,6 +203,13 @@ class SharedBuffer {
         writeIdx_ = 0;
     }
 
+    bool pop(std::string& target) {
+        if (data_ == nullptr) return false;

Review Comment:
   The returned value is always ignored. I think we should throw an exception to disable `moveDataIntoString` on an invalid `Message` object, i.e. built from a `MessageBuilder` whose content was set by `setAllocatedContent` .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] BewareMyPower commented on pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#issuecomment-1301694253

   I just found the `batchReceive` method added in #21. Maybe if the producer enables batching, the consumer must combine `batchReceive` and the `releaseData` APIs. In this case, I prefer not throwing an exception when `data_` is `nullptr`. Instead, fallback to copying the data rather than moving.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1009320948


##########
include/pulsar/Message.h:
##########
@@ -100,6 +100,20 @@ class PULSAR_PUBLIC Message {
      */
     KeyValue getKeyValueData() const;
 
+    /**
+     * Release and return the message's payload.
+     *
+     * @return the released payload.
+     */
+    std::string releaseData();
+
+    /**
+     * Move payload into the given string.
+     *
+     * @param data the string that receiving the payload's ownership.
+     */
+    void moveDataIntoString(std::string& data);

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1007568657


##########
lib/SharedBuffer.h:
##########
@@ -203,6 +203,13 @@ class SharedBuffer {
         writeIdx_ = 0;
     }
 
+    bool pop(std::string& target) {
+        if (data_ == nullptr) return false;

Review Comment:
   Good point.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] shibd commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
shibd commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1009084692


##########
include/pulsar/Message.h:
##########
@@ -100,6 +100,20 @@ class PULSAR_PUBLIC Message {
      */
     KeyValue getKeyValueData() const;
 
+    /**
+     * Release and return the message's payload.
+     *
+     * @return the released payload.
+     */
+    std::string releaseData();
+
+    /**
+     * Move payload into the given string.
+     *
+     * @param data the string that receiving the payload's ownership.
+     */
+    void moveDataIntoString(std::string& data);

Review Comment:
   useless interface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#issuecomment-1294809195

   > Please add some unit tests.
   
   Done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] shibd commented on pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
shibd commented on PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#issuecomment-1294715295

   @tongsucn Please rebase the main branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#issuecomment-1298127898

   @BewareMyPower @shibd @RobertIndie I found a problem on the new interface:
   
   When batching is enabled on producer side, `Consumer::receive` actually creates a shallow copy of the whole message batch and sets reading offset (i.e. `SharedBuffer::readIdx_`) via `SharedBuffer::slice` in `Commands::deSerializeSingleMessageInBatch`.
   
   It means that the messages from a batch cannot be **_moved_** into another `std::string` without exposing the reading offset. Therefore I think the current implementation of `Message::releaseData` or `Message::releaseData` itself is not a good design. If you have any better ideas, please tell me, or just close the issue and PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] RobertIndie commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
RobertIndie commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1010110403


##########
tests/MessageTest.cc:
##########
@@ -53,6 +53,8 @@ TEST(MessageTest, testAllocatedContents) {
     Message msg = msgBuilder.build();
     ASSERT_FALSE(strncmp("content", (char*)msg.getData(), msg.getLength()));
     ASSERT_EQ(content, (char*)msg.getData());
+
+    ASSERT_THROW(msg.releaseData(), std::runtime_error);

Review Comment:
   Please add a test case for releasing the data successfully and verify the data is actually swapped.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on a diff in pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on code in PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#discussion_r1007566758


##########
lib/SharedBuffer.h:
##########
@@ -203,6 +203,13 @@ class SharedBuffer {
         writeIdx_ = 0;
     }
 
+    bool pop(std::string& target) {
+        if (data_ == nullptr) return false;

Review Comment:
   Good point. Let me add a flag in `SharedBuffer` to denote if it's a read-only one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn closed pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn closed pull request #66: Add new interfaces to improve data consuming performance. (#65)
URL: https://github.com/apache/pulsar-client-cpp/pull/66


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-cpp] tongsucn commented on pull request #66: Add new interfaces to improve data consuming performance. (#65)

Posted by GitBox <gi...@apache.org>.
tongsucn commented on PR #66:
URL: https://github.com/apache/pulsar-client-cpp/pull/66#issuecomment-1319686471

   > I just found the `batchReceive` method added in #21. Maybe if the producer enables batching, the consumer must combine `batchReceive` and the `releaseData` APIs. In this case, I prefer not throwing an exception when `data_` is `nullptr`. Instead, fallback to copying the data rather than moving.
   
   Well, I think the result of `batchReceive` may contain data from multiple message batches, which make the situation more complicated. Falling back without telling caller may confuse the callers.
   
     Under current design, `pulsar::Message` is actually an *agent* of the message data rather than a *container*. An *agent* should NOT give the ownership of the data to others, because it actually **DOES NOT OWN** the data. Therefore, I think interfaces like `releaseData` should be avoided.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org