You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/04/28 13:31:16 UTC

[GitHub] [pulsar] AlvaroStream opened a new pull request, #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

AlvaroStream opened a new pull request, #15368:
URL: https://github.com/apache/pulsar/pull/15368

   The deduplication can't be activated at the client level. 
   
   Clarification of where can be activated and explanation about the change in Send Timeout.
   
   Lesson learned about possible effects in performance over the broker when it is activated.
   
   ### Motivation
   
   We faced a performance issue related to the creation of topics in a broker and I missed this information in the documentation. 
   
   As I reviewed the documentation I saw there was not a correct explanation about where can the functionality be enable neither why the timeout has to be modified to 0.
   
   I hope this helps other use this functionality.
   
   ### Modifications
   
   - Added the possible performance effect
   - Improve explanation of the level where it can be activated
   - Explain the change to timeout in the client.
   
   ### Verifying this change
   
   - [] Make sure that the change passes the CI checks.
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   Check the box below or label this PR directly.
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   - [ ] `no-need-doc`   
   - [X] `doc` 
   - [ ] `doc-added`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet merged pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
Anonymitaet merged PR #15368:
URL: https://github.com/apache/pulsar/pull/15368


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#issuecomment-1112815634

   @Demogorgon314 can you help review this PR from the technical perspective? Thanks! 
   After that, we can review it from a technical writing perspective. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AlvaroStream commented on a diff in pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
AlvaroStream commented on code in PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#discussion_r861474012


##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -60,7 +62,9 @@ $ bin/pulsar-admin namespaces set-deduplication \
 
 ## Pulsar clients
 
-If you enable message deduplication in Pulsar brokers, you need complete the following tasks for your client producers:
+If you enable message deduplication in Pulsar brokers, namespaces or topics, it is recommended to make the client retry infinitely the messages until it succeed, otherwise is possible to break the ordering guarantee as some request can time out and the application wont know if the request was successfully added to the topic or not. 

Review Comment:
   Thanks for the link. I will review it next. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#issuecomment-1182689168

   The pr had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on a diff in pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on code in PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#discussion_r865867790


##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -70,7 +74,7 @@ The instructions for Java, Python, and C++ clients are different.
 <!--DOCUSAURUS_CODE_TABS-->
 <!--Java clients-->
 
-To enable message deduplication on a [Java producer](client-libraries-java.md#producers), set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 
+Not to break the guarantee order on a [Java producer](client-libraries-java.md#producers) sending to a topic with message deduplication active, set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 

Review Comment:
   Any feedback on this? @AlvaroStream 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#issuecomment-1232694801

   @AlvaroStream can you pls resolve conflicts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] momo-jun commented on a diff in pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#discussion_r925208920


##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -70,7 +74,7 @@ The instructions for Java, Python, and C++ clients are different.
 <!--DOCUSAURUS_CODE_TABS-->
 <!--Java clients-->
 
-To enable message deduplication on a [Java producer](client-libraries-java.md#producers), set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 
+Not to break the guarantee order on a [Java producer](client-libraries-java.md#producers) sending to a topic with message deduplication active, set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 

Review Comment:
   ```suggestion
   To ensure the guarantee order on a [Java producer](client-libraries-java.md#producers) sending to a topic with message deduplication enabled, set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 
   ```
   
   @AlvaroStream did you want to say something like this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AlvaroStream commented on pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
AlvaroStream commented on PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#issuecomment-1143643586

   @liudezhi2098 or @Anonymitaet  can you resolve the conflicts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Anonymitaet commented on a diff in pull request #15368: [improve][doc] Deduplication can have effects in performance and is not activated at client level

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on code in PR #15368:
URL: https://github.com/apache/pulsar/pull/15368#discussion_r861427010


##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -4,13 +4,15 @@ title: Message deduplication
 sidebar_label: Message deduplication 
 ---
 
-When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the server side. 
+When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the broker side. 
 
-To use message deduplication in Pulsar, you need to configure your Pulsar brokers and clients.
+Message deduplication could affect performance in the brokers during informational snapshot.

Review Comment:
   ```suggestion
   Message deduplication could affect the performance of the brokers during informational snapshots.
   ```



##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -4,13 +4,15 @@ title: Message deduplication
 sidebar_label: Message deduplication 
 ---
 
-When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the server side. 
+When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the broker side. 
 
-To use message deduplication in Pulsar, you need to configure your Pulsar brokers and clients.
+Message deduplication could affect performance in the brokers during informational snapshot.
+
+To use message deduplication in Pulsar, you need to configure your Pulsar brokers, namespaces or topics and it is recommended to modify configuration in the clients, setting send timeout to infinity.

Review Comment:
   ```suggestion
   To use message deduplication in Pulsar, you need to configure your Pulsar brokers, namespaces, or topics. It is recommended to modify the configuration in the clients, for example, setting send timeout to infinity.
   ```
   



##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -70,7 +74,7 @@ The instructions for Java, Python, and C++ clients are different.
 <!--DOCUSAURUS_CODE_TABS-->
 <!--Java clients-->
 
-To enable message deduplication on a [Java producer](client-libraries-java.md#producers), set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 
+Not to break the guarantee order on a [Java producer](client-libraries-java.md#producers) sending to a topic with message deduplication active, set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 

Review Comment:
   what is the meaning of "Not to break the guarantee order on a Java producer sending to a topic with message deduplication active"? 
   
   what are the logical relationships between the 3 parts of this whole sentence?
   
   would u mind re-writing this sentence? Thanks



##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -4,13 +4,15 @@ title: Message deduplication
 sidebar_label: Message deduplication 
 ---
 
-When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the server side. 
+When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the broker side. 
 
-To use message deduplication in Pulsar, you need to configure your Pulsar brokers and clients.
+Message deduplication could affect performance in the brokers during informational snapshot.
+
+To use message deduplication in Pulsar, you need to configure your Pulsar brokers, namespaces or topics and it is recommended to modify configuration in the clients, setting send timeout to infinity.
 
 ## How it works
 
-You can enable or disable message deduplication at the namespace level or the topic level. By default, it is disabled on all namespaces or topics. You can enable it in the following ways:
+You can enable or disable message deduplication at broker, namespace or topic level. By default, it is disabled on all brokers, namespaces or topics. You can enable it in the following ways:

Review Comment:
   ```suggestion
   You can enable or disable message deduplication at broker, namespace, or topic level. By default, it is disabled on all brokers, namespaces, or topics. You can enable it in the following ways:
   ```



##########
site2/docs/cookbooks-deduplication.md:
##########
@@ -60,7 +62,9 @@ $ bin/pulsar-admin namespaces set-deduplication \
 
 ## Pulsar clients
 
-If you enable message deduplication in Pulsar brokers, you need complete the following tasks for your client producers:
+If you enable message deduplication in Pulsar brokers, namespaces or topics, it is recommended to make the client retry infinitely the messages until it succeed, otherwise is possible to break the ordering guarantee as some request can time out and the application wont know if the request was successfully added to the topic or not. 

Review Comment:
   ```suggestion
   If you enable message deduplication in Pulsar brokers, namespaces, or topics, it is recommended to make the client retry infinitely the messages until it succeeds, otherwise it is possible to break the ordering guarantee as some requests may time out and the application does not know whether the request is successfully added to the topic or not. 
   ```
   
   Write in the simple present tense as much as possible if you are covering facts that were, are, and forever shall be true.
   https://docs.google.com/document/d/1lc5j4RtuLIzlEYCBo97AC8-U_3Erzs_lxpkDuseU0n4/edit#bookmark=id.e8uqh1awkcnp



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org