Posted to commits@pulsar.apache.org by ur...@apache.org on 2022/09/01 12:00:59 UTC

[pulsar-site] branch main updated: Docs sync done from apache/pulsar(#9529850)

This is an automated email from the ASF dual-hosted git repository.

urfree pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pulsar-site.git


The following commit(s) were added to refs/heads/main by this push:
     new 985536614ca Docs sync done from apache/pulsar(#9529850)
985536614ca is described below

commit 985536614cab66c1a4c44604ef3609107bfb5067
Author: Pulsar Site Updater <de...@pulsar.apache.org>
AuthorDate: Thu Sep 1 12:00:53 2022 +0000

    Docs sync done from apache/pulsar(#9529850)
---
 site2/website-next/docs/cookbooks-deduplication.md | 25 +++++++++++++---------
 site2/website-next/docs/io-elasticsearch-sink.md   |  1 +
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/site2/website-next/docs/cookbooks-deduplication.md b/site2/website-next/docs/cookbooks-deduplication.md
index 702679641d7..de607c9ee14 100644
--- a/site2/website-next/docs/cookbooks-deduplication.md
+++ b/site2/website-next/docs/cookbooks-deduplication.md
@@ -4,6 +4,7 @@ title: Message deduplication
 sidebar_label: "Message deduplication "
 ---
 
+
 ````mdx-code-block
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
@@ -12,11 +13,13 @@ import TabItem from '@theme/TabItem';
 
 When **Message deduplication** is enabled, it ensures that each message produced on Pulsar topics is persisted to disk *only once*, even if the message is produced more than once. Message deduplication is handled automatically on the server side. 
 
-To use message deduplication in Pulsar, you need to configure your Pulsar brokers and clients.
+Message deduplication could affect the performance of the brokers during informational snapshots.
+
+To use message deduplication in Pulsar, you need to configure your Pulsar brokers, namespaces, or topics. It is recommended to modify the configuration in the clients, for example, setting send timeout to infinity.
 
 ## How it works
 
-You can enable or disable message deduplication at the namespace level or the topic level. By default, it is disabled on all namespaces or topics. You can enable it in the following ways:
+You can enable or disable message deduplication at the broker, namespace, or topic level. By default, it is disabled on all brokers, namespaces, and topics. You can enable it in the following ways:
 
 * Enable deduplication for all namespaces/topics at the broker-level.
 * Enable deduplication for a specific namespace with the `pulsar-admin namespaces` interface.
@@ -40,7 +43,7 @@ By default, message deduplication is *disabled* on all Pulsar namespaces/topics.
 
 Even if you set the value for `brokerDeduplicationEnabled`, enabling or disabling via Pulsar admin CLI overrides the default settings at the broker-level.
 
-### Enable message deduplication
+### Enable message deduplication at namespace or topic level
 
 Though message deduplication is disabled by default at the broker level, you can enable message deduplication for a specific namespace or topic using the [`pulsar-admin namespaces set-deduplication`](/tools/pulsar-admin/) or the [`pulsar-admin topics set-deduplication`](/tools/pulsar-admin/) command. You can use the `--enable`/`-e` flag and specify the namespace/topic. 
 
@@ -54,7 +57,7 @@ $ bin/pulsar-admin namespaces set-deduplication \
 
 ```
 
-### Disable message deduplication
+### Disable message deduplication at namespace or topic level
 
 Even if you enable message deduplication at the broker level, you can disable message deduplication for a specific namespace or topic using the [`pulsar-admin namespace set-deduplication`](/tools/pulsar-admin/) or the [`pulsar-admin topics set-deduplication`](/tools/pulsar-admin/) command. Use the `--disable`/`-d` flag and specify the namespace/topic.
 
@@ -70,7 +73,9 @@ $ bin/pulsar-admin namespaces set-deduplication \
 
 ## Pulsar clients
 
-If you enable message deduplication in Pulsar brokers, you need complete the following tasks for your client producers:
+If you enable message deduplication in Pulsar brokers, namespaces, or topics, it is recommended to make the client retry sending each message indefinitely until it succeeds. Otherwise, the ordering guarantee can break, because some requests may time out and the application cannot tell whether the message was successfully added to the topic.
+
+So you need to complete the following tasks for your client producers:
 
 1. Specify a name for the producer.
 1. Set the message timeout to `0` (namely, no timeout).
@@ -83,7 +88,7 @@ The instructions for Java, Python, and C++ clients are different.
   values={[{"label":"Java clients","value":"Java clients"},{"label":"Python clients","value":"Python clients"},{"label":"C++ clients","value":"C++ clients"}]}>
 <TabItem value="Java clients">
 
-To enable message deduplication on a [Java producer](client-libraries-java#producer), set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. 
+To preserve the ordering guarantee on a [Java producer](client-libraries-java.md#producers) sending to a topic with message deduplication enabled, set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter.
 
 ```java
 
@@ -105,7 +110,7 @@ Producer producer = pulsarClient.newProducer()
 </TabItem>
 <TabItem value="Python clients">
 
-To enable message deduplication on a [Python producer](client-libraries-python#producer), set the producer name using `producer_name`, and set the timeout to `0` using `send_timeout_millis`. 
+To preserve the ordering guarantee on a [Python producer](client-libraries-python.md#producers) sending to a topic with message deduplication enabled, set the producer name using `producer_name`, and set the timeout to `0` using `send_timeout_millis`.
 
 ```python
 
@@ -121,8 +126,7 @@ producer = client.create_producer(
 
 </TabItem>
 <TabItem value="C++ clients">
-
-To enable message deduplication on a [C++ producer](client-libraries-cpp/#create-a-producer), set the producer name using `producer_name`, and set the timeout to `0` using `send_timeout_millis`. 
+To preserve the ordering guarantee on a [C++ producer](client-libraries-cpp.md#producer) sending to a topic with message deduplication enabled, set the producer name using `producer_name`, and set the timeout to `0` using `send_timeout_millis`.
 
 ```cpp
 
@@ -147,4 +151,5 @@ Result result = client.createProducer(topic, producerConfig, producer);
 </TabItem>
 
 </Tabs>
-````
\ No newline at end of file
+````
+
diff --git a/site2/website-next/docs/io-elasticsearch-sink.md b/site2/website-next/docs/io-elasticsearch-sink.md
index 88f7fabb9b5..04e195a776b 100644
--- a/site2/website-next/docs/io-elasticsearch-sink.md
+++ b/site2/website-next/docs/io-elasticsearch-sink.md
@@ -89,6 +89,7 @@ The configuration of the Elasticsearch sink connector has the following properti
 | `canonicalKeyFields` | Boolean | false | false | Whether to sort the key fields for JSON and Avro or not. If it is set to `true` and the record key schema is `JSON` or `AVRO`, the serialized object does not consider the order of properties. |
 | `stripNonPrintableCharacters` | Boolean| false | true| Whether to remove all non-printable characters from the document or not. If it is set to true, all non-printable characters are removed from the document. |
 | `idHashingAlgorithm` | enum(NONE,SHA256,SHA512)| false | NONE|Hashing algorithm to use for the document id. This is useful in order to be compliant with the ElasticSearch _id hard limit of 512 bytes. |
+| `copyKeyFields` | Boolean | false | false |If the message key schema is AVRO or JSON, the message key fields are copied into the ElasticSearch document. |
 
 ### Definition of ElasticSearchSslConfig structure: