You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by ti...@apache.org on 2023/01/15 02:55:17 UTC

[flink-connector-pulsar] branch main updated: [FLINK-30681][Connector/Pulsar] Add a warning hint on the delivery guarantee document. (#18)

This is an automated email from the ASF dual-hosted git repository.

tison pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/flink-connector-pulsar.git


The following commit(s) were added to refs/heads/main by this push:
     new 5ce2373  [FLINK-30681][Connector/Pulsar] Add a warning hint on the delivery guarantee document. (#18)
5ce2373 is described below

commit 5ce2373acbb176551f926743d9410f0f7ef4935d
Author: Yufan Sheng <yu...@streamnative.io>
AuthorDate: Sun Jan 15 10:55:12 2023 +0800

    [FLINK-30681][Connector/Pulsar] Add a warning hint on the delivery guarantee document. (#18)
---
 docs/content.zh/docs/connectors/datastream/pulsar.md |  6 ++++++
 docs/content/docs/connectors/datastream/pulsar.md    | 10 +++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/docs/content.zh/docs/connectors/datastream/pulsar.md b/docs/content.zh/docs/connectors/datastream/pulsar.md
index d7b7fcf..ad10d39 100644
--- a/docs/content.zh/docs/connectors/datastream/pulsar.md
+++ b/docs/content.zh/docs/connectors/datastream/pulsar.md
@@ -763,6 +763,12 @@ public interface TopicRouter<IN> extends Serializable {
 - `AT_LEAST_ONCE`:每条消息**至少有**一条对应消息发送至 Pulsar,发送至 Pulsar 的消息可能会因为 Flink 应用重启而出现重复。
 - `EXACTLY_ONCE`:每条消息**有且仅有**一条对应消息发送至 Pulsar。发送至 Pulsar 的消息不会有重复也不会丢失。Pulsar Sink 内部依赖 [Pulsar 事务](https://pulsar.apache.org/docs/2.10.x/transactions/)和两阶段提交协议来保证每条记录都能正确发往 Pulsar。
 
+{{< hint warning >}}
+如果想要使用 `EXACTLY_ONCE`,需要用户确保在 Flink 程序上启用 checkpoint,同时在 Pulsar 上启用事务。在此模式下,Pulsar sink 会将消息写入到某个未提交的事务下,并在成功执行完 checkpoint 后提交对应的事务。
+
+基于 Pulsar 的设计,任何在开启的事务之后写入的消息是无法被消费到的。只有这个事务提交了,对应的消息才能被消费到。
+{{< /hint >}}
+
 ### 消息延时发送
 
 [消息延时发送](https://pulsar.apache.org/docs/2.10.x/concepts-messaging/#delayed-message-delivery)特性可以让指定发送的每一条消息需要延时一段时间后才能被下游的消费者所消费。当延时消息发送特性启用时,Pulsar Sink 会**立刻**将消息发送至 Pulsar Broker。但该消息在指定的延迟时间到达前将会保持对下游消费者不可见。
diff --git a/docs/content/docs/connectors/datastream/pulsar.md b/docs/content/docs/connectors/datastream/pulsar.md
index e71b762..1d0f851 100644
--- a/docs/content/docs/connectors/datastream/pulsar.md
+++ b/docs/content/docs/connectors/datastream/pulsar.md
@@ -881,7 +881,15 @@ For details, see [partitioned topics](https://pulsar.apache.org/docs/2.10.x/cook
 - `AT_LEAST_ONCE`: No data loss happens, but data duplication can happen after a restart from checkpoint.
 - `EXACTLY_ONCE`: No data loss happens. Each record is sent to the Pulsar broker only once.
   Pulsar Sink uses [Pulsar transaction](https://pulsar.apache.org/docs/2.10.x/transactions/)
-  and two-phase commit (2PC) to ensure records are sent only once even after pipeline restarts.
+  and two-phase commit (2PC) to ensure records are sent only once even after the pipeline restarts.
+
+{{< hint warning >}}
+If you want to use `EXACTLY_ONCE`, make sure you have enabled the checkpoint on Flink and enabled the transaction on Pulsar.
+The Pulsar sink will write all the messages in a pending transaction and commit it after the successfully checkpointing.
+
+The messages written to Pulsar after a pending transaction won't be obtained based on the design of the Pulsar.
+You can acquire these messages only when the corresponding transaction is committed.
+{{< /hint >}}
 
 ### Delayed message delivery