You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by fp...@apache.org on 2022/02/24 16:08:54 UTC

[flink] branch release-1.14 updated: [FLINK-26159][doc] add description for MAX_FETCH_RECORD related question

This is an automated email from the ASF dual-hosted git repository.

fpaul pushed a commit to branch release-1.14
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.14 by this push:
     new a5dc36d  [FLINK-26159][doc] add description for MAX_FETCH_RECORD related question
a5dc36d is described below

commit a5dc36dd323ec064c8a3d2e2f64bca7f7e5fbd1e
Author: Yufei Zhang <af...@gmail.com>
AuthorDate: Tue Feb 15 21:49:44 2022 +0800

    [FLINK-26159][doc] add description for MAX_FETCH_RECORD related question
    
    Update docs/content/docs/connectors/datastream/pulsar.md
    
    Co-authored-by: MartijnVisser <ma...@2symbols.com>
    
    Update docs/content/docs/connectors/datastream/pulsar.md
    
    Co-authored-by: MartijnVisser <ma...@2symbols.com>
    
    add Chinese documentation
---
 docs/content.zh/docs/connectors/datastream/pubsub.md | 17 +++++++++++++++++
 docs/content/docs/connectors/datastream/pulsar.md    |  9 +++++++++
 2 files changed, 26 insertions(+)

diff --git a/docs/content.zh/docs/connectors/datastream/pubsub.md b/docs/content.zh/docs/connectors/datastream/pubsub.md
index f766374..837b892 100644
--- a/docs/content.zh/docs/connectors/datastream/pubsub.md
+++ b/docs/content.zh/docs/connectors/datastream/pubsub.md
@@ -150,4 +150,21 @@ env.addSource(pubsubSource)
 
 Sink function 会把准备发到 PubSub 的信息短暂地缓存以提高性能。每次 checkpoint 前,它会刷新缓冲区,并且只有当所有信息成功发送到 PubSub 之后,checkpoint 才会成功完成。
 
+## 常见问题
+
+由于Pulsar Connector大量依赖
+[PulsarClient](https://pulsar.apache.org/docs/en/client-libraries-java/) 和
+[PulsarAdmin](https://pulsar.apache.org/docs/en/admin-api-overview/) 来实现相关功能,有时Flink Job出现
+可能是由于Pulsar broker版本过低或者配置不合适导致的。调试配置或升级pulsar可能会解决部分问题。
+
+
+### 当数据量很小时,source读数据出现大约10s延迟
+
+当Pulsar Source从一个数据量很小的topic读取数据时,用户可能会观测到消息间出现10秒钟的间隔。这是由于Pulsar Source
+会将从broker读到的消息暂存至等待队列中,只有当队列中消息数量达到了 `PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`
+才会向下游算子发送数据。如果数据量一直很小,那么就会在等待直到等待时长超过`PULSAR_MAX_FETCH_TIME`规定的值 (默认10秒钟)时
+也向下游算子发送数据。
+
+为了避免这种情况,用户需要改变两个配置项的值。
+
 {{< top >}}
diff --git a/docs/content/docs/connectors/datastream/pulsar.md b/docs/content/docs/connectors/datastream/pulsar.md
index 680fb1a..6e89174 100644
--- a/docs/content/docs/connectors/datastream/pulsar.md
+++ b/docs/content/docs/connectors/datastream/pulsar.md
@@ -417,4 +417,13 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
 
+### Messages can be delayed on low volume topics
+
+When the Pulsar source connector reads from a low volume topic, users might observe a 10 seconds delay between messages. Pulsar buffers messages from topics by default. Before emitting to downstream
+operators, the number of buffered records must be equal or larger than `PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`. If the data volume is low, it could be that filling up the number of buffered records takes longer than `PULSAR_MAX_FETCH_TIME` (default to 10 seconds). If that's the case, it means that only after this time has passed the messages will be emitted. 
+
+To avoid this behaviour, you need to change either the buffered records or the waiting time. 
+
+
+
 {{< top >}}