You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by fp...@apache.org on 2022/02/01 08:57:25 UTC

[flink] branch release-1.14 updated: [FLINK-25818][Docs][Kafka] Add explanation how Kafka Source deals with idleness when parallelism is higher then the number of partitions

This is an automated email from the ASF dual-hosted git repository.

fpaul pushed a commit to branch release-1.14
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.14 by this push:
     new e074805  [FLINK-25818][Docs][Kafka] Add explanation how Kafka Source deals with idleness when parallelism is higher then the number of partitions
e074805 is described below

commit e074805bf64ea40407ee31d310043bbb5572d351
Author: martijnvisser <ma...@2symbols.com>
AuthorDate: Wed Jan 26 10:51:47 2022 +0100

    [FLINK-25818][Docs][Kafka] Add explanation how Kafka Source deals with idleness when parallelism is higher then the number of partitions
---
 docs/content.zh/docs/connectors/datastream/kafka.md | 16 ++++++++++++++++
 docs/content/docs/connectors/datastream/kafka.md    |  9 +++++++++
 2 files changed, 25 insertions(+)

diff --git a/docs/content.zh/docs/connectors/datastream/kafka.md b/docs/content.zh/docs/connectors/datastream/kafka.md
index 5a3b9a5..8f19a6f 100644
--- a/docs/content.zh/docs/connectors/datastream/kafka.md
+++ b/docs/content.zh/docs/connectors/datastream/kafka.md
@@ -357,6 +357,22 @@ Flink Kafka Producer 被称为 `FlinkKafkaProducer`。它允许将消息流写
 {{< tabs "f6c1b77e-6b17-4fd3-837a-c9257e6c7c00" >}}
 {{< tab "Java" >}}
 ```java
+env.fromSource(kafkaSource, new CustomWatermarkStrategy(), "Kafka Source With Custom Watermark Strategy")
+```
+[这篇文档]({{< ref "docs/dev/datastream/event-time/generating_watermarks.md" >}})描述了如何自定义水印策略(```WatermarkStrategy```)。
+
+### 空闲
+如果并行度高于分区数,Kafka Source 不会自动进入空闲状态。您将需要降低并行度或向水印策略添加空闲超时。如果在这段时间内没有记录在流的分区中流动,则该分区被视为“空闲”并且不会阻止下游操作符中水印的进度。
+[这篇文档]({{< ref "docs/dev/datastream/event-time/generating_watermarks.md" >}}#dealing-with-idle-sources) 描述了有关如何定义 ```WatermarkStrategy#withIdleness``` 的详细信息.
+
+### 消费位点提交
+Kafka source 在 checkpoint **完成**时提交当前的消费位点 ,以保证 Flink 的 checkpoint 状态和 Kafka broker 上的提交位点一致。如果未开启 
+checkpoint,Kafka source 依赖于 Kafka consumer 内部的位点定时自动提交逻辑,自动提交功能由 ```enable.auto.commit``` 和 
+```auto.commit.interval.ms``` 两个 Kafka consumer 配置项进行配置。
+
+注意:Kafka source **不依赖**于 broker 上提交的位点来恢复失败的作业。提交位点只是为了上报 Kafka consumer 和消费组的消费进度,以在 broker 端进行监控。
+
+### 监控
 DataStream<String> stream = ...;
 
 Properties properties = new Properties();
diff --git a/docs/content/docs/connectors/datastream/kafka.md b/docs/content/docs/connectors/datastream/kafka.md
index 30e028d..da7ac1c 100644
--- a/docs/content/docs/connectors/datastream/kafka.md
+++ b/docs/content/docs/connectors/datastream/kafka.md
@@ -197,6 +197,15 @@ env.fromSource(kafkaSource, new CustomWatermarkStrategy(), "Kafka Source With Cu
 [This documentation]({{< ref "docs/dev/datastream/event-time/generating_watermarks.md" >}}) describes
 details about how to define a ```WatermarkStrategy```.
 
+### Idleness
+The Kafka Source does not go automatically in an idle state if the parallelism is higher than the
+number of partitions. You will either need to lower the parallelism or add an idle timeout to the 
+watermark strategy. If no records flow in a partition of a stream for that amount of time, then that 
+partition is considered "idle" and will not hold back the progress of watermarks in downstream operators.
+
+[This documentation]({{< ref "docs/dev/datastream/event-time/generating_watermarks.md" >}}#dealing-with-idle-sources) 
+describes details about how to define a ```WatermarkStrategy#withIdleness```.
+
 ### Consumer Offset Committing
 Kafka source commits the current consuming offset when checkpoints are **completed**, for 
 ensuring the consistency between Flink's checkpoint state and committed offsets on Kafka brokers.