Posted to commits@flink.apache.org by pn...@apache.org on 2018/12/03 12:55:25 UTC

[flink] branch release-1.7 updated: [FLINK-10874][kafka-docs] Document likely cause of UnknownTopicOrPartitionException

This is an automated email from the ASF dual-hosted git repository.

pnowojski pushed a commit to branch release-1.7
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.7 by this push:
     new 7b23bf6  [FLINK-10874][kafka-docs] Document likely cause of UnknownTopicOrPartitionException
7b23bf6 is described below

commit 7b23bf69e80aae4d60b1f9f8b5175a25f144930b
Author: Piotr Nowojski <pi...@gmail.com>
AuthorDate: Wed Nov 14 13:57:44 2018 +0100

    [FLINK-10874][kafka-docs] Document likely cause of UnknownTopicOrPartitionException
---
 docs/dev/connectors/kafka.md | 62 ++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/docs/dev/connectors/kafka.md b/docs/dev/connectors/kafka.md
index 0630c6e..351a4dc 100644
--- a/docs/dev/connectors/kafka.md
+++ b/docs/dev/connectors/kafka.md
@@ -660,19 +660,6 @@ we recommend setting the number of retries to a higher value.
 **Note**: There is currently no transactional producer for Kafka, so Flink can not guarantee exactly-once delivery
 into a Kafka topic.
 
-<div class="alert alert-warning">
-  <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
-  writes you can still experience data loss. In particular keep in mind the following Kafka settings:
-  <ul>
-    <li><tt>acks</tt></li>
-    <li><tt>log.flush.interval.messages</tt></li>
-    <li><tt>log.flush.interval.ms</tt></li>
-    <li><tt>log.flush.*</tt></li>
-  </ul>
-  Default values for the above options can easily lead to data loss. Please refer to Kafka documentation
-  for more explanation.
-</div>
-
 #### Kafka 0.11 and newer
 
 With Flink's checkpointing enabled, the `FlinkKafkaProducer011` (`FlinkKafkaProducer` for Kafka >= 1.0.0 versions) can provide
@@ -690,21 +677,6 @@ chosen by passing appropriate `semantic` parameter to the `FlinkKafkaProducer011
  or `read_uncommitted` - the latter one is the default value) for any application consuming records
  from Kafka.
 
-<div class="alert alert-warning">
-  <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
-  writes you can still experience data losses. In particular keep in mind about following properties
-  in Kafka config:
-  <ul>
-    <li><tt>acks</tt></li>
-    <li><tt>log.flush.interval.messages</tt></li>
-    <li><tt>log.flush.interval.ms</tt></li>
-    <li><tt>log.flush.*</tt></li>
-  </ul>
-  Default values for the above options can easily lead to data loss. Please refer to the Kafka documentation
-  for more explanation.
-</div>
-
-
 ##### Caveats
 
 `Semantic.EXACTLY_ONCE` mode relies on the ability to commit transactions
@@ -831,4 +803,38 @@ A mismatch in service name between client and server configuration will cause th
 For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl}}/ops/config.html).
You can also find [here]({{ site.baseurl}}/ops/security-kerberos.html) further details on how Flink internally sets up Kerberos-based security.
 
+## Troubleshooting
+
+<div class="alert alert-warning">
+If you run into a problem with Kafka when using Flink, keep in mind that Flink only wraps
+<a href="https://kafka.apache.org/documentation/#consumerapi">KafkaConsumer</a> or
+<a href="https://kafka.apache.org/documentation/#producerapi">KafkaProducer</a>,
+so your problem might be independent of Flink. It can sometimes be solved by upgrading
+Kafka brokers, or by reconfiguring the brokers or the <tt>KafkaConsumer</tt> /
+<tt>KafkaProducer</tt> in Flink. Some examples of common problems are listed below.
+</div>
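+
+For instance, the `KafkaConsumer` and `KafkaProducer` wrapped by Flink are configured
+through the `Properties` object passed to the connector's constructor. A minimal sketch,
+assuming the Kafka 0.11 consumer (the topic name, address, and group id are placeholders):
+
+{% highlight java %}
+import java.util.Properties;
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
+
+Properties properties = new Properties();
+properties.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
+properties.setProperty("group.id", "my-group");                // placeholder group id
+// Any further entries are forwarded to the underlying KafkaConsumer.
+
+FlinkKafkaConsumer011<String> consumer =
+    new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), properties);
+{% endhighlight %}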
+
+### Data loss
+
+Depending on your Kafka configuration, you can still experience data loss even after
+Kafka acknowledges writes. In particular, keep in mind the following properties
+in the Kafka configuration:
+
+- `acks`
+- `log.flush.interval.messages`
+- `log.flush.interval.ms`
+- `log.flush.*`
+
+The default values of the above options can easily lead to data loss.
+Please refer to the Kafka documentation for more details.
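+
+On the client side, `acks` can be tightened by passing it in the producer's `Properties`.
+A minimal sketch, assuming the Kafka 0.11 producer (note that the `log.flush.*` options
+are broker settings and must be changed in the broker configuration, not here):
+
+{% highlight java %}
+import java.util.Properties;
+
+Properties producerProps = new Properties();
+producerProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
+// Wait for all in-sync replicas to acknowledge a write before treating it as successful.
+producerProps.setProperty("acks", "all");
+// producerProps is then passed to the FlinkKafkaProducer011 constructor.
+{% endhighlight %}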
+
+### UnknownTopicOrPartitionException
+
+One possible cause of this error is an ongoing leader election,
+for example after or during a Kafka broker restart.
+This is a retriable exception, so the Flink job should be able to restart and resume normal operation.
+It can also be mitigated by increasing the `retries` property in the producer settings.
+However, this might cause reordering of messages,
+which, if undesired, can in turn be avoided by setting `max.in.flight.requests.per.connection` to 1.
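+
+A minimal sketch of the producer settings described above (the values shown are
+illustrative, not recommendations):
+
+{% highlight java %}
+import java.util.Properties;
+
+Properties producerProps = new Properties();
+producerProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
+// Retry transient failures, such as an ongoing leader election, instead of failing immediately.
+producerProps.setProperty("retries", "5"); // illustrative value
+// Prevent retries from reordering records.
+producerProps.setProperty("max.in.flight.requests.per.connection", "1");
+{% endhighlight %}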
+
 {% top %}