Posted to dev@kafka.apache.org by "Christian Becker (Jira)" <ji...@apache.org> on 2020/07/02 16:52:00 UTC

[jira] [Created] (KAFKA-10228) producer: NETWORK_EXCEPTION is thrown instead of a request timeout

Christian Becker created KAFKA-10228:
----------------------------------------

             Summary: producer: NETWORK_EXCEPTION is thrown instead of a request timeout
                 Key: KAFKA-10228
                 URL: https://issues.apache.org/jira/browse/KAFKA-10228
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 2.3.1
            Reporter: Christian Becker


We're currently seeing an issue with the Java client (producer) when producing a message runs into a timeout: a NETWORK_EXCEPTION is thrown instead of a timeout exception.

*Situation and relevant code:*

Config
{code:java}
request.timeout.ms: 200
retries: 3
acks: all{code}
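For anyone reproducing this with the plain Java client, the three settings above could be expressed as producer properties roughly as follows. This is a minimal sketch using only the JDK: the class and method names are hypothetical, and required settings such as bootstrap.servers and the serializers are deliberately left out.

```java
import java.util.Properties;

public class ProducerTimeoutConfig {

    // Builds only the three settings quoted in the report.
    // Assumption: the caller adds bootstrap.servers and serializers
    // before passing these properties to a producer.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("request.timeout.ms", "200"); // the very low timeout from the report
        props.setProperty("retries", "3");
        props.setProperty("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; Properties does not guarantee iteration order.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```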
{code:java}
for (UnpublishedEvent event : unpublishedEvents) {
    ListenableFuture<SendResult<String, String>> future;
    future = kafkaTemplate.send(new ProducerRecord<>(event.getTopic(), event.getKafkaKey(), event.getPayload()));
    futures.add(future.completable());
}

CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();{code}
We're using the KafkaTemplate from Spring Boot here, but that shouldn't matter, as it's merely a wrapper around the producer. We hand it batches of messages to send.

200ms later, we can see the following in the logs:
{code:java}
[Producer clientId=producer-1] Received invalid metadata error in produce request on partition events-6 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now
[Producer clientId=producer-1] Got error produce response with correlation id 3094 on topic-partition events-6, retrying (2 attempts left). Error: NETWORK_EXCEPTION {code}
This was somewhat unexpected and sent us on a hunt across the infrastructure for possible connection issues, but we found none.

Side note: In some cases the retries worked and the messages were successfully produced.

Only after many hours of heavy debugging did we notice that the error might be related to the low timeout setting. We've since removed that setting, as it was a remnant from the past and no longer valid for our use case. However, to spare other people the same problem and to simplify future debugging, some form of timeout exception should be thrown instead.
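In the meantime, a caller that waits on the whole batch with join() can at least surface the underlying cause of a failure instead of the wrapping CompletionException. The following is a sketch using only the JDK, not the Kafka or Spring API: the class and method names are hypothetical, and the failed send is simulated with a generic exception standing in for the Kafka error.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class BatchFailureInspector {

    // Waits for all futures. Returns null if every future succeeded,
    // otherwise the root cause of a failure reported by allOf().
    static Throwable awaitAll(List<? extends CompletableFuture<?>> futures) {
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
            return null;
        } catch (CompletionException e) {
            // join() wraps the original failure; walk the chain to its root.
            Throwable cause = e;
            while (cause.getCause() != null) {
                cause = cause.getCause();
            }
            return cause;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> sent = CompletableFuture.completedFuture("sent");

        // Simulated failure; a real producer would fail with its own exception type.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("simulated send failure"));

        Throwable cause = awaitAll(Arrays.asList(sent, failed));
        System.out.println(cause == null ? "all sends succeeded" : "send failed: " + cause);
    }
}
```

Logging the root cause's class alongside the configured request.timeout.ms would make a mismatch like the one described here easier to spot.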



--
This message was sent by Atlassian Jira
(v8.3.4#803005)