Posted to jira@kafka.apache.org by "David Jacot (Jira)" <ji...@apache.org> on 2021/10/21 06:42:00 UTC
[jira] [Commented] (KAFKA-13388) Kafka Producer has no timeout for nodes stuck in CHECKING_API_VERSIONS
[ https://issues.apache.org/jira/browse/KAFKA-13388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432238#comment-17432238 ]
David Jacot commented on KAFKA-13388:
-------------------------------------
[~dhofftgt] Thanks for filing this issue. Which client version are you using? I just checked the code in trunk, and it seems that the API_VERSIONS request should time out based on `request.timeout.ms` (30s by default), like any other request. To verify, you could turn on debug logs; you should then see `Disconnecting from node {} due to request timeout.`.
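For reference, a minimal sketch of the two producer timeouts involved here, written with plain `java.util.Properties` so it runs without the Kafka client on the classpath. The class name, method name, and the `localhost:9092` address are placeholders for illustration; the values are the documented defaults (`delivery.timeout.ms` defaults to 120000 ms, which lines up with the "120002 ms has passed since batch creation" in the reported error).

```java
import java.util.Properties;

public class ProducerTimeoutSketch {
    // Hypothetical helper: makes the two relevant timeouts explicit.
    // In a real application these properties would be passed to KafkaProducer.
    static Properties producerTimeouts() {
        Properties props = new Properties();
        // Placeholder broker address, not taken from the report.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Upper bound on waiting for a response to an in-flight request (default 30s).
        props.setProperty("request.timeout.ms", "30000");
        // Upper bound on the whole send, after which batches expire (default 120s).
        props.setProperty("delivery.timeout.ms", "120000");
        return props;
    }

    public static void main(String[] args) {
        producerTimeouts().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If the request timeout applies to the API_VERSIONS request as expected, a stuck node should be disconnected well before the delivery timeout expires the batch.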
> Kafka Producer has no timeout for nodes stuck in CHECKING_API_VERSIONS
> ----------------------------------------------------------------------
>
> Key: KAFKA-13388
> URL: https://issues.apache.org/jira/browse/KAFKA-13388
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: David Hoffman
> Priority: Major
>
> I have been seeing expired batch errors in my app.
> {code:java}
> org.apache.kafka.common.errors.TimeoutException: Expiring 51 record(s) for xxx-17:120002 ms has passed since batch creation
> {code}
> I would have assumed a request timeout or connection timeout would also have been logged. I could not find any other associated errors.
> I added some instrumentation to my app and have traced this down to broker connections hanging in the CHECKING_API_VERSIONS state. It appears there is no effective timeout for Kafka Producer broker connections in the CHECKING_API_VERSIONS state.
> In the code, after the NetworkClient connects to a broker node, it makes a request to check API versions; when it receives the response, it marks the node as ready. I am seeing that sometimes a reply to the check API versions request is never received, and the connection just hangs in the CHECKING_API_VERSIONS state until it is disposed of, I assume after the idle connection timeout.
> I am guessing the connection setup timeout should still be in play for this, but it is not.
> There is a connectingNodes set that is consulted when checking timeouts, and the node is removed from it when ClusterConnectionStates.checkingApiVersions(String id) is called to transition the node into CHECKING_API_VERSIONS.
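The mechanism described in the report can be illustrated with a toy model. This is a simplified sketch, not Kafka's actual implementation: only `connectingNodes`, `checkingApiVersions(String id)`, and the CHECKING_API_VERSIONS state come from the report, while the class name, the other methods, and the remaining states are hypothetical. It models only the connection setup timeout bookkeeping and says nothing about whether `request.timeout.ms` still covers the in-flight request.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ConnectingNodesSketch {
    // Simplified state set for illustration only.
    enum State { CONNECTING, CHECKING_API_VERSIONS, READY }

    private final Map<String, State> nodeState = new HashMap<>();
    // Nodes considered when connection setup timeouts are checked.
    private final Set<String> connectingNodes = new HashSet<>();

    // Hypothetical: a connection attempt starts, so the node is tracked for setup timeouts.
    void connecting(String id) {
        nodeState.put(id, State.CONNECTING);
        connectingNodes.add(id);
    }

    // As described in the report: the transition removes the node from connectingNodes.
    void checkingApiVersions(String id) {
        nodeState.put(id, State.CHECKING_API_VERSIONS);
        connectingNodes.remove(id);
    }

    // Hypothetical: true if the setup timeout check would still look at this node.
    boolean isSetupTimeoutChecked(String id) {
        return connectingNodes.contains(id);
    }

    State state(String id) {
        return nodeState.get(id);
    }

    public static void main(String[] args) {
        ConnectingNodesSketch s = new ConnectingNodesSketch();
        s.connecting("node-1");
        System.out.println(s.state("node-1") + " checked=" + s.isSetupTimeoutChecked("node-1"));
        s.checkingApiVersions("node-1");
        System.out.println(s.state("node-1") + " checked=" + s.isSetupTimeoutChecked("node-1"));
    }
}
```

In this model, once a node enters CHECKING_API_VERSIONS it is no longer visible to the setup timeout check, which matches the reporter's observation that the connection setup timeout does not fire for such nodes.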
--
This message was sent by Atlassian Jira
(v8.3.4#803005)