Posted to jira@kafka.apache.org by "David Jacot (Jira)" <ji...@apache.org> on 2021/12/10 09:59:00 UTC

[jira] [Commented] (KAFKA-13388) Kafka Producer nodes stuck in CHECKING_API_VERSIONS

    [ https://issues.apache.org/jira/browse/KAFKA-13388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457016#comment-17457016 ] 

David Jacot commented on KAFKA-13388:
-------------------------------------

I had a look at the code as well. [~david.mao] is right. My understanding is that we do transition to the CHECKING_API_VERSIONS state, but we only send out the ApiVersionsRequest once the channel is ready. If the channel never becomes ready, no timeout applies to the connection, because the request timeout only kicks in once the request has been sent out.
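
To make the gap concrete, here is a minimal toy model of the behaviour described above. It is not the actual NetworkClient or ClusterConnectionStates code: the class, its fields, and its methods are all hypothetical names chosen for illustration, and the timeout values passed in main are arbitrary. It only assumes what this thread states: the connection setup timeout covers nodes that are still connecting, and the request timeout covers requests that have actually been sent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; every name here is hypothetical, not Kafka's real API. */
final class ApiVersionsTimeoutSketch {
    enum State { CONNECTING, CHECKING_API_VERSIONS, READY }

    private final Map<String, State> states = new HashMap<>();
    // Nodes covered by the connection setup timeout (node id -> connect start time).
    private final Map<String, Long> connectingSince = new HashMap<>();
    // Nodes covered by the request timeout (node id -> request send time).
    private final Map<String, Long> inFlightSince = new HashMap<>();
    private final long setupTimeoutMs;
    private final long requestTimeoutMs;

    ApiVersionsTimeoutSketch(long setupTimeoutMs, long requestTimeoutMs) {
        this.setupTimeoutMs = setupTimeoutMs;
        this.requestTimeoutMs = requestTimeoutMs;
    }

    void startConnecting(String node, long nowMs) {
        states.put(node, State.CONNECTING);
        connectingSince.put(node, nowMs);
    }

    // Leaving CONNECTING drops the node from the setup-timeout bookkeeping.
    void connectionEstablished(String node) {
        connectingSince.remove(node);
        states.put(node, State.CHECKING_API_VERSIONS);
    }

    // The ApiVersionsRequest is only "sent" (tracked as in flight) once the channel is ready.
    void maybeSendApiVersions(String node, boolean channelReady, long nowMs) {
        if (states.get(node) == State.CHECKING_API_VERSIONS
                && channelReady
                && !inFlightSince.containsKey(node)) {
            inFlightSince.put(node, nowMs);
        }
    }

    // Returns the nodes that either timeout would fire for at time nowMs.
    List<String> timedOutNodes(long nowMs) {
        List<String> expired = new ArrayList<>();
        connectingSince.forEach((node, since) -> {
            if (nowMs - since > setupTimeoutMs) expired.add(node);
        });
        inFlightSince.forEach((node, since) -> {
            if (nowMs - since > requestTimeoutMs) expired.add(node);
        });
        return expired;
    }

    public static void main(String[] args) {
        ApiVersionsTimeoutSketch client = new ApiVersionsTimeoutSketch(10_000L, 30_000L);
        client.startConnecting("node-1", 0L);
        client.connectionEstablished("node-1");
        // The channel never becomes ready, so nothing is ever in flight.
        client.maybeSendApiVersions("node-1", false, 1_000L);
        System.out.println("timed out while channel not ready: " + client.timedOutNodes(1_000_000L));
        // Once the request is actually sent, the request timeout starts to apply.
        client.maybeSendApiVersions("node-1", true, 1_000_000L);
        System.out.println("timed out after send: " + client.timedOutNodes(1_030_001L));
    }
}
```

In this model the node sits in CHECKING_API_VERSIONS in neither timeout map until the request is sent, which is the window where the hang would go undetected. Whether the real client should close that window with the connection setup timeout or with a separate check is a design question this sketch does not try to answer.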

> Kafka Producer nodes stuck in CHECKING_API_VERSIONS
> ---------------------------------------------------
>
>                 Key: KAFKA-13388
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13388
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: David Hoffman
>            Priority: Minor
>         Attachments: Screen Shot 2021-10-25 at 10.28.48 AM.png, image-2021-10-21-13-42-06-528.png
>
>
> I have been seeing expired batch errors in my app.
> {code:java}
> org.apache.kafka.common.errors.TimeoutException: Expiring 51 record(s) for xxx-17:120002 ms has passed since batch creation
> {code}
>  I would have assumed a request timeout or connection timeout would also have been logged. I could not find any other associated errors. 
> I added some instrumenting to my app and have traced this down to broker connections hanging in CHECKING_API_VERSIONS state. -It appears there is no effective timeout for Kafka Producer broker connections in CHECKING_API_VERSIONS state.-
> In the code I see that after the NetworkClient connects to a broker node it makes a request to check API versions; when it receives the response it marks the node as ready. -I am seeing that sometimes a reply is not received for the check API versions request, and the connection just hangs in CHECKING_API_VERSIONS state until it is disposed, I assume after the idle connection timeout.-
> Update: not actually sure what causes the connection to get stuck in CHECKING_API_VERSIONS.
> -I am guessing the connection setup timeout should be still in play for this, but it is not.- 
>  -There is a connectingNodes set that is consulted when checking timeouts and the node is removed- 
>  -when ClusterConnectionStates.checkingApiVersions(String id) is called to transition the node into CHECKING_API_VERSIONS-



--
This message was sent by Atlassian Jira
(v8.20.1#820001)