Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2020/05/01 17:36:00 UTC

[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

    [ https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097539#comment-17097539 ] 

Guozhang Wang commented on KAFKA-8803:
--------------------------------------

[~waykarp] It's a bit tricky to reproduce since we've found and fixed at least three different root causes for it; one of them, for example, is related to possible time shifts on local wall-clocks, which makes it even harder to reproduce in a production environment. Also note that none of the issues we found were actually on the client side; they were all on the broker side.

So I'd suggest upgrading your server side to the newly released 2.5.0 to see if it helps avoid this scenario.
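
For reference, here is a minimal sketch of widening that timeout on the client side while you investigate; the timeout in the exception is the producer's `max.block.ms` (as the log message above says), and producer-level settings need the producer prefix to reach the embedded producer of a Streams application. The application id, bootstrap servers, topics, and the 120000 ms value below are placeholders, not taken from this issue:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class MaxBlockMsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");       // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder brokers

        // Producer-level settings must be prefixed with "producer." so they reach the
        // embedded producer; 120000 ms is an arbitrary example, twice the 60s default.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), "120000");

        // Placeholder topology: copy one topic to another.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
{code}

Note this only lengthens how long the client waits for InitProducerId; it does not address the broker-side root causes fixed in 2.5.0.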

> Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8803
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8803
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Raman Gupta
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.5.0, 2.3.2, 2.4.2
>
>         Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] org.apa.kaf.str.pro.int.StreamTask                : task [0_36] Timeout exception caught when initializing transactions for task 0_36. This might happen if the broker is slow to respond, if the network connection to the broker was interrupted, or if similar circumstances arise. You can increase producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, including some in the very same processes as the stream which consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error was August 13th 2019, 17:03:36.754, and it happened for 4 different streams. For 3 of these streams, the error only happened once, and then the stream recovered. For the 4th stream, the error has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 16:47:43, two of the four brokers started reporting messages like this for multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped; here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 broker. The broker has a patch for KAFKA-8773.
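
For context on the "static consumer group protocol" mentioned in the report: that is static membership, enabled on the consumer side via `group.instance.id`. A minimal sketch of how that is typically configured in a Streams client follows; the instance id, application id, and broker address are placeholders, not taken from this issue:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StaticMembershipConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");       // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder brokers

        // Static membership: give each instance a stable, unique id so short restarts
        // do not trigger a rebalance; "instance-1" is a placeholder value.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG), "instance-1");
        return props;
    }
}
{code}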



--
This message was sent by Atlassian Jira
(v8.3.4#803005)