Posted to jira@kafka.apache.org by "Eleanore Jin (Jira)" <ji...@apache.org> on 2020/08/23 01:28:00 UTC

[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

    [ https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182553#comment-17182553 ] 

Eleanore Jin commented on KAFKA-8803:
-------------------------------------

Hi [~guozhang], 

I am using [Apache Beam KafkaIO|https://beam.apache.org/documentation/io/built-in/] to read from Kafka topics and publish to Kafka topics.

Recently I enabled transactions:
{code:java}
private static <K, V> Producer<K, V> initializeExactlyOnceProducer(
      WriteRecords<K, V> spec, String producerName) {

    Map<String, Object> producerConfig = new HashMap<>(spec.getProducerConfig());
    producerConfig.putAll(
        ImmutableMap.of(
            ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            spec.getKeySerializer(),
            ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            spec.getValueSerializer(),
            ProducerSpEL.ENABLE_IDEMPOTENCE_CONFIG,
            true,
            ProducerSpEL.TRANSACTIONAL_ID_CONFIG,
            producerName));

    Producer<K, V> producer =
        spec.getProducerFactoryFn() != null
            ? spec.getProducerFactoryFn().apply(producerConfig)
            : new KafkaProducer<>(producerConfig);

    ProducerSpEL.initTransactions(producer);
    return producer;
  }
{code}

I keep getting *org.apache.kafka.common.errors.TimeoutException: Timeout expired after 10000milliseconds while awaiting InitProducerId*

My Kafka client is 2.6.0 (I also tried 2.3.0), and the Kafka broker is 2.3.0.

From the comments above, it seems the broker needs to be upgraded to 2.5.0+. Is my understanding correct?
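
For reference, the wait reported in the InitProducerId timeout is governed by the producer parameter `max.block.ms`, as the log message quoted in the issue description below notes. The following is a minimal sketch, using only plain string keys, of the producer settings involved. The class name, method name, and example values are hypothetical and are not part of Beam or Kafka. Raising the value only changes how long the client waits; it may not help if the broker never answers the InitProducerId request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Beam or Kafka.
class TransactionalProducerConfigSketch {

    // Collects the producer settings that matter for initTransactions().
    // producerName and maxBlockMs are assumed inputs chosen by the caller.
    static Map<String, Object> transactionalConfig(String producerName, long maxBlockMs) {
        Map<String, Object> config = new HashMap<>();
        // Transactions require an idempotent producer.
        config.put("enable.idempotence", true);
        // Identifies the transactional producer across restarts.
        config.put("transactional.id", producerName);
        // Upper bound, in milliseconds, on how long initTransactions() blocks.
        config.put("max.block.ms", maxBlockMs);
        return config;
    }

    public static void main(String[] args) {
        // Example values are assumptions: a made-up producer name and a 60 s wait.
        Map<String, Object> config = transactionalConfig("example-producer", 60_000L);
        System.out.println(config);
    }
}
```

A map like this could be merged into `producerConfig` with `putAll` before the producer is constructed, assuming the pipeline allows custom producer properties to be passed through.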

> Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8803
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8803
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Raman Gupta
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.5.0, 2.3.2, 2.4.2
>
>         Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] org.apa.kaf.str.pro.int.StreamTask                : task [0_36] Timeout exception caught when initializing transactions for task 0_36. This might happen if the broker is slow to respond, if the network connection to the broker was interrupted, or if similar circumstances arise. You can increase producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, including some in the very same processes for the stream which consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and it happened for 4 different streams. For 3 of these streams, the error only happened once, and then the stream recovered. For the 4th stream, the error has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 16:47:43, two of four brokers started reporting messages like this, for multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped. Here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 broker. The broker has a patch for KAFKA-8773.


