You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2019/10/31 23:13:00 UTC

[jira] [Resolved] (KAFKA-9073) Kafka Streams State stuck in rebalancing after one of the StreamThread encounters java.lang.IllegalStateException: No current assignment for partition

     [ https://issues.apache.org/jira/browse/KAFKA-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang resolved KAFKA-9073.
----------------------------------
    Resolution: Fixed

> Kafka Streams State stuck in rebalancing after one of the StreamThread encounters java.lang.IllegalStateException: No current assignment for partition
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9073
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9073
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.3.0
>            Reporter: amuthan Ganeshan
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: KAFKA-9073.log
>
>
> I have a Kafka stream application that stores the incoming messages into a state store, and later during the punctuation period, we store them into a big data persistent store after processing the messages.
> The application consumes from 120 partitions distributed across 40 instances. The application has been running fine without any problem for months, but all of a sudden some of the instances failed because of a stream thread exception saying  
> ```java.lang.IllegalStateException: No current assignment for partition <app_name>-<store_name>-changelog-98```
>  
> And other instances are stuck in the REBALANCING state, and never comes out of it. Here is the full stack trace, I just masked the application-specific app name and store name in the stack trace due to NDA.
>  
> ```
> 2019-10-21 13:27:13,481 ERROR [application.id-a2c06c51-bfc7-449a-a094-d8b770caee92-StreamThread-3] [org.apache.kafka.streams.processor.internals.StreamThread] [] stream-thread [application.id-a2c06c51-bfc7-449a-a094-d8b770caee92-StreamThread-3] Encountered the following error during processing:
> java.lang.IllegalStateException: No current assignment for partition application.id-store_name-changelog-98
>  at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:319)
>  at org.apache.kafka.clients.consumer.internals.SubscriptionState.requestFailed(SubscriptionState.java:618)
>  at org.apache.kafka.clients.consumer.internals.Fetcher$2.onFailure(Fetcher.java:709)
>  at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at org.apache.kafka.clients.consumer.internals.RequestFutureAdapter.onFailure(RequestFutureAdapter.java:30)
>  at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>  at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:574)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
>  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>  at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>  at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>  at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
>  at org.apache.kafka.streams.processor.internals.StreamThread.maybeUpdateStandbyTasks(StreamThread.java:1126)
>  at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:923)
>  at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
>  at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
> ```
>  
> Now I checked the state sore disk usage; it is less than 40% of the total disk space available. Restarting the application solves the problem for a short amount of time, but the error popping up randomly on some other instances quickly. I tried to change the retry and retry.backoff.ms configuration but not helpful at all
> ```
> retries = 2147483647
> retry.backoff.ms
> ```
> After googling for some time I found there was a similar bug reported to the Kafka team in the past, and also notice my stack trace is exactly matching with the stack trace of the reported bug.
> Here is the link for the bug reported on a comparable basis a year ago.
> https://issues.apache.org/jira/browse/KAFKA-7181
>  
> Now I am wondering is there a workaround for this bug though configuration changes, or is there something wrong the way I set up the application, the following are the configuration I have for my stream application.
>  
> ```
> consumer.session.timeout.ms=30000
>  metric.reporters=org.apache.kafka.common.metrics.JmxReporter
>  replication.factor=3
>  metadata.max.age.ms=30000
>  max.partition.fetch.bytes=2000000
>  producer.retries=2147483647
>  bootstrap.servers= <bootstrap server list goes here>
>  metrics.recording.level=DEBUG
>  producer.retry.backoff.ms=60000
>  consumer.auto.offset.reset=latest
>  application.server=0.0.0.0:6063
>  num.standby.replicas=1
>  max.poll.records=2
>  group.initial.rebalance.delay.ms=30000
>  state.dir= <state dir path goes here>
>  heartbeat.interval.ms=10000
>  max.poll.interval.ms=300000
>  num.stream.threads=10
>  application.id= <application id goes here>
> ```
> Note: The original bug reported a year back got a conclusion that it is related to https://issues.apache.org/jira/browse/KAFKA-7657 and reported solved in version 2.2.0, but I am using the latest 2.3.0 version.
> I appreciate your help concerning this bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)