Posted to users@kafka.apache.org by "Maruthi Vemuri (mavemuri)" <ma...@cisco.com.INVALID> on 2021/10/05 01:41:55 UTC

Replica fetcher not fetching post rolling reboots

Hello,

We are seeing an issue during rolling restarts where the replicas of a few partitions lag and never catch up. The log files for these partitions appear to be the same size on all brokers, including the ones where the replicas are lagging. The FailedPartitionsCount metric is still at 0, but the replicas stay stuck in that state until we manually either reassign the partitions or re-elect the leader. Some of the partitions in question don’t even receive any data during the rolling reboots. These partitions have min.insync.replicas set to 1, but even so, shouldn’t the replicas eventually catch up to the leader? As far as I can tell, ReplicaFetcherThread simply stopped fetching for those partitions.

Has anyone seen a similar issue?

Thanks,
Maruthi