Posted to jira@kafka.apache.org by "Cheng Tan (Jira)" <ji...@apache.org> on 2020/10/20 23:34:00 UTC

[jira] [Commented] (KAFKA-8733) Offline partitions occur when leader's disk is slow in reads while responding to follower fetch requests.

    [ https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218000#comment-17218000 ] 

Cheng Tan commented on KAFKA-8733:
----------------------------------

Hi [~satish.duggana], [~flavr], and [~mingaliu],

We are actively investigating this issue and the related KIP. In the meantime, have you worked out any way to mitigate it, especially the case below?
{quote}But we found an issue of partitions going offline even though follower replicas try their best to fetch from the leader replica. Sometimes the leader replica may take a long time to process fetch requests while reading its logs, which makes follower replicas fall out of sync even though they send fetch requests within the _replica.lag.time.max.ms_ duration. It may even lead to offline partitions when in-sync replicas go below the _min.insync.replicas_ count. We observed this behavior multiple times in our clusters, making partitions offline.
{quote}

> Offline partitions occur when leader's disk is slow in reads while responding to follower fetch requests.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8733
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.2, 2.4.0
>            Reporter: Satish Duggana
>            Assignee: Satish Duggana
>            Priority: Critical
>         Attachments: weighted-io-time-2.png, wio-time.png
>
>
> We found the offline partitions issue multiple times on some of the hosts in our clusters. After going through the broker logs and the hosts' disk stats, it looks like this issue occurs whenever read/write operations take more time on that disk. In particular, when a read takes longer than replica.lag.time.max.ms, follower replicas fall out of sync because their earlier fetch requests are stuck reading the local log and their fetch status has not yet been updated, as shown in the `ReplicaManager` code below. If reading from the log is delayed for longer than replica.lag.time.max.ms, then all the replicas fall out of sync and the partition goes offline when min.insync.replicas > 1 and unclean.leader.election.enable is false.
>  
> {code:scala}
> def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
>   val result = readFromLocalLog( // this call took more than `replica.lag.time.max.ms`
>     replicaId = replicaId,
>     fetchOnlyFromLeader = fetchOnlyFromLeader,
>     readOnlyCommitted = fetchOnlyCommitted,
>     fetchMaxBytes = fetchMaxBytes,
>     hardMaxBytesLimit = hardMaxBytesLimit,
>     readPartitionInfo = fetchInfos,
>     quota = quota,
>     isolationLevel = isolationLevel)
>   // fetch time gets updated here, but maybeShrinkIsr may already have been
>   // called in the meantime and the replica removed from the ISR
>   if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
>   else result
> }
> val logReadResults = readFromLog()
> {code}
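> For illustration, the ISR-shrink side of this race can be sketched roughly as below (a simplified sketch, not the actual Partition.maybeShrinkIsr code; the FollowerState type and outOfSyncFollowers helper are made up for this example). The leader periodically expires any follower whose last caught-up timestamp is older than replica.lag.time.max.ms, so if a follower's outstanding fetch is still stuck in readFromLocalLog above, its timestamp never gets refreshed and it is expired even though it sent its fetch request in time.
> {code:scala}
> // Simplified sketch of the leader-side lag check (hypothetical names, not Kafka's API).
> case class FollowerState(replicaId: Int, lastCaughtUpTimeMs: Long)
>
> // A follower is considered out of sync once its last caught-up timestamp is
> // older than replica.lag.time.max.ms. Nothing here accounts for a fetch
> // request that arrived on time but is still blocked reading the local log.
> def outOfSyncFollowers(followers: Seq[FollowerState],
>                        nowMs: Long,
>                        replicaLagTimeMaxMs: Long): Seq[Int] =
>   followers
>     .filter(f => nowMs - f.lastCaughtUpTimeMs > replicaLagTimeMaxMs)
>     .map(_.replicaId)
> {code}
> With a check of this shape, a follower blocked on a slow leader-side read looks identical to a genuinely lagging follower, which is the gap the KIP mentioned below aims to address.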
> Attached are graphs of the disk weighted I/O time stats from when this issue occurred.
> I will raise [KIP-501|https://s.apache.org/jhbpn] describing options on how to handle this scenario.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)