Posted to jira@kafka.apache.org by "Stanislav Kozlovski (Jira)" <ji...@apache.org> on 2019/10/11 23:25:00 UTC

[jira] [Commented] (KAFKA-8667) Improve leadership transition time

    [ https://issues.apache.org/jira/browse/KAFKA-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949834#comment-16949834 ] 

Stanislav Kozlovski commented on KAFKA-8667:
--------------------------------------------

[~hzxa21] have you considered submitting a patch for this along with KAFKA-8668? I saw you've authored patches for these JIRAs in LinkedIn's open-source Kafka - [https://github.com/linkedin/kafka/commit/feed875f8fcd8b9f8b8539e8a9b2e477a67b2faf]

> Improve leadership transition time
> ----------------------------------
>
>                 Key: KAFKA-8667
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8667
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Zhanxiang (Patrick) Huang
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> When a replica fetcher thread processes a fetch response, it holds the {{partitionMapLock}}. If a LeaderAndIsr request arrives at the same time, it gets blocked at the end of its processing when calling {{shutdownIdleFetcherThread}}, because the request handler thread has to acquire the {{partitionMapLock}} of each replica fetcher thread in order to check whether any partition is still assigned to that fetcher, and it performs this check sequentially across the fetcher threads.
> For example, in a cluster with 20 brokers and num.replica.fetchers set to 32, if each fetcher thread holds the lock a little longer, the total time for the request handler thread to finish shutdownIdleFetcherThread can grow substantially, because it waits on the partitionMapLock of each fetcher thread in turn. If the LeaderAndIsr request is blocked in the broker for more than request.timeout.ms (30s by default), the request send thread on the controller side times out waiting for the response, establishes a new connection to the broker, and re-sends the request, which breaks in-order delivery because there is now more than one channel talking to the broker. Moreover, this can make the lock contention worse or saturate the request handler threads, since duplicate control requests are sent to the broker multiple times. In our own testing, we saw up to *8 duplicate LeaderAndIsrRequest* sent to the broker during a bounce, and the 99th percentile LeaderAndIsr local time went up to ~500s.
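
A minimal sketch of the contention pattern described above (Java; FetcherThread and FetcherManager are hypothetical stand-ins rather than the actual broker classes, and only partitionMapLock and shutdownIdleFetcherThread mirror names used in the description):

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a replica fetcher thread. The real broker code
// differs, but the contention pattern is the same.
class FetcherThread {
    final ReentrantLock partitionMapLock = new ReentrantLock();
    // Updated elsewhere when partitions are added to / removed from this fetcher.
    private int assignedPartitions = 0;

    // The fetcher itself holds partitionMapLock while processing a fetch
    // response, so this call can block for the duration of that processing.
    int partitionCount() {
        partitionMapLock.lock();
        try {
            return assignedPartitions;
        } finally {
            partitionMapLock.unlock();
        }
    }

    void shutdown() { /* stop the thread */ }
}

// Hypothetical stand-in for the fetcher manager used by the request handler.
class FetcherManager {
    private final List<FetcherThread> fetcherThreads;

    FetcherManager(List<FetcherThread> fetcherThreads) {
        this.fetcherThreads = fetcherThreads;
    }

    // The LeaderAndIsr handler ends up here: it checks each fetcher one after
    // another, so the waits on each fetcher's partitionMapLock add up across
    // all fetcher threads (32 per source broker in the example above).
    void shutdownIdleFetcherThreads() {
        for (FetcherThread fetcher : fetcherThreads) {
            if (fetcher.partitionCount() <= 0) { // acquires that fetcher's partitionMapLock
                fetcher.shutdown();
            }
        }
    }
}
{code}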



--
This message was sent by Atlassian Jira
(v8.3.4#803005)