Posted to dev@kafka.apache.org by "Onur Karaman (JIRA)" <ji...@apache.org> on 2017/05/12 04:29:04 UTC

[jira] [Resolved] (KAFKA-5120) Several controller metrics block if controller lock is held by another thread

     [ https://issues.apache.org/jira/browse/KAFKA-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Onur Karaman resolved KAFKA-5120.
---------------------------------
    Resolution: Fixed

KAFKA-5028 has been checked in so this should no longer be an issue.

> Several controller metrics block if controller lock is held by another thread
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5120
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, metrics
>    Affects Versions: 0.10.2.0
>            Reporter: Tim Carey-Smith
>            Priority: Minor
>
> We have been tracking latency issues surrounding queries to Controller MBeans. Upon digging into the root causes, we discovered that several metrics acquire the controller lock within the gauge. 
> The affected metrics are: 
> * {{ActiveControllerCount}}
> * {{OfflinePartitionsCount}}
> * {{PreferredReplicaImbalanceCount}}
> If the controller is currently holding the lock and an MBean request is received, the thread executing the request will block until the controller releases the lock.
> We discovered this in a cluster where the controller was holding the lock for extended periods of time for normal operations. We have documented this issue in KAFKA-5116. 
> Several possible solutions exist: 
> * Remove the lock from inside these {{Gauge}} s. 
> * Store and update the metric values in {{AtomicLong}} s. 
> Modifying the {{ActiveControllerCount}} metric seems to be straightforward, while the other two metrics seem to be more involved.
> We're happy to contribute a patch, but wanted to discuss potential solutions and their tradeoffs before proceeding. 
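
For illustration only, the second option in the quoted description (storing metric values in an AtomicLong so the gauge never touches the controller lock) might look roughly like the sketch below. This is not the change made in KAFKA-5028 and not Kafka's actual controller code: the class ControllerMetricsSketch, its fields, and updateOfflinePartitions are hypothetical names, and a plain LongSupplier stands in for the metrics library's Gauge.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names come from the Kafka code base.
class ControllerMetricsSketch {
    // Stands in for the controller lock held during controller operations.
    final ReentrantLock controllerLock = new ReentrantLock();

    // Last published metric value, written under the lock, read without it.
    private final AtomicLong offlinePartitionsCount = new AtomicLong(0L);

    // Gauge stand-in: reads the atomic value and never acquires controllerLock,
    // so a metrics request cannot block behind a long controller operation.
    final LongSupplier offlinePartitionsGauge = offlinePartitionsCount::get;

    // Called by the controller when partition state changes. The controller
    // already holds (or takes) its lock here, so publishing the new value
    // adds no extra contention for metric readers.
    void updateOfflinePartitions(long newCount) {
        controllerLock.lock();
        try {
            offlinePartitionsCount.set(newCount);
        } finally {
            controllerLock.unlock();
        }
    }
}
```

The tradeoff in this sketch is that the gauge reports the most recently published value rather than one computed at read time, so every code path that changes the underlying state has to remember to update the atomic.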



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)