Posted to dev@kafka.apache.org by 涂扬 <tu...@meituan.com> on 2017/10/23 00:01:38 UTC

Get broker metrics timeout

Hi all,
	We have a cluster of 10 brokers running Kafka 0.9.0.1. To monitor the cluster, we poll metrics such as the offlinePartition metric from each broker every 2 minutes.
Occasionally the request to some of the brokers times out, which leads to false alarms.
	For example, we may get an exception like the one below:
	error: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.11.12.13; nested exception is: 
	java.net.ConnectException: Connection timed out]
	
	We see that our TcpExt.TCPBacklogDrop counter fluctuates repeatedly, and I have put it in my attachment. Maybe this is the root cause. If it is the problem, how can I optimize it?
	Any suggestions are appreciated. Thanks in advance.
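
A minimal sketch of one way to keep a single connection timeout from raising a false alarm: retry the JMX read a few times before reporting a failure. This is not taken from the thread. The class name BrokerMetricPoller, the helper withRetry, the attempt count, the backoff, and the JMX port passed on the command line are all assumptions made for illustration. The MBean name is the one Kafka registers for the offline partition count.

```java
import java.util.concurrent.Callable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper class, not part of Kafka.
class BrokerMetricPoller {

    // Runs the task up to maxAttempts times, sleeping backoffMs after each
    // failed attempt except the last. Rethrows the last failure if all fail.
    static <T> T withRetry(Callable<T> task, int maxAttempts, long backoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }

    // Opens a JMX connection to one broker and reads the offline partition count.
    // The host and port are whatever the broker's JMX endpoint is (assumed here).
    static Object readOfflinePartitions(String host, int jmxPort) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "kafka.controller:type=KafkaController,name=OfflinePartitionsCount");
            return mbeans.getAttribute(name, "Value");
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // Three attempts with a 2 second pause are illustrative values only.
        Object value = withRetry(() -> readOfflinePartitions(host, port), 3, 2000L);
        System.out.println("OfflinePartitionsCount=" + value);
    }
}
```

With this shape, the monitor would alarm only when every attempt within one polling round fails, which hides an occasional dropped connection but still surfaces a broker that is really unreachable.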

Re: Get broker metrics timeout

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Yang,

On the mailing list attachments are usually not allowed. Your attached file
did not show up. Could you please paste it somewhere and send the link in
this thread?

As for your observed TCP backlog issue, have you tried to simply increase
the backlog capacity and see if it helps?
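
To check whether a backlog change helps, one option is to record the kernel counters before and after the change. The sketch below is an illustration, not something from this thread: it parses /proc/net/netstat on Linux and prints TCPBacklogDrop together with ListenOverflows and ListenDrops, which track drops on a listening socket's accept queue. The class name TcpExtCounters is hypothetical, and which queue needs tuning on these brokers is an open question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for reading Linux TCP counters.
class TcpExtCounters {

    // /proc/net/netstat holds pairs of rows per section: a row of counter
    // names followed by a row of values, both starting with "<section>:".
    static Map<String, Long> parse(List<String> lines, String section) {
        String prefix = section + ":";
        Map<String, Long> counters = new HashMap<>();
        for (int i = 0; i + 1 < lines.size(); i++) {
            String[] names = lines.get(i).trim().split("\\s+");
            String[] values = lines.get(i + 1).trim().split("\\s+");
            if (names[0].equals(prefix) && values[0].equals(prefix)
                    && names.length == values.length) {
                for (int j = 1; j < names.length; j++) {
                    counters.put(names[j], Long.parseLong(values[j]));
                }
                i++; // skip the values row that was just consumed
            }
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/net/netstat"));
        Map<String, Long> tcpExt = parse(lines, "TcpExt");
        for (String name : new String[] {"TCPBacklogDrop", "ListenOverflows", "ListenDrops"}) {
            System.out.println(name + "=" + tcpExt.getOrDefault(name, -1L));
        }
    }
}
```

The counters are cumulative since boot, so the useful number is the difference between two samples taken a polling interval apart.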


Guozhang
