Posted to users@kafka.apache.org by Ashish Karalkar <as...@yahoo.com.INVALID> on 2019/01/23 04:58:44 UTC

Broker continuously expands and shrinks ISR to itself

Hi All,
We just upgraded from 0.10.x to 1.1 and enabled rack awareness on an existing cluster of about 20 nodes across 4 racks. Since then, a few brokers go through a continuous cycle of expanding and shrinking the ISR to themselves, which is also causing long metadata request serving times.
What is the impact of enabling rack awareness on an existing cluster, assuming a replication factor of 3 and that existing replicas may or may not already be on different racks when rack awareness is enabled, followed by a rolling bounce?
The symptoms we are seeing are replica lag and slow metadata requests. In the broker logs we also continuously see disconnections from the broker it is trying to expand to.
Thanks for helping
--A
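
A quick way to watch this churn from the outside is to poll partition metadata and compare each partition's replica list with its ISR. Below is a minimal Java sketch using the AdminClient API (present since 0.11, so usable against a 1.1 cluster); the bootstrap address and the topic name "my-topic" are placeholders, not values from this thread:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class IsrWatcher {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                while (true) {
                    TopicDescription desc = admin
                            .describeTopics(Collections.singleton("my-topic")) // placeholder topic
                            .all().get().get("my-topic");
                    for (TopicPartitionInfo p : desc.partitions()) {
                        // An ISR smaller than the replica list means the partition is
                        // under-replicated; flipping back and forth between full and
                        // shrunken ISR is the expand/shrink cycle described above.
                        System.out.printf("partition %d: replicas=%s isr=%s%n",
                                p.partition(), p.replicas(), p.isr());
                    }
                    Thread.sleep(5000L);
                }
            }
        }
    }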

Re: Broker continuously expands and shrinks ISR to itself

Posted by Ashish Karalkar <as...@yahoo.com.INVALID>.
Thanks Harsha,
I will play with these settings.

-Ashish





Re: Broker continuously expands and shrinks ISR to itself

Posted by Harsha Chintalapani <ka...@harsha.io>.
We’ve seen something similar in our setup, and as you noticed it does happen infrequently. Based on my debugging, there are a few settings that might be causing this issue:
1. replica.lag.time.max.ms, set to 10 seconds by default
2. replica.socket.timeout.ms, set to 30 seconds by default

When a broker is busy with lots of clients, a follower’s replica fetch request can take a long time or time out entirely, i.e. wait the full 30 seconds without getting a response. The ReplicaManager thread periodically calls maybeShrinkIsr and shrinks the ISR if there has been no caught-up fetch from a follower within replica.lag.time.max.ms, which is possible under heavy load; since the socket timeout alone is 30 seconds, the follower can end up marked as out of the ISR.
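
Read that way, the shrink decision comes down to a single timing comparison. The following is an illustrative Java sketch of that condition only; it is a simplified model with assumed names, not Kafka’s actual implementation:

    public class IsrShrinkModel {
        // Defaults cited above.
        static final long REPLICA_LAG_TIME_MAX_MS = 10_000L;   // replica.lag.time.max.ms
        static final long REPLICA_SOCKET_TIMEOUT_MS = 30_000L; // replica.socket.timeout.ms

        // The leader tracks when each follower last fetched up to the log end;
        // a follower whose last caught-up time is too old is dropped from the ISR.
        static boolean followerOutOfSync(long nowMs, long lastCaughtUpTimeMs) {
            return nowMs - lastCaughtUpTimeMs > REPLICA_LAG_TIME_MAX_MS;
        }

        public static void main(String[] args) {
            // A follower stuck waiting out the full 30 s socket timeout has been
            // "not caught up" for longer than the 10 s lag limit, so the periodic
            // ISR check shrinks it out; its next successful fetch expands it back.
            System.out.println(followerOutOfSync(REPLICA_SOCKET_TIMEOUT_MS, 0L)); // prints true
        }
    }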

What we’ve seen is the ISR shrink and expand happening back to back: one fetch call times out, and the subsequent call makes the follower part of the ISR again. One option to try is lowering the socket timeout and increasing replica.lag.time.max.ms, as sketched below.
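
As a concrete sketch of that suggestion, in server.properties the two values could be flipped so that a single socket timeout can no longer outlast the lag limit; these numbers are assumptions to illustrate the idea, not tested recommendations:

    # server.properties -- hypothetical tuning along the lines suggested above
    replica.socket.timeout.ms=10000
    replica.lag.time.max.ms=30000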

Thanks,
Harsha

Re: Broker continuously expands and shrinks ISR to itself

Posted by Ashish Karalkar <as...@yahoo.com.INVALID>.
Hi Harsha,
Thanks for the reply.
The issue is resolved for now; the root cause was a runaway application spawning many instances of kafkacat and hammering the Kafka brokers. I am still wondering, though, why a client hammering a broker would cause the ISR to shrink and expand.
--Ashish

Re: Broker continuously expands and shrinks ISR to itself

Posted by Harsha Chintalapani <ka...@harsha.io>.
Hi Ashish,
What’s your replica.lag.time.max.ms set to, and do you see any network issues between brokers?
-Harsha
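
For anyone following along, one way to answer that question programmatically is to read the broker’s effective configuration with the Java AdminClient (a minimal sketch; broker id "0" and the bootstrap address are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.common.config.ConfigResource;

    public class BrokerConfigCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0"); // placeholder id
                Config config = admin.describeConfigs(Collections.singleton(broker))
                        .all().get().get(broker);
                // Prints the broker's effective replica.lag.time.max.ms.
                System.out.println(config.get("replica.lag.time.max.ms").value());
            }
        }
    }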




Broker continuously expands and shrinks ISR to itself

Posted by Ashish Karalkar <as...@yahoo.com.INVALID>.
Hi All,

We just upgraded from 0.10.x to 1.1 and enabled rack awareness on an existing cluster of about 20 nodes across 4 racks. Since then, a few brokers go through a continuous cycle of expanding and shrinking the ISR to themselves, which is also causing long metadata request serving times.
What is the impact of enabling rack awareness on an existing cluster, assuming a replication factor of 3 and that existing replicas may or may not already be on different racks when rack awareness is enabled, followed by a rolling bounce?
The symptoms we are seeing are replica lag and slow metadata requests. In the broker logs we also continuously see disconnections from the broker it is trying to expand to.
Thanks for helping
--A