
Posted to users@kafka.apache.org by "Yu, Libo " <li...@citi.com> on 2013/08/23 20:33:24 UTC

questions about ISR

Hi,

When a broker is not in a topic's ISR, will it try to catch up and rejoin the ISR on its own?
Or do we have to restart it?

We can increase replica.lag.time.max.ms and replica.lag.max.messages
to let brokers stay in the ISR longer. Is that good practice? This is still
related to the first question: we want to know what happens after
a broker falls out of the ISR and what we should do. Thanks.


Regards,

Libo

RE: questions about ISR

Posted by "Yu, Libo " <li...@citi.com>.

Thanks, Jun. That is exactly what I want to know. 

Regards,

Libo


-----Original Message-----
From: Jun Rao [mailto:junrao@gmail.com] 
Sent: Tuesday, August 27, 2013 11:25 AM
To: users@kafka.apache.org
Subject: Re: questions about ISR

Look for JMX beans under kafka.server. You will see ???MaxLag and ???MinFetchRate. In the normal case, when a broker fails, the controller will drop the failed broker out of ISR during leader election, so the value of replica.lag.time.max.ms doesn't matter. This value only matters when the controller's decision is delayed somehow. In that case, having a large replica.lag.time.max.ms may delay the committing of a message.

Thanks,

Jun



Re: questions about ISR

Posted by Jun Rao <ju...@gmail.com>.

Look for JMX beans under kafka.server. You will see ???MaxLag and
???MinFetchRate. In the normal case, when a broker fails, the controller
will drop the failed broker out of ISR during leader election, so the
value of replica.lag.time.max.ms doesn't matter. This value only matters
when the controller's decision is delayed somehow. In that case, having a
large replica.lag.time.max.ms may delay the committing of a message.

Thanks,

Jun
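
To illustrate the point above about commit delay, here is a toy Python model. It is only a sketch under stated assumptions, not Kafka's actual logic: the function name and both parameters are hypothetical, and it assumes a failed follower leaves the ISR either when the controller acts or when the lag timeout fires, whichever comes first. A message that is waiting on that follower cannot commit until then.

```python
def isr_removal_delay_ms(controller_delay_ms: int,
                         replica_lag_time_max_ms: int) -> int:
    """Toy model: time until a failed follower is dropped from the ISR.

    Assumption (not taken from Kafka source): the follower is removed by
    whichever happens first, the controller's decision or the expiry of
    replica.lag.time.max.ms. Pending messages can only commit after that.
    """
    return min(controller_delay_ms, replica_lag_time_max_ms)


# Normal case: the controller reacts quickly, so the timeout is irrelevant.
fast_controller = isr_removal_delay_ms(200, 10_000)

# Delayed controller: the timeout now bounds how long commits can stall,
# so a larger timeout means a longer possible stall.
small_timeout = isr_removal_delay_ms(60_000, 10_000)
large_timeout = isr_removal_delay_ms(60_000, 30_000)
```

In this model the timeout only shows up in the result when the controller is slower than the timeout, which matches the description above.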


On Tue, Aug 27, 2013 at 6:37 AM, Yu, Libo <li...@citi.com> wrote:

> Thanks, Jun. That is very helpful. However, I still have a couple of
> questions. "We have a min fetch rate JMX in the broker". How to
> find out how such min fetch rate is defined? And if
> replica.lag.time.max.ms is too large, what is the consequence?
>
>
>
>
> Regards,
>
> Libo

RE: questions about ISR

Posted by "Yu, Libo " <li...@citi.com>.

Thanks, Jun. That is very helpful. However, I still have a couple of
questions. "We have a min fetch rate JMX in the broker": how can I
find out how this min fetch rate is defined? And if
replica.lag.time.max.ms is too large, what is the consequence?




Regards,

Libo


-----Original Message-----
From: Jun Rao [mailto:junrao@gmail.com] 
Sent: Tuesday, August 27, 2013 12:07 AM
To: users@kafka.apache.org
Subject: Re: questions about ISR

I added the following in FAQ:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR%3F

Thanks,

Jun



Re: questions about ISR

Posted by Jun Rao <ju...@gmail.com>.

I added the following in FAQ:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoreducechurnsinISR%3F

Thanks,

Jun


On Mon, Aug 26, 2013 at 7:46 PM, James Wu <ja...@gmail.com> wrote:

> Hi Jun,
>
> I am curious about Yu's questions too.
>
> 1. What is the best practice for setting replica.lag.time.max.ms and
> replica.lag.max.messages? As large as possible, or something else?
>
> 2. If the broker exceeds one of these two settings, what should we do to
> bring the broker back into the ISR? Will the controller automatically take
> care of catching the broker up, so that the only thing we need to do is
> wait for the broker to come back?
>
> Thanks.

Re: questions about ISR

Posted by James Wu <ja...@gmail.com>.

Hi Jun,

I am curious about Yu's questions too.

1. What is the best practice for setting replica.lag.time.max.ms and
replica.lag.max.messages? As large as possible, or something else?

2. If the broker exceeds one of these two settings, what should we do to
bring the broker back into the ISR? Will the controller automatically take
care of catching the broker up, so that the only thing we need to do is
wait for the broker to come back?

Thanks.




On Mon, Aug 26, 2013 at 11:15 PM, Jun Rao <ju...@gmail.com> wrote:

> That's right. You shouldn't need to restart the whole cluster for a broker
> to rejoin ISR. Do you see many ZK session expirations in the brokers
> (search for "(Expired)")? If so, you may need to tune the GC on the broker.
>
> Thanks,
>
> Jun



-- 

Friendly regards,

James Wu <https://plus.google.com/u/0/100829801349304669533>

Re: questions about ISR

Posted by Jun Rao <ju...@gmail.com>.

That's right. You shouldn't need to restart the whole cluster for a broker
to rejoin ISR. Do you see many ZK session expirations in the brokers
(search for "(Expired)")? If so, you may need to tune the GC on the broker.

Thanks,

Jun
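
As a rough aid for the check suggested above, the following Python sketch counts lines containing the "(Expired)" marker. The helper name is hypothetical, and nothing here assumes a particular log layout: it simply does a substring match on whatever lines it is given.

```python
from typing import Iterable


def count_session_expirations(lines: Iterable[str],
                              marker: str = "(Expired)") -> int:
    """Count lines that contain the marker string.

    The default marker is the search term quoted in this thread; the
    function makes no assumption about the rest of the log format.
    """
    return sum(1 for line in lines if marker in line)


# Placeholder lines for illustration only; these are not real broker output.
sample_lines = [
    "placeholder line A (Expired)",
    "placeholder line B",
    "placeholder line C (Expired)",
]
expired_count = count_session_expirations(sample_lines)
```

To run it against a real file, pass an open file object, for example `count_session_expirations(open("server.log"))`, where `server.log` is an assumed path that will differ per installation. A count that keeps growing would be the "many ZK session expirations" case described above.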


On Mon, Aug 26, 2013 at 7:11 AM, Yu, Libo <li...@citi.com> wrote:

> Hi Jun,
>
> Could you confirm the following?
> So after a broker is out of ISR, the only way to get it back in is to
> restart it.
>
> We should set replica.lag.time.max.ms and replica.lag.max.messages as
> large as possible to keep a broker from falling out of ISR.
>
> What we have experienced is that when a broker is frequently out of ISR,
> we need to restart the whole cluster to bring it back. That is a blocking
> issue for us.
>
>
> Regards,
>
> Libo

RE: questions about ISR

Posted by "Yu, Libo " <li...@citi.com>.

Hi Jun,

Could you confirm the following?
So after a broker is out of ISR, the only way to get it back in is to restart it.

We should set replica.lag.time.max.ms and replica.lag.max.messages as
large as possible to keep a broker from falling out of ISR.

What we have experienced is that when a broker is frequently out of ISR,
we need to restart the whole cluster to bring it back. That is a blocking issue
for us.

Regards,

Libo

-----Original Message-----
From: Jun Rao [mailto:junrao@gmail.com] 
Sent: Friday, August 23, 2013 11:41 PM
To: users@kafka.apache.org
Subject: Re: questions about ISR

When a broker is restarted, it will automatically catch up from the leader and will join ISR when it's caught up. Are you not seeing this happening?

Thanks,

Jun


Re: questions about ISR

Posted by Jun Rao <ju...@gmail.com>.

When a broker is restarted, it will automatically catch up from the leader
and will join ISR when it's caught up. Are you not seeing this happening?

Thanks,

Jun
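
To make the two thresholds discussed in this thread concrete, here is a minimal Python sketch of how a leader might decide whether a follower counts as in sync. It is a simplified model, not Kafka's implementation: the `Follower` class, the `in_sync` and `compute_isr` helpers, and the numeric thresholds are all assumptions chosen for illustration, and the real broker's rule for re-admitting a follower may differ.

```python
from dataclasses import dataclass
from typing import List

# Illustrative thresholds only; check the documentation for your broker
# version rather than relying on these numbers.
REPLICA_LAG_TIME_MAX_MS = 10_000
REPLICA_LAG_MAX_MESSAGES = 4_000


@dataclass
class Follower:
    broker_id: int
    last_fetch_ms: int    # time of the follower's most recent fetch request
    log_end_offset: int   # last offset the follower has replicated


def in_sync(f: Follower, leader_end_offset: int, now_ms: int) -> bool:
    """Return True if the follower is neither stuck nor too far behind."""
    stuck = now_ms - f.last_fetch_ms > REPLICA_LAG_TIME_MAX_MS
    slow = leader_end_offset - f.log_end_offset > REPLICA_LAG_MAX_MESSAGES
    return not (stuck or slow)


def compute_isr(followers: List[Follower], leader_end_offset: int,
                now_ms: int) -> List[int]:
    """Broker ids of followers that currently pass the in-sync check."""
    return [f.broker_id for f in followers
            if in_sync(f, leader_end_offset, now_ms)]


now = 100_000
followers = [
    Follower(1, now - 500, 9_990),     # healthy
    Follower(2, now - 20_000, 9_990),  # no fetch for 20 s: stuck
    Follower(3, now - 500, 1_000),     # 9,000 messages behind: slow
]
isr_before = compute_isr(followers, 10_000, now)

# Once follower 3 catches up, it passes the check again without a restart.
followers[2].log_end_offset = 9_995
isr_after = compute_isr(followers, 10_000, now)
```

In this model, raising either threshold keeps a lagging follower in the ISR for longer, while a follower that has dropped out rejoins simply by catching up.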

On Fri, Aug 23, 2013 at 11:33 AM, Yu, Libo <li...@citi.com> wrote:

> Hi,
>
> When a broker is not in a topic's ISR, will it try to catch up to go back
> to ISR itself?
> Or do we have to restart it?
>
> We can increase replica.lag.time.max.ms and replica.lag.max.messages
> to let brokers stay longer in ISR. Is that good practice? Still this is
> related to the first questions. We want to know what happens after
> a broker falls out of ISR and what we should do. Thanks.
>
>
> Regards,
>
> Libo
>
>