Posted to users@kafka.apache.org by Jun Rao <ju...@gmail.com> on 2013/07/01 05:48:27 UTC

Re: failover strategy

LinkedIn uses the first method for cross-DC mirroring. For the second
method, there are 2 main issues. (1) Kafka depends on the ZK service being
always available. For a ZK cluster to be available, a majority of its
servers must be up. If you set up a ZK cluster spanning only 2 data
centers, a single DC failure may make the ZK cluster unavailable. You can
set up a ZK cluster spanning 3 or more DCs, which allows you to tolerate at
least 1 DC failure. (2) Long network latency across DCs. In order for the
follower to keep up with the leader in a different DC, you need to tune
parameters like replica.lag.max.messages, replica.lag.time.max.ms,
and replica.socket.receive.buffer.bytes to amortize the long network
latency.
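
To make the majority argument in (1) concrete, here is a minimal sketch of the quorum arithmetic. The function names and the server layouts are hypothetical and chosen only for illustration; nothing here calls ZooKeeper itself.

```python
def majority(total_servers):
    """Smallest number of servers that forms a strict majority."""
    return total_servers // 2 + 1


def survives_any_single_dc_failure(servers_per_dc):
    """Return True if the ensemble keeps a majority after losing any one DC.

    servers_per_dc is a list such as [3, 2]: the number of ZK servers
    placed in each data center (a hypothetical layout).
    """
    total = sum(servers_per_dc)
    needed = majority(total)
    # Check the loss of each data center in turn.
    return all(total - lost >= needed for lost in servers_per_dc)


# Two DCs: losing the DC that holds at least half the servers breaks quorum.
print(survives_any_single_dc_failure([3, 2]))     # False
# Three DCs: any single DC can fail and a majority remains.
print(survives_any_single_dc_failure([2, 2, 1]))  # True
```

With only two data centers, one of them always holds at least half of the servers, so there is no layout that survives the loss of either DC.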

Thanks,

Jun


On Sat, Jun 29, 2013 at 10:50 AM, Yu, Libo <li...@citi.com> wrote:

> The first method may lose messages if cluster A is permanently down or
> cannot restart right away, as B always lags behind A. Even with
> mirroring, B has to wait until A is back to get the missing messages.
> So it is not ideal. What type of solution did you use at LinkedIn?
>
> Regards,
>
> Libo
>
>
> -----Original Message-----
> From: Joel Koshy [mailto:jjkoshy.w@gmail.com]
> Sent: Friday, June 28, 2013 8:59 PM
> To: users@kafka.apache.org
> Subject: Re: failover strategy
>
> The second method (replication across DCs) is not recommended.
> The first set up would work provided the set of topics you are mirroring
> from A->B is disjoint from the set of topics you are mirroring from B->A
> (i.e., to avoid a mirroring loop).
>
> Joel
>
> On Fri, Jun 28, 2013 at 5:29 PM, Yu, Libo <li...@citi.com> wrote:
> > Hi,
> >
> > I can think of two failover strategies. I am not sure which one is the
> > right way to go.
> >
> > First method. Set up Kafka server A on cluster 1 and set up another
> > server B on cluster 2. The two clusters are in different data centers.
> > Use a customized mirrormaker to sync between the two servers. Use one
> > server in production and the other one as contingency. If server A is
> > down, server B will be used (this can be transparent to
> > publishers/consumers). There may be a lag between the two servers
> > before server A goes down. But after A is back, the customized
> > mirrormaker can sync the two, and eventually B will have all the data
> > A had before the failure.
> >
> > Second method. Set up one Kafka server using cluster 1 and cluster 2.
> > When creating a topic, always use two replicas. For each partition,
> > assign one replica to a broker in cluster 1 and the other replica to a
> > broker in cluster 2. Kafka will then handle the syncing and failover
> > for the two clusters. Is that a right (expected) way to use Kafka?
> >
> >
> > Regards,
> >
> > Libo
> >
>

RE: failover strategy

Posted by "Yu, Libo " <li...@citi.com>.

Thanks again. It seems the 2nd method is not doable.
The downside of the first method is that if the first data
center goes down, the second one still lags behind and may
not have all the messages the first one had. We could have
publishers publish to both data centers at the same
time, but that may degrade performance greatly.

Regards,

Libo
