Posted to users@kafka.apache.org by Weide Zhang <we...@gmail.com> on 2014/05/08 20:20:37 UTC

question about mirror maker

Hi,

I have a question about mirror maker. Say I have 3 data centers, each
producing topic 'A' on its own Kafka cluster. If the data in all 3 needs
to be kept in sync, should I create 3 mirror makers in each data center
to get the data from the other two?

Also, it is mentioned that mirror making is not fault tolerant. So what will
be the behavior of the mirror consumer if it goes down due to a network
problem and comes back up? Does it catch up from the last offset it
mirrored? If so, is that enabled by default or do I have to configure it?

Thanks a lot,

Weide

Re: question about mirror maker

Posted by Todd Palino <tp...@linkedin.com>.
As far as Zookeeper goes, any time you have network communication you have
the chance of a problem. I would rather have the network issue on the
consumer side than on the producer side.

I would certainly prefer to have the offsets committed only after the
message is produced (based on the acks setting for the producer). The
problem with this is that the consumer and the producer in mirror maker
are separated by a queue, and a significant amount of communication would
need to be added between the two for that to work, and it would have to be
done without a loss of throughput.
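
To illustrate the kind of bookkeeping this would need, here is a minimal
sketch in Python. It is not mirror maker code: the AckTracker class and its
method names are hypothetical, it assumes a single partition, and it assumes
producer acks can arrive out of order. The idea is that the only offset safe
to commit is the end of the contiguous run of acknowledged messages.

```python
class AckTracker:
    """Hypothetical helper: tracks which consumed offsets the producer has acked."""

    def __init__(self, start_offset=0):
        # Next offset to resume from; everything below it has been acked.
        self.safe_offset = start_offset
        self._acked = set()

    def ack(self, offset):
        """Record a producer ack and advance the safe commit point if possible."""
        self._acked.add(offset)
        while self.safe_offset in self._acked:
            self._acked.remove(self.safe_offset)
            self.safe_offset += 1
        return self.safe_offset


tracker = AckTracker()
# Acks arrive out of order, and the ack for offset 4 never arrives.
for offset in [0, 2, 3, 1, 5]:
    tracker.ack(offset)

# Offsets 0-3 are safe; 5 is acked but cannot be committed past the gap at 4.
print(tracker.safe_offset)
```

Every ack has to travel back from the producer side to whatever commits
offsets on the consumer side, which is the extra communication across the
queue described above.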

-Todd

On 5/12/14, 11:26 AM, "Steven Wu" <st...@netflix.com> wrote:

>If the mirror maker is placed in the same datacenter as the target cluster,
>its consumer will talk to Zookeeper in the remote/source datacenter. Would it
>be more susceptible to network problems?
>
>As for the problem of committing offsets without actually producing/writing
>msgs to the target cluster, it can be solved by disabling auto-commit and
>only committing offsets for msgs that are actually persisted in the target
>cluster.
>
>What do you think of this opposite approach?

Re: question about mirror maker

Posted by Steven Wu <st...@netflix.com>.
If the mirror maker is placed in the same datacenter as the target cluster,
its consumer will talk to Zookeeper in the remote/source datacenter. Would it
be more susceptible to network problems?

As for the problem of committing offsets without actually producing/writing
msgs to the target cluster, it can be solved by disabling auto-commit and
only committing offsets for msgs that are actually persisted in the target
cluster.

What do you think of this opposite approach?
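
A rough sketch of the commit-after-persist idea, in Python. This is a toy
model rather than Kafka client code: mirror_once and send are hypothetical
names, and send stands in for a synchronous produce call that returns True
only once the target cluster has acknowledged the message.

```python
def mirror_once(source, send, committed):
    """Copy messages starting at `committed`; return the new committed offset."""
    for offset in range(committed, len(source)):
        if not send(source[offset]):
            break                # never advance past an unacknowledged message
        committed = offset + 1   # commit only after the ack
    return committed


source = ["m0", "m1", "m2", "m3"]
target = []
failures = {"m2": 1}  # simulate one failed produce for m2, then recovery


def send(msg):
    """Stand-in producer: fails while a simulated outage is pending for msg."""
    if failures.get(msg, 0) > 0:
        failures[msg] -= 1
        return False
    target.append(msg)
    return True


committed = mirror_once(source, send, 0)          # stops at m2
first_pass = committed
committed = mirror_once(source, send, committed)  # resumes from m2
print(first_pass, committed, target)
```

Because the committed offset never moves past a message that was not
acknowledged, a restart resumes at the failed message instead of skipping it.
The trade-off in this sketch is that each message waits for its ack before
the next one is sent.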


On Sun, May 11, 2014 at 8:48 PM, Todd Palino <tp...@linkedin.com> wrote:

> Yes, on both counts. Putting the mirror maker in the same datacenter as
> the target cluster is exactly what we do as well. We also monitor both the
> consumer lag (by comparing the offsets stored in Zookeeper and the tail
> offset on the brokers), and the number of dropped and failed messages on
> the mirror maker producer side. The other thing to do is to check very
> carefully whenever you change anything about the producer configuration,
> to ensure that you have not made a mistake.
>
> -Todd

Re: question about mirror maker

Posted by Todd Palino <tp...@linkedin.com>.
Yes, on both counts. Putting the mirror maker in the same datacenter as
the target cluster is exactly what we do as well. We also monitor both the
consumer lag (by comparing the offsets stored in Zookeeper and the tail
offset on the brokers), and the number of dropped and failed messages on
the mirror maker producer side. The other thing to do is to check very
carefully whenever you change anything about the producer configuration,
to ensure that you have not made a mistake.
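
The lag comparison can be sketched as follows, in Python. This is only an
illustration: the consumer_lag function and the sample numbers are made up,
and fetching the committed offsets from Zookeeper and the tail offsets from
the brokers is assumed to happen elsewhere.

```python
def consumer_lag(committed, log_end):
    """Return per-partition lag: broker tail offset minus last committed offset.

    Partitions with no committed offset yet are treated as committed at 0.
    """
    return {p: log_end[p] - committed.get(p, 0) for p in log_end}


# Hypothetical sample data, keyed by partition number.
committed = {0: 1500, 1: 980}
log_end = {0: 1500, 1: 1000, 2: 40}

lag = consumer_lag(committed, log_end)
print(lag)
```

A lag that keeps growing on some partition suggests the mirror maker
consumer is falling behind or has stopped.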

-Todd

On 5/11/14, 9:12 AM, "Weide Zhang" <we...@gmail.com> wrote:

>Hi Todd,
>
>Thanks for your answer. With regard to failover for mirror maker, does
>that mean that if I have 4 mirror makers running on different machines with
>the same consumer group, they will automatically rebalance the load if one
>of them fails? Also, it looks like, to prevent mirror maker from committing
>wrongly (the consumer works but the producer does not) due to a cross data
>center network issue, the mirror maker needs to be placed alongside the
>target cluster so that this scenario is minimized. Is that right?

Re: question about mirror maker

Posted by Weide Zhang <we...@gmail.com>.
Hi Todd,

Thanks for your answer. With regard to failover for mirror maker, does
that mean that if I have 4 mirror makers running on different machines with
the same consumer group, they will automatically rebalance the load if one
of them fails? Also, it looks like, to prevent mirror maker from committing
wrongly (the consumer works but the producer does not) due to a cross data
center network issue, the mirror maker needs to be placed alongside the
target cluster so that this scenario is minimized. Is that right?


On Sat, May 10, 2014 at 11:39 PM, Todd Palino <tp...@linkedin.com> wrote:

> Well, if you have a cluster in each datacenter, all with the same topics,
> you can't just mirror the messages between them, as you will create a
> loop. The way we do it is to have a "local" cluster and an "aggregate"
> cluster. The local cluster has the data for only that datacenter. Then we
> run mirror makers that copy the messages from each of the local clusters
> into the aggregate cluster. Everything produces into the local clusters,
> and nothing produces into the aggregate clusters. In general, consumers
> consume from the aggregate cluster (unless they specifically want only
> local data).
>
> The mirror maker is as fault tolerant as any other consumer. That is, if a
> mirror maker goes down, the others configured with the same consumer group
> (we generally run at least 4 for any mirror maker, sometimes up to 10)
> will rebalance and start back up from the last committed offset. What you
> need to watch out for is if the mirror maker is unable to produce
> messages, for example, if the network goes down. If it can still consume
> messages, but cannot produce them, you will lose messages as the consumer
> will continue to commit offsets with no knowledge that the producer is
> failing.
>
> -Todd

Re: question about mirror maker

Posted by Todd Palino <tp...@linkedin.com>.
Well, if you have a cluster in each datacenter, all with the same topics,
you can't just mirror the messages between them, as you will create a
loop. The way we do it is to have a "local" cluster and an "aggregate"
cluster. The local cluster has the data for only that datacenter. Then we
run mirror makers that copy the messages from each of the local clusters
into the aggregate cluster. Everything produces into the local clusters,
and nothing produces into the aggregate clusters. In general, consumers
consume from the aggregate cluster (unless they specifically want only
local data).
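
To make the loop problem concrete, here is a small sketch in Python that
treats each mirror maker as a directed edge from a source cluster to a
target cluster and checks the resulting graph for cycles. The cluster names
and the has_mirror_loop function are made up for illustration, and the
sketch assumes one local and one aggregate cluster per datacenter.

```python
from collections import defaultdict


def has_mirror_loop(mirrors):
    """Return True if the mirror graph (source -> target edges) has a cycle."""
    graph = defaultdict(list)
    for source, target in mirrors:
        graph[source].append(target)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                return True  # reached a cluster already on the current path
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


dcs = ["dc1", "dc2", "dc3"]

# Every cluster mirrors topic 'A' from every other cluster.
full_mesh = [(a, b) for a in dcs for b in dcs if a != b]

# Each local cluster is mirrored into each aggregate cluster; nothing is
# mirrored out of an aggregate cluster.
local_aggregate = [(f"local-{a}", f"aggregate-{b}") for a in dcs for b in dcs]

print(has_mirror_loop(full_mesh))        # True
print(has_mirror_loop(local_aggregate))  # False
```

In the full mesh, a message mirrored from dc1 to dc2 would be mirrored back
to dc1 and so on. In the local/aggregate layout, data only ever flows from a
local cluster to an aggregate cluster, so no cycle can form.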

The mirror maker is as fault tolerant as any other consumer. That is, if a
mirror maker goes down, the others configured with the same consumer group
(we generally run at least 4 for any mirror maker, sometimes up to 10)
will rebalance and start back up from the last committed offset. What you
need to watch out for is if the mirror maker is unable to produce
messages, for example, if the network goes down. If it can still consume
messages, but cannot produce them, you will lose messages as the consumer
will continue to commit offsets with no knowledge that the producer is
failing.
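
The failure mode can be reproduced with a toy simulation in Python. Nothing
here is Kafka code: run_with_auto_commit and the outage set are hypothetical
stand-ins for a consumer that commits offsets on its own while the producer
fails for a couple of messages.

```python
def run_with_auto_commit(source, producer_ok):
    """Consume every message and commit its offset whether or not the
    produce succeeded. Returns (delivered messages, committed offset)."""
    delivered = []
    committed = 0
    for offset, msg in enumerate(source):
        if producer_ok(offset):
            delivered.append(msg)
        committed = offset + 1  # committed regardless of the producer outcome
    return delivered, committed


source = [f"m{i}" for i in range(6)]
outage = {2, 3}  # offsets for which the simulated producer cannot send

delivered, committed = run_with_auto_commit(source, lambda o: o not in outage)

# Anything below the committed offset that never reached the target is lost,
# because a restarted consumer resumes from the committed offset.
lost = [m for m in source[:committed] if m not in delivered]
print(lost)  # ['m2', 'm3']
```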

-Todd

On 5/8/14, 11:20 AM, "Weide Zhang" <we...@gmail.com> wrote:

>Hi,
>
>I have a question about mirror maker. Say I have 3 data centers, each
>producing topic 'A' on its own Kafka cluster. If the data in all 3 needs
>to be kept in sync, should I create 3 mirror makers in each data center
>to get the data from the other two?
>
>Also, it is mentioned that mirror making is not fault tolerant. So what will
>be the behavior of the mirror consumer if it goes down due to a network
>problem and comes back up? Does it catch up from the last offset it
>mirrored? If so, is that enabled by default or do I have to configure it?
>
>Thanks a lot,
>
>Weide