Posted to users@kafka.apache.org by Vahid S Hashemian <va...@us.ibm.com> on 2017/07/05 23:00:46 UTC

Mirroring multiple clusters into one

The literature suggests running the MM on the target cluster when possible 
(with the exception of when encryption is required for transferred data).
I am wondering if this is still the recommended approach when mirroring 
from multiple clusters to a single cluster (i.e. multiple MM instances).
Is there anything in particular (metric, specification, etc.) to consider 
before making a decision?

Thanks.
--Vahid

Re: Mirroring multiple clusters into one

Posted by Vahid S Hashemian <va...@us.ibm.com>.

Thanks a lot for your input James.

Regards,
--Vahid



From:   James Cheng <wu...@gmail.com>
To:     dev@kafka.apache.org
Cc:     users@kafka.apache.org
Date:   07/06/2017 10:26 PM
Subject:        Re: Mirroring multiple clusters into one



Answers inline below.

-James

Sent from my iPhone

> On Jul 7, 2017, at 1:18 AM, Vahid S Hashemian <va...@us.ibm.com> wrote:
> 
> James,
> 
> Thanks for sharing your thoughts and experience.
> Could you please also confirm whether
> - you do any encryption for the mirrored data?
Not at the Kafka level. The data goes over a VPN.

> - you have a many-to-one mirroring similar to what I described?
> 

Yes, we mirror multiple source clusters to a single target cluster. We 
have a topic naming convention where our topics are prefixed with their 
cluster name, so as long as we follow that convention, each source topic 
gets mirrored to a unique target topic. That is, we try not to have 
multiple mirrormakers writing to a single target topic. 

Our topic names in the target cluster get prefixed with the string 
"mirror." And then we never mirror topics that start with "mirror." This 
prevents us from creating mirroring loops.

> Thanks.
> --Vahid
> 
> 
> 
> From:   James Cheng <wu...@gmail.com>
> To:     users@kafka.apache.org
> Cc:     dev <de...@kafka.apache.org>
> Date:   07/06/2017 12:37 PM
> Subject:        Re: Mirroring multiple clusters into one
> 
> 
> 
> I'm not sure what the "official" recommendation is. At TiVo, we *do* run
> all our mirrormakers near the target cluster. It works fine for us, but
> we're still fairly inexperienced, so I'm not sure how strong of a data
> point we should be.
>
> I think the thought process is, if you are mirroring from a source cluster
> to a target cluster where there is a WAN between the two, then whichever
> request goes across the WAN has a higher chance of intermittent failure
> than the one over the LAN. That means that if mirrormaker is near the
> source cluster, the produce request over the WAN to the target cluster may
> fail. If the mirrormaker is near the target cluster, then the fetch
> request over the WAN to the source cluster may fail.
>
> Failed fetch requests don't have much impact on data replication, it just
> delays it. Whereas a failure during a produce request may introduce
> duplicates.
>
> Becket Qin from LinkedIn did a presentation on tuning producer performance
> at a meetup last year, and I remember he specifically talked about
> producing over a WAN as one of the cases where you have to tune settings.
> Maybe that presentation will give more ideas about what to look at.
> https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
> 
> 
> -James
> 
> Sent from my iPhone
> 
>> On Jul 6, 2017, at 1:00 AM, Vahid S Hashemian <va...@us.ibm.com> wrote:
>>
>> The literature suggests running the MM on the target cluster when possible
>> (with the exception of when encryption is required for transferred data).
>> I am wondering if this is still the recommended approach when mirroring
>> from multiple clusters to a single cluster (i.e. multiple MM instances).
>> Is there anything in particular (metric, specification, etc.) to consider
>> before making a decision?
>> 
>> Thanks.
>> --Vahid
>> 
>> 
> 
> 
> 
>

Re: Mirroring multiple clusters into one

Posted by James Cheng <wu...@gmail.com>.

Answers inline below.

-James

Sent from my iPhone

> On Jul 7, 2017, at 1:18 AM, Vahid S Hashemian <va...@us.ibm.com> wrote:
> 
> James,
> 
> Thanks for sharing your thoughts and experience.
> Could you please also confirm whether
> - you do any encryption for the mirrored data?
Not at the Kafka level. The data goes over a VPN.

> - you have a many-to-one mirroring similar to what I described?
> 

Yes, we mirror multiple source clusters to a single target cluster. We have a topic naming convention where our topics are prefixed with their cluster name, so as long as we follow that convention, each source topic gets mirrored to a unique target topic. That is, we try not to have multiple mirrormakers writing to a single target topic. 

Our topic names in the target cluster get prefixed with the string "mirror." And then we never mirror topics that start with "mirror." This prevents us from creating mirroring loops.
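
The naming rule described above can be sketched in a few lines of Python. This is only a model of the convention, not MirrorMaker configuration: the thread does not say how the prefix is applied, and the function names and the example cluster and topic names (`east`, `west`, `east.orders`, `west.clicks`) are hypothetical.

```python
MIRROR_PREFIX = "mirror."  # prefix named in the message above


def should_mirror(topic: str) -> bool:
    """Skip topics that are already mirrors, so a mirror is never re-mirrored."""
    return not topic.startswith(MIRROR_PREFIX)


def target_topic_name(topic: str) -> str:
    """Name a mirrored topic gets on the target cluster."""
    return MIRROR_PREFIX + topic


def plan_mirroring(topics_by_cluster: dict) -> dict:
    """Map each target topic to the single (cluster, topic) that feeds it.

    Raises ValueError if two sources would write to the same target topic,
    which is what the cluster-name prefix convention is meant to avoid.
    """
    plan = {}
    for cluster, topics in topics_by_cluster.items():
        for topic in topics:
            if not should_mirror(topic):
                continue
            target = target_topic_name(topic)
            if target in plan:
                raise ValueError(f"{target} would be written by more than one mirror")
            plan[target] = (cluster, topic)
    return plan


if __name__ == "__main__":
    # Hypothetical clusters; "east" already holds a mirrored topic.
    clusters = {
        "east": ["east.orders", "mirror.west.clicks"],
        "west": ["west.clicks"],
    }
    for target, (cluster, topic) in sorted(plan_mirroring(clusters).items()):
        print(f"{cluster}:{topic} -> {target}")
```

Because source topics carry their cluster name, two clusters never map to the same target topic, and because anything starting with "mirror." is skipped, a mirrored topic cannot be mirrored back.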

> Thanks.
> --Vahid
> 
> 
> 
> From:   James Cheng <wu...@gmail.com>
> To:     users@kafka.apache.org
> Cc:     dev <de...@kafka.apache.org>
> Date:   07/06/2017 12:37 PM
> Subject:        Re: Mirroring multiple clusters into one
> 
> 
> 
> I'm not sure what the "official" recommendation is. At TiVo, we *do* run 
> all our mirrormakers near the target cluster. It works fine for us, but 
> we're still fairly inexperienced, so I'm not sure how strong of a data 
> point we should be.
> 
> I think the thought process is, if you are mirroring from a source cluster 
> to a target cluster where there is a WAN between the two, then whichever 
> request goes across the WAN has a higher chance of intermittent failure 
> than the one over the LAN. That means that if mirrormaker is near the 
> source cluster, the produce request over the WAN to the target cluster may 
> fail. If the mirrormaker is near the target cluster, then the fetch 
> request over the WAN to the source cluster may fail.
> 
> Failed fetch requests don't have much impact on data replication, it just 
> delays it. Whereas a failure during a produce request may introduce 
> duplicates.
> 
> Becket Qin from LinkedIn did a presentation on tuning producer performance 
> at a meetup last year, and I remember he specifically talked about 
> producing over a WAN as one of the cases where you have to tune settings. 
> Maybe that presentation will give more ideas about what to look at. 
> https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
> 
> 
> -James
> 
> Sent from my iPhone
> 
>> On Jul 6, 2017, at 1:00 AM, Vahid S Hashemian <va...@us.ibm.com> wrote:
>>
>> The literature suggests running the MM on the target cluster when possible
>> (with the exception of when encryption is required for transferred data).
>> I am wondering if this is still the recommended approach when mirroring
>> from multiple clusters to a single cluster (i.e. multiple MM instances).
>> Is there anything in particular (metric, specification, etc.) to consider
>> before making a decision?
>> 
>> Thanks.
>> --Vahid
>> 
>> 
> 
> 
> 
>

Re: Mirroring multiple clusters into one

Posted by Vahid S Hashemian <va...@us.ibm.com>.

James,

Thanks for sharing your thoughts and experience.
Could you please also confirm whether
- you do any encryption for the mirrored data?
- you have a many-to-one mirroring similar to what I described?

Thanks.
--Vahid

From:   James Cheng <wu...@gmail.com>
To:     users@kafka.apache.org
Cc:     dev <de...@kafka.apache.org>
Date:   07/06/2017 12:37 PM
Subject:        Re: Mirroring multiple clusters into one

I'm not sure what the "official" recommendation is. At TiVo, we *do* run 
all our mirrormakers near the target cluster. It works fine for us, but 
we're still fairly inexperienced, so I'm not sure how strong of a data 
point we should be.

I think the thought process is, if you are mirroring from a source cluster 
to a target cluster where there is a WAN between the two, then whichever 
request goes across the WAN has a higher chance of intermittent failure 
than the one over the LAN. That means that if mirrormaker is near the 
source cluster, the produce request over the WAN to the target cluster may 
fail. If the mirrormaker is near the target cluster, then the fetch 
request over the WAN to the source cluster may fail.

Failed fetch requests don't have much impact on data replication, it just 
delays it. Whereas a failure during a produce request may introduce 
duplicates.

Becket Qin from LinkedIn did a presentation on tuning producer performance 
at a meetup last year, and I remember he specifically talked about 
producing over a WAN as one of the cases where you have to tune settings. 
Maybe that presentation will give more ideas about what to look at. 
https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600

-James

Sent from my iPhone

> On Jul 6, 2017, at 1:00 AM, Vahid S Hashemian <va...@us.ibm.com> wrote:
>
> The literature suggests running the MM on the target cluster when possible
> (with the exception of when encryption is required for transferred data).
> I am wondering if this is still the recommended approach when mirroring
> from multiple clusters to a single cluster (i.e. multiple MM instances).
> Is there anything in particular (metric, specification, etc.) to consider
> before making a decision?
> 
> Thanks.
> --Vahid
> 
>

Re: Mirroring multiple clusters into one

Posted by James Cheng <wu...@gmail.com>.

I'm not sure what the "official" recommendation is. At TiVo, we *do* run all our mirrormakers near the target cluster. It works fine for us, but we're still fairly inexperienced, so I'm not sure how strong of a data point we should be.

I think the thought process is, if you are mirroring from a source cluster to a target cluster where there is a WAN between the two, then whichever request goes across the WAN has a higher chance of intermittent failure than the one over the LAN. That means that if mirrormaker is near the source cluster, the produce request over the WAN to the target cluster may fail. If the mirrormaker is near the target cluster, then the fetch request over the WAN to the source cluster may fail.

Failed fetch requests don't have much impact on data replication; they just delay it. A failure during a produce request, on the other hand, may introduce duplicates.
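
To make the difference concrete, here is a toy Python model of the two failure cases. It is a sketch, not Kafka client code: it assumes a producer that simply resends when it does not receive an acknowledgement, with no deduplication on the broker side, and the class and function names are made up for illustration.

```python
class Log:
    """Stand-in for a topic partition: an append-only list of records."""

    def __init__(self, records=None):
        self.records = list(records or [])

    def append(self, record):
        self.records.append(record)


def produce_with_retry(log, record, ack_lost_on_first_try):
    """Send a record, resending until an acknowledgement arrives.

    If the write lands but the ack is lost on the WAN, the sender cannot
    tell that apart from a lost request, so it sends the record again.
    Returns the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        log.append(record)  # the write itself succeeds on the broker
        ack_received = not (ack_lost_on_first_try and attempts == 1)
        if ack_received:
            return attempts


def fetch_with_retry(log, offset, fail_first_try):
    """Read everything from `offset`, retrying if the request fails.

    A failed fetch changes nothing: the offset has not advanced, so the
    retry reads the same records. Returns (records, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        if fail_first_try and attempts == 1:
            continue  # request failed on the WAN; ask again from the same offset
        return log.records[offset:], attempts


if __name__ == "__main__":
    # Mirrormaker near the source: the produce request crosses the WAN.
    target = Log()
    produce_with_retry(target, "a", ack_lost_on_first_try=True)
    print("target after flaky produce:", target.records)

    # Mirrormaker near the target: the fetch request crosses the WAN.
    source = Log(["a", "b"])
    records, attempts = fetch_with_retry(source, 0, fail_first_try=True)
    print("fetched after flaky fetch:", records, "in", attempts, "attempts")
```

In this model the flaky produce leaves the record in the target log twice, while the flaky fetch only costs an extra round trip.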

Becket Qin from LinkedIn did a presentation on tuning producer performance at a meetup last year, and I remember he specifically talked about producing over a WAN as one of the cases where you have to tune settings. Maybe that presentation will give more ideas about what to look at. https://www.slideshare.net/mobile/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600

-James

Sent from my iPhone

> On Jul 6, 2017, at 1:00 AM, Vahid S Hashemian <va...@us.ibm.com> wrote:
> 
> The literature suggests running the MM on the target cluster when possible 
> (with the exception of when encryption is required for transferred data).
> I am wondering if this is still the recommended approach when mirroring 
> from multiple clusters to a single cluster (i.e. multiple MM instances).
> Is there anything in particular (metric, specification, etc.) to consider 
> before making a decision?
> 
> Thanks.
> --Vahid
> 
>
