Posted to users@kafka.apache.org by Iftach Ben-Yosef <ib...@outbrain.com> on 2020/07/01 13:38:33 UTC

destination topics in mm2 larger than source topic

Hello everyone.

I'm testing mm2 for our cross dc topic replication. We used to do it using
mm1 but faced various issues.

So far, mm2 is working well, but I have 1 issue which I can't really
explain; the destination topic is larger than the source topic.

For example, we have 1 topic which on the source cluster is around
2.8-2.9TB with retention.ms=86400000

I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
and edited the retention.ms of the destination topic to be 57600000. Other
than that, I haven't touched the topic created by mm2 on the destination
cluster.

By logic I'd say that if I shortened the retention on the destination, the
topic size should decrease, but in practice, I see that it is larger than
the source topic (it's about 4.6TB).
This same behaviour is seen on all 3 topics which I am currently mirroring
(all 3 from different source clusters, into the same destination cluster).

Does anyone have any idea as to why mm2 acts this way for me?

Thanks,
Iftach
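
One way to compare the on-disk footprint of the source and destination
topics is the stock kafka-log-dirs tool. A minimal sketch; the broker
addresses and topic name are placeholders, and the "source." prefix assumes
MM2's default replication policy, which prepends the source cluster alias:

    # per-broker, per-partition log size on the source cluster
    bin/kafka-log-dirs.sh --bootstrap-server source-broker:9092 \
      --describe --topic-list my-topic

    # the same query against the mirrored topic on the destination cluster
    bin/kafka-log-dirs.sh --bootstrap-server dest-broker:9092 \
      --describe --topic-list source.my-topic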


Re: destination topics in mm2 larger than source topic

Posted by Ricardo Ferreira <ri...@riferrei.com>.
Iftach,

This is a very useful finding. While I don't know the answer to your 
question below, I would like to take this opportunity to encourage you 
to write a blog about this finding =)

Thanks,

-- Ricardo

On 7/7/20 2:48 AM, Iftach Ben-Yosef wrote:
> I believe I got it to work with 
> "source->dest.producer.compression.type = gzip"
> Is there a way to set this globally for the mm2 process and not to do 
> it per mirroring flow?
>
> Thanks,
> Iftach
>
>
> On Tue, Jul 7, 2020 at 9:34 AM Iftach Ben-Yosef 
> <iben-yosef@outbrain.com> wrote:
>
>     Upon further investigation, the issue is indeed compression, as
>     in the logs I see 'compression.type = none'.
>     Does anyone know how to configure gzip compression for
>     the connect-mirror-maker.properties file?
>
>     I tried "producer.override.compression.type = gzip" but that
>     doesn't seem to work.
>
>     Thanks,
>     Iftach
>
>
>     On Mon, Jul 6, 2020 at 8:03 AM Iftach Ben-Yosef
>     <iben-yosef@outbrain.com> wrote:
>
>         Ricardo,
>
>         Thanks for the reply. I did some more testing. I tried
>         mirroring a different topic from 1 of the 3 source clusters
>         used in the previous test, into the same destination cluster.
>         Again, the resulting topic on the dest cluster is about 2 times
>         larger than the source, with the same config and retention
>         (both have compression.type=producer).
>
>         Regarding my configuration, other than the clusters and
>         mirroring direction/topic whitelist configs, I have the
>         following (I shortened all the prefixes to ".."):
>
>         ..tasks.max = 128
>         ..fetch.max.wait.ms = 150
>         ..fetch.min.bytes = 10485760
>         ..fetch.max.bytes = 52428800
>         ..max.request.size = 10485760
>         ..enable.idempotence = true
>         ..sync.topic.configs.enabled=false (played with this as true
>         and as false)
>
>         I don't see how anything other than perhaps the idempotency
>         could affect the topic size. I have also tried without
>         idempotency config, but it looks the same - and in any case I
>         expect idempotency to maybe decrease the topic size, not
>         increase it...
>
>         Thanks,
>         Iftach
>
>
>
>         On Thu, Jul 2, 2020 at 5:30 PM Ricardo Ferreira
>         <riferrei@riferrei.com> wrote:
>
>             Iftach,
>
>             I think you should try to observe if this happens with other
>             topics. Maybe something unrelated might have happened
>             already in the case of the topic that currently has ~3TB
>             of data -- making things even harder to troubleshoot.
>
>             I would recommend creating a new topic with a few partitions
>             and configuring that topic in the whitelist. Then, observe
>             if the same behavior occurs. If it does then it might be
>             something wrong with MM2 -- likely a bug or
>             misconfiguration. If not then you can eliminate MM2 as the
>             cause and work at a smaller scale to see if something went
>             south with the topic. Maybe that could be something not
>             even related to MM2 such as network failures that forced
>             the internal producer of MM2 to retry multiple times and
>             hence produce more data than it should.
>
>             The bottom line is that certain troubleshooting exercises
>             are hard or sometimes impossible to diagnose with cases
>             that might have been outliers.
>
>             -- Ricardo
>
>             On 7/1/20 10:02 AM, Iftach Ben-Yosef wrote:
>>             Hi Ryanne, thanks for the quick reply.
>>
>>             I had the thought it might be compression. I see that the topics have the
>>             following config "compression.type=producer". This is for both the source
>>             and destination topics. Should I check something else regarding compression?
>>
>>             Also, the destination topics are larger than the same topic being mirrored
>>             using mm1 - the sum of the 3 topics mirrored by mm2 is much larger than the
>>             1 topic that mm1 produced (they have the same 3 source topics, only mm1
>>             aggregates to 1 destination topic). Retention is again the same between the
>>             mm1 destination topic and the mm2 destination topics.
>>
>>             Thanks,
>>             Iftach
>>
>>
>>             On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ry...@gmail.com> wrote:
>>
>>>             Iftach, is it possible the source topic is compressed?
>>>
>>>             Ryanne
>>>
>>>             On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
>>>             wrote:
>>>
>>>>             Hello everyone.
>>>>
>>>>             I'm testing mm2 for our cross dc topic replication. We used to do it
>>>             using
>>>>             mm1 but faced various issues.
>>>>
>>>>             So far, mm2 is working well, but I have 1 issue which I can't really
>>>>             explain; the destination topic is larger than the source topic.
>>>>
>>>>             For example, We have 1 topic which on the source cluster is around
>>>>             2.8-2.9TB with retention.ms=86400000
>>>>
>>>>             I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
>>>>             and edited the retention.ms of the destination topic to be 57600000.
>>>             Other
>>>>             than that, I haven't touched the topic created by mm2 on the destination
>>>>             cluster.
>>>>
>>>>             By logic I'd say that if I shortened the retention on the destination,
>>>             the
>>>>             topic size should decrease, but in practice, I see that it is larger than
>>>>             the source topic (it's about 4.6TB).
>>>>             This same behaviour is seen on all 3 topics which I am currently
>>>             mirroring
>>>>             (all 3 from different source clusters, into the same destination
>>>             clusters)
>>>>             Does anyone have any idea as to why mm2 acts this way for me?
>>>>
>>>>             Thanks,
>>>>             Iftach
>>>>

Re: destination topics in mm2 larger than source topic

Posted by Iftach Ben-Yosef <ib...@outbrain.com>.
I believe I got it to work with "source->dest.producer.compression.type =
gzip"
Is there a way to set this globally for the mm2 process and not to do it
per mirroring flow?

Thanks,
Iftach
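
In the dedicated connect-mirror-maker.properties file, producer settings can
be scoped to a single replication flow (the form above) or, presumably, to
the target cluster as a whole. A rough sketch, assuming the cluster aliases
are literally "source" and "dest"; whether the cluster-scoped form really
covers every flow into that cluster should be verified for your MM2 version:

    clusters = source, dest
    source->dest.enabled = true

    # per-flow producer setting (the form reported to work above)
    source->dest.producer.compression.type = gzip

    # cluster-scoped alternative -- assumed to apply to all flows that
    # produce into "dest"; verify against your MM2 version
    dest.producer.compression.type = gzip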


On Tue, Jul 7, 2020 at 9:34 AM Iftach Ben-Yosef <ib...@outbrain.com>
wrote:

> Upon further investigation, the issue is indeed compression, as in the
> logs I see 'compression.type = none'.
> Does anyone know how to configure gzip compression for
> the connect-mirror-maker.properties file?
>
> I tried "producer.override.compression.type = gzip" but that doesnt
> seem to work.
>
> Thanks,
> Iftach
>
>
> On Mon, Jul 6, 2020 at 8:03 AM Iftach Ben-Yosef <ib...@outbrain.com>
> wrote:
>
>> Ricardo,
>>
>> Thanks for the reply. I did some more testing. I tried mirroring a
>> different topic from 1 of the 3 source clusters used in the previous
>> test, into the same destination cluster. Again, the resulting topic on the
>> dest cluster is about 2 times larger than the source, with the same config
>> and retention (both have compression.type=producer).
>>
>> Regarding my configuration, other than the clusters and mirroring
>> direction/topic whitelist configs, I have the following (I shortened all
>> the prefixes to ".."):
>>
>> ..tasks.max = 128
>> ..fetch.max.wait.ms = 150
>> ..fetch.min.bytes = 10485760
>> ..fetch.max.bytes = 52428800
>> ..max.request.size = 10485760
>> ..enable.idempotence = true
>> ..sync.topic.configs.enabled=false (played with this as true and as false)
>>
>> I don't see how anything other than perhaps the idempotency could affect
>> the topic size. I have also tried without idempotency config, but it looks
>> the same - and in any case I expect idempotency to maybe decrease the topic
>> size, not increase it...
>>
>> Thanks,
>> Iftach
>>
>>
>>
>> On Thu, Jul 2, 2020 at 5:30 PM Ricardo Ferreira <ri...@riferrei.com>
>> wrote:
>>
>>> Iftach,
>>>
>>> I think you should try to observe if this happens with other topics. Maybe
>>> something unrelated might have happened already in the case of the topic
>>> that currently has ~3TB of data -- making things even harder to
>>> troubleshoot.
>>>
>>> I would recommend creating a new topic with a few partitions and configuring
>>> that topic in the whitelist. Then, observe if the same behavior occurs. If
>>> it does then it might be something wrong with MM2 -- likely a bug or
>>> misconfiguration. If not then you can eliminate MM2 as the cause and work
>>> at a smaller scale to see if something went south with the topic. Maybe
>>> that could be something not even related to MM2 such as network failures
>>> that forced the internal producer of MM2 to retry multiple times and hence
>>> produce more data than it should.
>>>
>>> The bottom line is that certain troubleshooting exercises are hard or
>>> sometimes impossible to diagnose with cases that might have been outliers.
>>>
>>> -- Ricardo
>>> On 7/1/20 10:02 AM, Iftach Ben-Yosef wrote:
>>>
>>> Hi Ryanne, thanks for the quick reply.
>>>
>>> I had the thought it might be compression. I see that the topics have the
>>> following config "compression.type=producer". This is for both the source
>>> and destination topics. Should I check something else regarding compression?
>>>
>>> Also, the destination topics are larger than the same topic being mirrored
>>> using mm1 - the sum of the 3 topics mirrored by mm2 is much larger than the
>>> 1 topic that mm1 produced (they have the same 3 source topics, only mm1
>>> aggregates to 1 destination topic). Retention is again the same between the
>>> mm1 destination topic and the mm2 destination topics.
>>>
>>> Thanks,
>>> Iftach
>>>
>>>
>>> On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ry...@gmail.com> wrote:
>>>
>>>
>>> Iftach, is it possible the source topic is compressed?
>>>
>>> Ryanne
>>>
>>> On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
>>> wrote:
>>>
>>>
>>> Hello everyone.
>>>
>>> I'm testing mm2 for our cross dc topic replication. We used to do it
>>>
>>> using
>>>
>>> mm1 but faced various issues.
>>>
>>> So far, mm2 is working well, but I have 1 issue which I can't really
>>> explain; the destination topic is larger than the source topic.
>>>
>>> For example, We have 1 topic which on the source cluster is around
>>> 2.8-2.9TB with retention.ms=86400000
>>>
>>> I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
>>> and edited the retention.ms of the destination topic to be 57600000.
>>>
>>> Other
>>>
>>> than that, I haven't touched the topic created by mm2 on the destination
>>> cluster.
>>>
>>> By logic I'd say that if I shortened the retention on the destination,
>>>
>>> the
>>>
>>> topic size should decrease, but in practice, I see that it is larger than
>>> the source topic (it's about 4.6TB).
>>> This same behaviour is seen on all 3 topics which I am currently
>>>
>>> mirroring
>>>
>>> (all 3 from different source clusters, into the same destination
>>>
>>> clusters)
>>>
>>> Does anyone have any idea as to why mm2 acts this way for me?
>>>
>>> Thanks,
>>> Iftach
>>>

Re: destination topics in mm2 larger than source topic

Posted by Iftach Ben-Yosef <ib...@outbrain.com>.
Upon further investigation, the issue is indeed compression, as in the
logs I see 'compression.type = none'.
Does anyone know how to configure gzip compression for
the connect-mirror-maker.properties file?

I tried "producer.override.compression.type = gzip" but that doesnt seem to
work.

Thanks,
Iftach
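
A quick way to confirm which value the replication producer actually picked
up is the ProducerConfig dump that Kafka clients log at startup (the same
block the 'compression.type = none' line above comes from). A sketch,
assuming the MirrorMaker 2 process writes its log to mm2.log:

    grep "compression.type" mm2.log
    # look for the value in the "ProducerConfig values:" block logged for the
    # MirrorSourceConnector producer, e.g. compression.type = gzip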


On Mon, Jul 6, 2020 at 8:03 AM Iftach Ben-Yosef <ib...@outbrain.com>
wrote:

> Ricardo,
>
> Thanks for the reply. I did some more testing. I tried mirroring a
> different topic from 1 of the 3 source clusters used in the previous
> test, into the same destination cluster. Again, the resulting topic on the
> dest cluster is about 2 times larger than the source, with the same config
> and retention (both have compression.type=producer).
>
> Regarding my configuration, other than the clusters and mirroring
> direction/topic whitelist configs, I have the following (I shortened all
> the prefixes to ".."):
>
> ..tasks.max = 128
> ..fetch.max.wait.ms = 150
> ..fetch.min.bytes = 10485760
> ..fetch.max.bytes = 52428800
> ..max.request.size = 10485760
> ..enable.idempotence = true
> ..sync.topic.configs.enabled=false (played with this as true and as false)
>
> I don't see how anything other than perhaps the idempotency could affect the
> topic size. I have also tried without idempotency config, but it looks the
> same - and in any case I expect idempotency to maybe decrease the topic
> size, not increase it...
>
> Thanks,
> Iftach
>
>
>
> On Thu, Jul 2, 2020 at 5:30 PM Ricardo Ferreira <ri...@riferrei.com>
> wrote:
>
>> Iftach,
>>
>> I think you should try to observe if this happens with other topics. Maybe
>> something unrelated might have happened already in the case of the topic
>> that currently has ~3TB of data -- making things even harder to
>> troubleshoot.
>>
>> I would recommend creating a new topic with a few partitions and configuring
>> that topic in the whitelist. Then, observe if the same behavior occurs. If
>> it does then it might be something wrong with MM2 -- likely a bug or
>> misconfiguration. If not then you can eliminate MM2 as the cause and work
>> at a smaller scale to see if something went south with the topic. Maybe
>> that could be something not even related to MM2 such as network failures
>> that forced the internal producer of MM2 to retry multiple times and hence
>> produce more data than it should.
>>
>> The bottom line is that certain troubleshooting exercises are hard or
>> sometimes impossible to diagnose with cases that might have been outliers.
>>
>> -- Ricardo
>> On 7/1/20 10:02 AM, Iftach Ben-Yosef wrote:
>>
>> Hi Ryanne, thanks for the quick reply.
>>
>> I had the thought it might be compression. I see that the topics have the
>> following config "compression.type=producer". This is for both the source
>> and destination topics. Should I check something else regarding compression?
>>
>> Also, the destination topics are larger than the same topic being mirrored
>> using mm1 - the sum of the 3 topics mirrored by mm2 is much larger than the
>> 1 topic that mm1 produced (they have the same 3 source topics, only mm1
>> aggregates to 1 destination topic). Retention is again the same between the
>> mm1 destination topic and the mm2 destination topics.
>>
>> Thanks,
>> Iftach
>>
>>
>> On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ry...@gmail.com> wrote:
>>
>>
>> Iftach, is it possible the source topic is compressed?
>>
>> Ryanne
>>
>> On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
>> wrote:
>>
>>
>> Hello everyone.
>>
>> I'm testing mm2 for our cross dc topic replication. We used to do it
>>
>> using
>>
>> mm1 but faced various issues.
>>
>> So far, mm2 is working well, but I have 1 issue which I can't really
>> explain; the destination topic is larger than the source topic.
>>
>> For example, We have 1 topic which on the source cluster is around
>> 2.8-2.9TB with retention.ms=86400000
>>
>> I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
>> and edited the retention.ms of the destination topic to be 57600000.
>>
>> Other
>>
>> than that, I haven't touched the topic created by mm2 on the destination
>> cluster.
>>
>> By logic I'd say that if I shortened the retention on the destination,
>>
>> the
>>
>> topic size should decrease, but in practice, I see that it is larger than
>> the source topic (it's about 4.6TB).
>> This same behaviour is seen on all 3 topics which I am currently
>>
>> mirroring
>>
>> (all 3 from different source clusters, into the same destination
>>
>> clusters)
>>
>> Does anyone have any idea as to why mm2 acts this way for me?
>>
>> Thanks,
>> Iftach
>>

Re: destination topics in mm2 larger than source topic

Posted by Iftach Ben-Yosef <ib...@outbrain.com>.
Ricardo,

Thanks for the reply. I did some more testing. I tried mirroring a
different topic from 1 of the 3 source clusters used in the previous
test, into the same destination cluster. Again, the resulting topic on the
dest cluster is about 2 times larger than the source, with the same config
and retention (both have compression.type=producer).

Regarding my configuration, other than the clusters and mirroring
direction/topic whitelist configs, I have the following (I shortened all
the prefixes to ".."):

..tasks.max = 128
..fetch.max.wait.ms = 150
..fetch.min.bytes = 10485760
..fetch.max.bytes = 52428800
..max.request.size = 10485760
..enable.idempotence = true
..sync.topic.configs.enabled=false (played with this as true and as false)

I don't see how anything other than perhaps the idempotency could affect the
topic size. I have also tried without idempotency config, but it looks the
same - and in any case I expect idempotency to maybe decrease the topic
size, not increase it...

Thanks,
Iftach
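
Spelled out, a connect-mirror-maker.properties along those lines might look
roughly like the sketch below. The cluster aliases, the topic whitelist, and
the exact prefix each setting belongs under (flow-level vs. cluster-level
consumer/producer) are assumptions to double-check rather than a verified
layout:

    clusters = source, dest
    source->dest.enabled = true
    source->dest.topics = my-topic

    source->dest.tasks.max = 128
    source.consumer.fetch.max.wait.ms = 150
    source.consumer.fetch.min.bytes = 10485760
    source.consumer.fetch.max.bytes = 52428800
    dest.producer.max.request.size = 10485760
    dest.producer.enable.idempotence = true
    sync.topic.configs.enabled = false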



On Thu, Jul 2, 2020 at 5:30 PM Ricardo Ferreira <ri...@riferrei.com>
wrote:

> Iftach,
>
> I think you should try to observe if this happens with other topics. Maybe
> something unrelated might have happened already in the case of the topic
> that currently has ~3TB of data -- making things even harder to
> troubleshoot.
>
> I would recommend creating a new topic with a few partitions and configuring
> that topic in the whitelist. Then, observe if the same behavior occurs. If
> it does then it might be something wrong with MM2 -- likely a bug or
> misconfiguration. If not then you can eliminate MM2 as the cause and work
> at a smaller scale to see if something went south with the topic. Maybe
> that could be something not even related to MM2 such as network failures
> that forced the internal producer of MM2 to retry multiple times and hence
> produce more data than it should.
>
> The bottom line is that certain troubleshooting exercises are hard or
> sometimes impossible to diagnose with cases that might have been outliers.
>
> -- Ricardo
> On 7/1/20 10:02 AM, Iftach Ben-Yosef wrote:
>
> Hi Ryanne, thanks for the quick reply.
>
> I had the thought it might be compression. I see that the topics have the
> following config "compression.type=producer". This is for both the source
> and destination topics. Should I check something else regarding compression?
>
> Also, the destination topics are larger than the same topic being mirrored
> using mm1 - the sum of the 3 topics mirrored by mm2 is much larger than the
> 1 topic that mm1 produced (they have the same 3 source topics, only mm1
> aggregates to 1 destination topic). Retention is again the same between the
> mm1 destination topic and the mm2 destination topics.
>
> Thanks,
> Iftach
>
>
> On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ry...@gmail.com> wrote:
>
>
> Iftach, is it possible the source topic is compressed?
>
> Ryanne
>
> On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
> wrote:
>
>
> Hello everyone.
>
> I'm testing mm2 for our cross dc topic replication. We used to do it
>
> using
>
> mm1 but faced various issues.
>
> So far, mm2 is working well, but I have 1 issue which I can't really
> explain; the destination topic is larger than the source topic.
>
> For example, We have 1 topic which on the source cluster is around
> 2.8-2.9TB with retention.ms=86400000
>
> I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
> and edited the retention.ms of the destination topic to be 57600000.
>
> Other
>
> than that, I haven't touched the topic created by mm2 on the destination
> cluster.
>
> By logic I'd say that if I shortened the retention on the destination,
>
> the
>
> topic size should decrease, but in practice, I see that it is larger than
> the source topic (it's about 4.6TB).
> This same behaviour is seen on all 3 topics which I am currently
>
> mirroring
>
> (all 3 from different source clusters, into the same destination
>
> clusters)
>
> Does anyone have any idea as to why mm2 acts this way for me?
>
> Thanks,
> Iftach
>

Re: destination topics in mm2 larger than source topic

Posted by Ricardo Ferreira <ri...@riferrei.com>.
Iftach,

I think you should try to observe if this happens with other topics. Maybe
something unrelated might have happened already in the case of the topic 
that currently has ~3TB of data -- making things even harder to 
troubleshoot.

I would recommend creating a new topic with a few partitions and configuring
that topic in the whitelist. Then, observe if the same behavior occurs.
If it does then it might be something wrong with MM2 -- likely a bug or 
misconfiguration. If not then you can eliminate MM2 as the cause and 
work at a smaller scale to see if something went south with the topic. 
Maybe that could be something not even related to MM2 such as network 
failures that forced the internal producer of MM2 to retry multiple 
times and hence produce more data than it should.

The bottom line is that certain troubleshooting exercises are hard or
sometimes impossible to diagnose with cases that might have been outliers.

-- Ricardo
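
As a concrete version of that test, something like the following could be
used; the topic name, partition count, replication factor, broker address,
and the "source->dest" flow name are placeholders for illustration:

    # create a small throw-away topic on the source cluster
    bin/kafka-topics.sh --bootstrap-server source-broker:9092 \
      --create --topic mm2-size-test --partitions 3 --replication-factor 2

    # add it to the flow's whitelist in connect-mirror-maker.properties
    source->dest.topics = my-topic, mm2-size-test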

On 7/1/20 10:02 AM, Iftach Ben-Yosef wrote:
> Hi Ryanne, thanks for the quick reply.
>
> I had the thought it might be compression. I see that the topics have the
> following config "compression.type=producer". This is for both the source
> and destination topics. Should I check something else regarding compression?
>
> Also, the destination topics are larger than the same topic being mirrored
> using mm1 - the sum of the 3 topics mirrored by mm2 is much larger than the
> 1 topic that mm1 produced (they have the same 3 source topics, only mm1
> aggregates to 1 destination topic). Retention is again the same between the
> mm1 destination topic and the mm2 destination topics.
>
> Thanks,
> Iftach
>
>
> On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ry...@gmail.com> wrote:
>
>> Iftach, is it possible the source topic is compressed?
>>
>> Ryanne
>>
>> On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
>> wrote:
>>
>>> Hello everyone.
>>>
>>> I'm testing mm2 for our cross dc topic replication. We used to do it
>> using
>>> mm1 but faced various issues.
>>>
>>> So far, mm2 is working well, but I have 1 issue which I can't really
>>> explain; the destination topic is larger than the source topic.
>>>
>>> For example, We have 1 topic which on the source cluster is around
>>> 2.8-2.9TB with retention.ms=86400000
>>>
>>> I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
>>> and edited the retention.ms of the destination topic to be 57600000.
>> Other
>>> than that, I haven't touched the topic created by mm2 on the destination
>>> cluster.
>>>
>>> By logic I'd say that if I shortened the retention on the destination,
>> the
>>> topic size should decrease, but in practice, I see that it is larger than
>>> the source topic (it's about 4.6TB).
>>> This same behaviour is seen on all 3 topics which I am currently
>> mirroring
>>> (all 3 from different source clusters, into the same destination
>> clusters)
>>> Does anyone have any idea as to why mm2 acts this way for me?
>>>
>>> Thanks,
>>> Iftach
>>>

Re: destination topics in mm2 larger than source topic

Posted by Iftach Ben-Yosef <ib...@outbrain.com>.
Hi Ryanne, thanks for the quick reply.

I had the thought it might be compression. I see that the topics have the
following config "compression.type=producer". This is for both the source
and destination topics. Should I check something else regarding compression?

Also, the destination topics are larger than the same topic being mirrored
using mm1 - the sum of the 3 topics mirrored by mm2 is much larger than the
1 topic that mm1 produced (they have the same 3 source topics, only mm1
aggregates to 1 destination topic). Retention is again the same between the
mm1 destination topic and the mm2 destination topics.

Thanks,
Iftach
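
Since compression.type=producer only means the broker keeps whatever codec
the producing clients used, one way to see whether the data on disk is
actually compressed is to dump a log segment and check the batch codec.
A sketch; the segment path is made up for illustration:

    bin/kafka-dump-log.sh --print-data-log \
      --files /var/kafka-logs/my-topic-0/00000000000000000000.log | head
    # each batch line carries a compresscodec field, e.g. "compresscodec: NONE"
    # on an uncompressed topic or "compresscodec: GZIP" on a compressed one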


On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ry...@gmail.com> wrote:

> Iftach, is it possible the source topic is compressed?
>
> Ryanne
>
> On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
> wrote:
>
> > Hello everyone.
> >
> > I'm testing mm2 for our cross dc topic replication. We used to do it
> using
> > mm1 but faced various issues.
> >
> > So far, mm2 is working well, but I have 1 issue which I can't really
> > explain; the destination topic is larger than the source topic.
> >
> > For example, We have 1 topic which on the source cluster is around
> > 2.8-2.9TB with retention.ms=86400000
> >
> > I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
> > and edited the retention.ms of the destination topic to be 57600000.
> Other
> > than that, I haven't touched the topic created by mm2 on the destination
> > cluster.
> >
> > By logic I'd say that if I shortened the retention on the destination,
> the
> > topic size should decrease, but in practice, I see that it is larger than
> > the source topic (it's about 4.6TB).
> > This same behaviour is seen on all 3 topics which I am currently
> mirroring
> > (all 3 from different source clusters, into the same destination
> clusters)
> >
> > Does anyone have any idea as to why mm2 acts this way for me?
> >
> > Thanks,
> > Iftach
> >

Re: destination topics in mm2 larger than source topic

Posted by Ryanne Dolan <ry...@gmail.com>.
Iftach, is it possible the source topic is compressed?

Ryanne
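
For anyone checking the same thing: the topic-level setting can be read back
with kafka-configs.sh (broker address and topic name are placeholders). Note
that compression.type=producer means the broker retains whatever codec the
producing clients used, so the source data may be compressed on disk even
though no explicit codec appears in the topic config:

    bin/kafka-configs.sh --bootstrap-server source-broker:9092 \
      --entity-type topics --entity-name my-topic --describe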

On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <ib...@outbrain.com>
wrote:

> Hello everyone.
>
> I'm testing mm2 for our cross dc topic replication. We used to do it using
> mm1 but faced various issues.
>
> So far, mm2 is working well, but I have 1 issue which I can't really
> explain; the destination topic is larger than the source topic.
>
> For example, We have 1 topic which on the source cluster is around
> 2.8-2.9TB with retention.ms=86400000
>
> I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
> and edited the retention.ms of the destination topic to be 57600000. Other
> than that, I haven't touched the topic created by mm2 on the destination
> cluster.
>
> By logic I'd say that if I shortened the retention on the destination, the
> topic size should decrease, but in practice, I see that it is larger than
> the source topic (it's about 4.6TB).
> This same behaviour is seen on all 3 topics which I am currently mirroring
> (all 3 from different source clusters, into the same destination clusters)
>
> Does anyone have any idea as to why mm2 acts this way for me?
>
> Thanks,
> Iftach
>