Posted to dev@pulsar.apache.org by PengHui Li <pe...@apache.org> on 2022/08/18 16:23:03 UTC

[DISCUSS] Enable message deduplication for replicator by default

Hi all,

While fixing a replicator problem (https://github.com/apache/pulsar/pull/17154),
I was surprised to find that message deduplication does not work with the
replicator by default. I had always assumed it was enabled for replicators by
default; see [0] for details.

I think we should enable deduplication for the replicator; otherwise, we will
see duplicated messages on the remote cluster. Since the replicator's producer
always has a fixed producer name, message deduplication will work properly
for it.

The test introduced in https://github.com/apache/pulsar/pull/17154 checks
message replication ordering. Without message deduplication enabled, the test
is flaky because it receives duplicated messages. With it enabled, everything
is fine.

Best,
Penghui

[0] https://github.com/apache/pulsar/pull/17154#discussion_r948736894
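
As an illustration of why a fixed producer name matters, here is a minimal
Python sketch of producer-name based deduplication. It is a simplified model
under stated assumptions, not Pulsar's broker code: it assumes the broker
remembers the highest sequence ID stored for each producer name and drops any
message whose sequence ID is not higher, and that after a reconnect the
replicator resends unacknowledged messages with their original sequence IDs
(one plausible source of the duplicates mentioned above). The producer names
"repl-A" and "repl-B" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DedupState:
    """Toy per-topic deduplication state keyed by producer name (an assumption)."""
    highest_seq: Dict[str, int] = field(default_factory=dict)

    def accept(self, producer_name: str, sequence_id: int) -> bool:
        """Return True if the message should be stored, False if it is a duplicate."""
        last = self.highest_seq.get(producer_name, -1)
        if sequence_id <= last:
            return False  # already stored for this producer name
        self.highest_seq[producer_name] = sequence_id
        return True


def replicate(sends: List[Tuple[str, int]], dedup: bool) -> List[int]:
    """Return the sequence IDs stored on the remote topic for a list of sends."""
    state = DedupState()
    stored: List[int] = []
    for producer_name, sequence_id in sends:
        if not dedup or state.accept(producer_name, sequence_id):
            stored.append(sequence_id)
    return stored


# Messages 0-2 are stored remotely, but the acknowledgements for 1 and 2 are
# lost, so after reconnecting the replicator resends 1 and 2, then sends 3.
first_attempt = [("repl-A", 0), ("repl-A", 1), ("repl-A", 2)]
after_reconnect = [("repl-A", 1), ("repl-A", 2), ("repl-A", 3)]
renamed = [("repl-B", 1), ("repl-B", 2), ("repl-B", 3)]

print(replicate(first_attempt + after_reconnect, dedup=False))  # [0, 1, 2, 1, 2, 3]
print(replicate(first_attempt + after_reconnect, dedup=True))   # [0, 1, 2, 3]
print(replicate(first_attempt + renamed, dedup=True))           # [0, 1, 2, 1, 2, 3]
```

If the replicator reconnected under a different name (as "repl-B" does in the
last call), the state would have no history for that name and the resent
messages would be stored again; a stable name is what lets the check line up
across reconnects.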

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Dave Fisher <wa...@comcast.net>.
Excellent point, Rajan!

We had a significant decrease in consumption performance in 2.8.3, which was not fixed until 2.10.1.

Let’s take care to solve problems without surprising users or impacting performance.

Best,
Dave


> On Sep 5, 2022, at 5:11 PM, Rajan Dhabalia <rd...@apache.org> wrote:
> 
> Message deduplication always comes with memory and CPU cost and making it
> default means charging this penalty to every user without having this
> requirement.
> 
> Enabling by default means you are impacting every user who is not aware
> about this feature after upgrading the release. This is purely requirement
> bases and we should avoid enabling it by default.
> 
> Thanks,
> Rajan


Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Rajan Dhabalia <rd...@apache.org>.
Message deduplication always comes with a memory and CPU cost, and making it
the default means charging this penalty to every user, including those who do
not need it.

Enabling it by default impacts every user who is unaware of this feature as
soon as they upgrade. Whether to use it should be purely requirement-based, so
we should avoid enabling it by default.

Thanks,
Rajan
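
To make the cost argument concrete, here is a toy model in Python. It is an
illustration under stated assumptions, not how the broker is implemented: it
assumes that a topic with deduplication enabled keeps one state entry per
producer name and performs one check per published message, while a disabled
topic does neither. The topic and producer names are hypothetical, and the
model ignores any persistence or snapshotting the real broker may do.

```python
from typing import Dict, Iterable, Set, Tuple

# (topic, producer_name, sequence_id); all names here are hypothetical.
Message = Tuple[str, str, int]


def dedup_cost(messages: Iterable[Message], enabled_topics: Set[str]) -> Tuple[int, int]:
    """Toy cost model: return (state entries kept, per-message checks done)."""
    state: Dict[Tuple[str, str], int] = {}
    checks = 0
    for topic, producer_name, sequence_id in messages:
        if topic not in enabled_topics:
            continue  # deduplication off: no state, no per-message check
        checks += 1
        key = (topic, producer_name)
        # Remember the highest sequence ID seen for this producer on this topic.
        state[key] = max(state.get(key, -1), sequence_id)
    return len(state), checks


# Three topics, two producers each, three messages per producer.
topics = ["t1", "t2", "t3"]
traffic = [(t, p, s) for t in topics for p in ("p1", "p2") for s in range(3)]
print(dedup_cost(traffic, set(topics)))  # enabled everywhere -> (6, 18)
print(dedup_cost(traffic, {"t1"}))       # opt-in on one topic -> (2, 6)
```

In this model, enabling deduplication everywhere makes both numbers grow with
every topic and producer in the cluster, while opting in confines them to the
topics that actually need it.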

On Mon, Sep 5, 2022 at 2:50 AM lordcheng10 <lo...@gmail.com> wrote:

> +1

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by lordcheng10 <lo...@gmail.com>.
+1

On Fri, Aug 26, 2022 at 09:52, Haiting Jiang <ji...@gmail.com> wrote:

> +1
>
> Thanks,
> Haiting

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Haiting Jiang <ji...@gmail.com>.
+1

Thanks,
Haiting

On Thu, Aug 25, 2022 at 9:52 AM Baodi Shi <ba...@icloud.com.invalid> wrote:

> +1
>
> Thanks,
> Baodi Shi

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Baodi Shi <ba...@icloud.com.INVALID>.
+1

Thanks,
Baodi Shi

> On Aug 24, 2022, at 20:1312, Qiang Huang <qi...@gmail.com> wrote:
> 
> +1


Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Qiang Huang <qi...@gmail.com>.
+1

On Mon, Aug 22, 2022 at 15:32, Zike Yang <zi...@apache.org> wrote:

> +1
>
> Thanks,
> Zike Yang


-- 
BR,
Qiang Huang

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Zike Yang <zi...@apache.org>.
+1

Thanks,
Zike Yang

On Mon, Aug 22, 2022 at 3:16 PM mattison chao <ma...@apache.org> wrote:
>
> +1
>
> Best,
> Mattison

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by mattison chao <ma...@apache.org>.
+1

Best,
Mattison

On Fri, 19 Aug 2022 at 01:40, Enrico Olivelli <eo...@gmail.com> wrote:

> I agree
>
> Enrico

Re: [DISCUSS] Enable message deduplication for replicator by default

Posted by Enrico Olivelli <eo...@gmail.com>.
I agree

Enrico

On Thu, 18 Aug 2022, 18:23 PengHui Li <pe...@apache.org> wrote:

> Hi all,
>
> When I tried to fix a problem related to replicator
> https://github.com/apache/pulsar/pull/17154
> It surprised me that the message deduplication will not work by default
> with the replicator.
> I always thought it was enabled for replicators by default. Details to see
> [0].
>
> I think we should enable the deduplication for the replicator. Otherwise,
> we will see duplicated
> messages on the remote cluster. And the producer of the replicator always
> has a fixed producer
> name, this will make the message deduplication work properly.
>
> The test introduced in https://github.com/apache/pulsar/pull/17154 will
> check the message
> replication ordering. Without the message deduplication enabled, the test
> is flaky with received
> duplicated messages. After enabling, everything is fine.
>
> Best,
> Penghui
>
> [0] https://github.com/apache/pulsar/pull/17154#discussion_r948736894
>