Posted to dev@bookkeeper.apache.org by Aniruddha Laud <tr...@gmail.com> on 2012/05/09 05:07:11 UTC

Question about Message Sequence Ids with multiple regions.

When the RegionManager subscribes to topics from a remote hub, it
merges the remote component part of the received message with that of the
last persisted local message. The merge policy takes the maximum of the two
values for any region X in these two messages. Why exactly is this being
done? How is the remote component section from one region useful for hubs
in another region?
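
For reference, the merge policy described above can be sketched roughly as
follows. This is only an illustration, not the actual Hedwig code: the
function name and the dict representation of the remote components are
assumptions, and a missing region is treated as 0.

```python
def merge_remote_components(local, received):
    """Per-region maximum of two remote-component vectors.

    Both arguments are hypothetical dicts mapping region name -> sequence id.
    A region absent from one vector is treated as 0 (an assumption).
    """
    regions = set(local) | set(received)
    return {r: max(local.get(r, 0), received.get(r, 0)) for r in sorted(regions)}


# Last persisted local message vs. a message received from a remote hub.
last_local = {"A": 2, "B": 0}
received = {"A": 1, "C": 1}
merged = merge_remote_components(last_local, received)
print(merged)
```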

Regards,
Aniruddha.

Re: Question about Message Sequence Ids with multiple regions.

Posted by Aniruddha Laud <tr...@gmail.com>.

Thanks Ivan and Utkarsh. I will open a Jira ticket and produce a patch.

Regards,
Aniruddha.


Re: Question about Message Sequence Ids with multiple regions.

Posted by Utkarsh Srivastava <ut...@twitter.com>.

We were trying to cater for non all-to-all topologies, i.e., region A
subscribes to B which subscribes to C but there is no subscription
from A to C directly.

That being said, I don't think the non all-to-all case was ever
fleshed out completely, and is quite complex. So, we could just change
the policy as you suggest to make it simpler and better.

Utkarsh


Re: Question about Message Sequence Ids with multiple regions.

Posted by Ivan Kelly <iv...@apache.org>.

Yes, I think what we have now is broken causal consistency, and the
modification you suggest would fix it.

Again, with regions A, B & C and topic t.

Each region has a remote component = [A:0, B:0, C:0]
A publishes Ua1 to t, which has rc [A:1,B:0,C:0], which is sent to B, but delayed for C.

A(rc) = [A:1, B:0, C:0]
B(rc) = [A:1, B:0, C:0]
C(rc) = [A:0, B:0, C:0]

B publishes Ub1 to t, which has rc [A:1,B:1,C:0], which is sent to A
and C.

* This is where the schemes diverge
1. With the current scheme, the lastSeqIdPushed would be updated to
[A:1,B:1,C:0] and a subscriber at C would receive this as the remote
component for both Ub1 and Ua1 when it eventually arrives. The subscriber
could not tell which was first.

2. With the new scheme, Ub1 would be delivered to a subscriber at C
with [A:1,B:1,C:0], and lastSeqIdPushed would be updated to
[A:0,B:1,C:0]. Then when Ua1 arrives, it would be delivered to the
subscriber with [A:1,B:0,C:0] and lastSeqIdPushed would be updated to
[A:1,B:1,C:0]. With this, the subscriber could see that Ub1 was
causally dependent on Ua1.
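
The two schemes can be replayed at region C with a small sketch. This is a
simplified model of the example above, not the real hub code: the function
names, the dict layout of a message and the dominates() check are all
assumptions made for illustration.

```python
REGIONS = ("A", "B", "C")

# Messages from the example: Ua1 published at A, Ub1 published at B.
ua1 = {"src": "A", "rc": {"A": 1, "B": 0, "C": 0}}
ub1 = {"src": "B", "rc": {"A": 1, "B": 1, "C": 0}}


def replay_current(arrivals):
    """Current scheme: merge with per-region max, deliver the merged vector."""
    last = {r: 0 for r in REGIONS}
    delivered = []
    for msg in arrivals:
        last = {r: max(last[r], msg["rc"][r]) for r in REGIONS}
        delivered.append(dict(last))
    return delivered, last


def replay_proposed(arrivals):
    """Proposed scheme: deliver the message's own vector, and update only
    the source region's entry in lastSeqIdPushed."""
    last = {r: 0 for r in REGIONS}
    delivered, states = [], []
    for msg in arrivals:
        delivered.append(dict(msg["rc"]))
        src = msg["src"]
        last[src] = max(last[src], msg["rc"][src])
        states.append(dict(last))
    return delivered, states


def dominates(x, y):
    """True if x is >= y for every region and differs from y."""
    return all(x[r] >= y[r] for r in REGIONS) and x != y


# At C, Ub1 arrives first and the delayed Ua1 arrives second.
cur_delivered, cur_last = replay_current([ub1, ua1])
new_delivered, new_states = replay_proposed([ub1, ua1])

print(cur_delivered)   # both deliveries carry the same vector
print(new_delivered)   # each delivery carries its original vector
print(new_states)      # lastSeqIdPushed after each arrival
print(dominates(new_delivered[0], new_delivered[1]))
```

In this model the vectors delivered under the current scheme are identical,
so dominates() cannot order them, while under the proposed scheme the vector
for Ub1 dominates the one for Ua1.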

-Ivan


Re: Question about Message Sequence Ids with multiple regions.

Posted by Aniruddha Laud <tr...@gmail.com>.

Even if B sees that the remote component for A received from C is not
monotonically increasing, would it make any difference in how region B
orders messages? What if B updated its remote component for region A only
when the message it received has srcRegion as A? That way we could still
guarantee that the remote components internally are monotonically
increasing and it would give us information on exactly how many messages a
region has seen from any of the other regions.
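
A rough sketch of the update rule I have in mind is below. The function name
and the dict representation are hypothetical, and the max() is only a guard
in case messages from one source region arrive out of order, which I am
assuming does not normally happen.

```python
def update_for_source(local, received, src_region):
    """Update only the entry for the region the message came from.

    local and received are hypothetical dicts of region name -> sequence id.
    Entries for all other regions in `local` are left untouched.
    """
    updated = dict(local)
    updated[src_region] = max(local.get(src_region, 0),
                              received.get(src_region, 0))
    return updated


# Region C has seen nothing yet and receives a message published at B.
c_state = {"A": 0, "B": 0, "C": 0}
from_b = {"A": 1, "B": 1, "C": 0}
c_state = update_for_source(c_state, from_b, "B")
print(c_state)
```

Here the entry for A at region C stays at 0 because C itself has not yet
received anything from A, even though B has.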

In the current scheme, after the merging, the remote component for region X
gives the maximum messageID that any of the regions have seen from X for
that topic.

Regards,
Aniruddha.


Re: Question about Message Sequence Ids with multiple regions.

Posted by Ivan Kelly <iv...@apache.org>.

On Tue, May 08, 2012 at 08:07:11PM -0700, Aniruddha Laud wrote:
> When the RegionManager subscribes to topics from a remote hub, it
> merges the remote component part of the received message with that of the
> last persisted local message. The merge policy takes the maximum of the two
> values for any region X in these two messages. Why exactly is this being
> done? How is the remote component section from one region useful for hubs
> in another region?
My understanding of this code is that it is trying to ensure that the
remote components for a topic are monotonically increasing in the view
of a single region.

Take the example of 3 regions, A,B&C, with topic t.

A publishes message 1, which B and C receive with remote components
[A:1,B:null,C:null].

Now A publishes message 2, which arrives quickly at B but with a
delay at C.

The last pushedSeqId for the topic is now
at A: [A:2,B:null,C:null]
at B: [A:2,B:null,C:null]
at C: [A:1,B:null,C:null]

C now publishes to topic t. The message has remote components of
[A:1,B:null,C:1]

When this arrives at B, if B does not do the merging, the client will
see that the remote component for A is not increasing monotonically. 
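
As a small sketch of that last step (illustrative only: the helper name is
made up, and null is modelled as 0), this is what a client at B would see
for A's component with and without the merge:

```python
REGIONS = ("A", "B", "C")


def is_monotonic(vectors, region):
    """True if the given region's component never decreases across vectors."""
    values = [v.get(region, 0) for v in vectors]
    return all(a <= b for a, b in zip(values, values[1:]))


# Vectors already seen at B after A's messages 1 and 2 (null modelled as 0).
seen_at_b = [{"A": 1}, {"A": 2}]

# C's message, published before C had received A's message 2.
from_c = {"A": 1, "C": 1}

# Without merging, B passes C's vector through unchanged.
without_merge = seen_at_b + [from_c]

# With merging, B takes the per-region max against its last vector.
merged = {r: max(seen_at_b[-1].get(r, 0), from_c.get(r, 0)) for r in REGIONS}
with_merge = seen_at_b + [merged]

print(is_monotonic(without_merge, "A"))
print(is_monotonic(with_merge, "A"))
```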

Utkarsh can probably give you more detail on this.

Regards
Ivan