Posted to user@geode.apache.org by Alberto Gomez <al...@est.tech> on 2019/12/02 16:57:28 UTC

Question about ordering of events in a partition in WAN deployments with parallel gateway senders

Hi,

On the subject of parallel gateway senders and gateway receivers, the Geode documentation says:

[1]:
Although parallel gateway senders provide the best throughput for WAN distribution, they provide less control for event ordering. Event ordering for the region as a whole is not preserved, because multiple Geode servers distribute the region's events at the same time. However, the ordering of events for a given partition can be preserved. See Configuring Multi-Site (WAN) Event Queues.

[2]:
You cannot configure the order-policy for a parallel event queue, because parallel queues cannot preserve event ordering for regions. Only the ordering of events for a given partition (or in a given queue of a distributed region) can be preserved.


Given that the documentation seems to state that the ordering of events can be preserved for a given partition when using parallel gateway senders, my question is how this can be achieved when there are several gateway receivers.

Example:
Suppose we have a WAN deployment with 2 clusters, one with parallel gateway senders and a remote one with several gateway receivers.

Entry1 is updated twice consecutively as follows:
- First update: attribute1 = 3
- Second update: attribute1 = 5

As a result of the previous updates on Entry1, the gateway sender running on the server hosting the bucket where Entry1 is stored will send two events to the remote cluster that has several gateway receivers.

Could each event be sent to a different server or to the same server but over two different connections?
If this is possible, the order of arrival of the events to the server hosting the primary replica would not be guaranteed to be the same as the original one. Right? If this is the case, how would the ordering be preserved? Would timestamps be used?

Thanks in advance,

/Alberto G.


[1] https://geode.apache.org/docs/guide/12/topologies_and_comm/topology_concepts/multisite_overview.html

[2] https://geode.apache.org/docs/guide/12/developing/events/configuring_gateway_concurrency_levels.html



Re: Question about ordering of events in a partition in WAN deployments with parallel gateway senders

Posted by Alberto Gomez <al...@est.tech>.
Barry, thanks a lot for your thorough answer!

Cheers,

/Alberto G.

________________________________
From: Barry Oglesby <bo...@pivotal.io>
Sent: Monday, December 2, 2019 7:39 PM
To: user@geode.apache.org <us...@geode.apache.org>
Subject: Re: Question about ordering of events in a partition in WAN deployments with parallel gateway senders

Alberto,

At the gateway sender / receiver level, event ordering is preserved because the same event processor is processing the same buckets (partitions) across the same connection to the same remote receiver.

The data region's primary buckets are spread among the servers. In each member defining the gateway sender, there is one event processor per configured dispatcher thread. The primary buckets in each member are split up among the event processors. Each event processor processes the events in its buckets in order: the order across buckets is arbitrary, but events within a bucket are processed in the order they occurred. It sends its batches on the same connection to the same remote receiver.

So, the processing looks like:

configured dispatcher thread -> event processor thread -> set of primary buckets -> batches on the same connection

At the region level, the region entry also has versioning, so an earlier event will not overwrite a later event.

Thanks,
Barry Oglesby




Re: Question about ordering of events in a partition in WAN deployments with parallel gateway senders

Posted by Barry Oglesby <bo...@pivotal.io>.
Alberto,

At the gateway sender / receiver level, event ordering is preserved because
the same event processor is processing the same buckets (partitions) across
the same connection to the same remote receiver.

The data region's primary buckets are spread among the servers. In each
member defining the gateway sender, there is one event processor per
configured dispatcher thread. The primary buckets in each member are split
up among the event processors. Each event processor processes the events in
its buckets in order: the order across buckets is arbitrary, but events
within a bucket are processed in the order they occurred. It sends its
batches on the same connection to the same remote receiver.

So, the processing looks like:

configured dispatcher thread -> event processor thread -> set of primary
buckets -> batches on the same connection
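
The pipeline above can be illustrated with a small, self-contained Java
sketch. It does not use any Geode classes. The names BucketOrderingSketch,
Event, processorFor and route are hypothetical, and the modulo mapping from
bucket to processor is an assumption made for illustration, not a statement
of how Geode actually assigns buckets. The point is only that if every
event for a bucket goes through the same FIFO queue, the two updates to
Entry1 cannot overtake each other.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketOrderingSketch {

    // Hypothetical event: the entry key, the bucket it belongs to, and its value.
    static final class Event {
        final String key;
        final int bucketId;
        final int value;

        Event(String key, int bucketId, int value) {
            this.key = key;
            this.bucketId = bucketId;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value + " (bucket " + bucketId + ")";
        }
    }

    // Assumption for illustration: buckets are assigned to processors by modulo.
    static int processorFor(int bucketId, int dispatcherThreads) {
        return Math.floorMod(bucketId, dispatcherThreads);
    }

    // Append each event to the FIFO queue of the processor that owns its bucket.
    static List<List<Event>> route(List<Event> events, int dispatcherThreads) {
        List<List<Event>> queues = new ArrayList<>();
        for (int i = 0; i < dispatcherThreads; i++) {
            queues.add(new ArrayList<>());
        }
        for (Event e : events) {
            queues.get(processorFor(e.bucketId, dispatcherThreads)).add(e);
        }
        return queues;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event("Entry1", 7, 3)); // first update: attribute1 = 3
        events.add(new Event("Entry2", 3, 9)); // unrelated entry in another bucket
        events.add(new Event("Entry1", 7, 5)); // second update: attribute1 = 5
        List<List<Event>> queues = route(events, 5);
        for (int p = 0; p < queues.size(); p++) {
            System.out.println("processor " + p + " -> " + queues.get(p));
        }
    }
}
```

In this sketch each inner list stands in for one event processor and its
single connection to a remote receiver. Events for different buckets may
end up in different lists, so nothing is implied about ordering across
buckets.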

At the region level, the region entry also has versioning, so an earlier
event will not overwrite a later event.
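
The versioning idea can be sketched in the same way. This is a minimal
illustration, assuming a single monotonically increasing version number per
entry. The class VersionedEntrySketch and its apply method are hypothetical,
and the real consistency checks in Geode carry more information than one
counter. The sketch only shows why an update that arrives late is dropped
instead of overwriting a newer value.

```java
public class VersionedEntrySketch {

    private int value;
    private long version; // hypothetical per-entry version; 0 means nothing applied yet

    // Apply the update only if its version is newer than the stored one.
    synchronized boolean apply(int newValue, long newVersion) {
        if (newVersion <= version) {
            return false; // stale or duplicate update: keep the current value
        }
        value = newValue;
        version = newVersion;
        return true;
    }

    synchronized int value() {
        return value;
    }

    public static void main(String[] args) {
        VersionedEntrySketch entry1 = new VersionedEntrySketch();
        // Simulate the second update (attribute1 = 5) arriving before the first.
        System.out.println("applied v2: " + entry1.apply(5, 2));
        System.out.println("applied v1: " + entry1.apply(3, 1));
        System.out.println("attribute1 = " + entry1.value());
    }
}
```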

Thanks,
Barry Oglesby


