You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Denis Mekhanikov <dm...@gmail.com> on 2018/08/02 12:35:06 UTC

Backup queue for local continuous query

Igniters,

I noticed, that local continuous queries store queues with backup entries.
And since the queries are local, other nodes never send acknowledges.

It has two negative effects:
1) Backup queue may grow indefinitely and cause OOME.
2) When topology changes, the backup queue is flushed, and old
notifications are delivered to the listener, even if they happened long
time ago.

I prepared a fix for it. You can find it in the ticket: IGNITE-9009
<https://issues.apache.org/jira/browse/IGNITE-9009>
It makes local continuous queries ignore backup entries and not put them
into the backup queues.

Nikolay, Semyon,
You are the people, who implemented this piece of functionality.
Could you review these changes and share your opinion?

I can think of another way to fix it: we could register
*CacheContinuousQueryHandlers* on all nodes even for a local CQ, and make
them send backup notifications to the subscriber node.
It will help in cases, when you create local listeners on all nodes and
want all nodes to process changes, that happen on their primary partitions.
In this case backup entries may help not to lose updates. But actually it
will leave a time window between sending a backup acknowledge and notifying
the local listener. And such use-case will generate quadratic number of
handlers and pollute the network communication.

So, I prefer the first option.

What do you think?

Denis

Re: Backup queue for local continuous query

Posted by Andrey Mashenkov <an...@gmail.com>.

Sorry for late answer.

Yes, I'm agree local CQ should ignore backup partitions at all for
partitioned caches,
and we should trigger all events (primary and backups) for replicated
caches.

With this approach there is no need to collect backup events in queue
anymore.

On Fri, Aug 10, 2018 at 4:51 PM Denis Mekhanikov <dm...@gmail.com>
wrote:

> Andrey,
>
> I also think, that remote nodes shouldn't be involved.
>
> > Instead of using backup queue we can notify CQ on backup events instantly
> This would be an incompatible change. It would break users' code, that
> doesn't expect notifications on backup entries.
> On the other hand, it would make our API more consistent, since CQ events
> are triggered for all entries, if a cache is replicated.
> We could add *CacheEntryEvent#isPrimary* property, that would let users
> distinguish primary and backup events.
>
> But since it would change behaviour, I'm inclined to just ignore events on
> backup partitions.
>
> What do you think?
>
> Denis
>
> пт, 10 авг. 2018 г. в 13:41, Andrey Mashenkov <andrey.mashenkov@gmail.com
> >:
>
> > Hi Denis,
> >
> > As my understanding right, to prevent potential OOM Local CQ either
> > shouldn't collect anything in backup queue or other nodes should send
> akcs.
> >
> > Second way look strange as Local CQ runs on local node only and doesn't
> > care whats going on other nodes.
> > It is not clear what events and when other nodes should ack as they
> doesn't
> > participate in query?
> > When a primary node for a partition goes away and node with local CQ
> > running become a new primary
> > then seems, it make no sense to notify CQ with stale events from backup
> > queue and only new events make sense (as from new primary).
> >
> > Instead of using backup queue we can notify CQ on backup events instantly
> > and let user filter out backup events if needed in listener.
> >
> > So, I'd expect local query shouldn't involve remote nodes at all.
> > Also, I'm not sure we should filter out backup events. Can we let users
> to
> > do this in their listener code as it shouldn't costs for local CQ?
> >
> >
> >
> >
> >
> > On Thu, Aug 2, 2018 at 3:35 PM Denis Mekhanikov <dm...@gmail.com>
> > wrote:
> >
> > > Igniters,
> > >
> > > I noticed, that local continuous queries store queues with backup
> > entries.
> > > And since the queries are local, other nodes never send acknowledges.
> > >
> > > It has two negative effects:
> > > 1) Backup queue may grow indefinitely and cause OOME.
> > > 2) When topology changes, the backup queue is flushed, and old
> > > notifications are delivered to the listener, even if they happened long
> > > time ago.
> > >
> > > I prepared a fix for it. You can find it in the ticket: IGNITE-9009
> > > <https://issues.apache.org/jira/browse/IGNITE-9009>
> > > It makes local continuous queries ignore backup entries and not put
> them
> > > into the backup queues.
> > >
> > > Nikolay, Semyon,
> > > You are the people, who implemented this piece of functionality.
> > > Could you review these changes and share your opinion?
> > >
> > > I can think of another way to fix it: we could register
> > > *CacheContinuousQueryHandlers* on all nodes even for a local CQ, and
> make
> > > them send backup notifications to the subscriber node.
> > > It will help in cases, when you create local listeners on all nodes and
> > > want all nodes to process changes, that happen on their primary
> > partitions.
> > > In this case backup entries may help not to lose updates. But actually
> it
> > > will leave a time window between sending a backup acknowledge and
> > notifying
> > > the local listener. And such use-case will generate quadratic number of
> > > handlers and pollute the network communication.
> > >
> > > So, I prefer the first option.
> > >
> > > What do you think?
> > >
> > > Denis
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>


-- 
Best regards,
Andrey V. Mashenkov

Re: Backup queue for local continuous query

Posted by Denis Mekhanikov <dm...@gmail.com>.

Andrey,

I also think, that remote nodes shouldn't be involved.

> Instead of using backup queue we can notify CQ on backup events instantly
This would be an incompatible change. It would break users' code, that
doesn't expect notifications on backup entries.
On the other hand, it would make our API more consistent, since CQ events
are triggered for all entries, if a cache is replicated.
We could add *CacheEntryEvent#isPrimary* property, that would let users
distinguish primary and backup events.

But since it would change behaviour, I'm inclined to just ignore events on
backup partitions.

What do you think?

Denis

пт, 10 авг. 2018 г. в 13:41, Andrey Mashenkov <an...@gmail.com>:

> Hi Denis,
>
> As my understanding right, to prevent potential OOM Local CQ either
> shouldn't collect anything in backup queue or other nodes should send akcs.
>
> Second way look strange as Local CQ runs on local node only and doesn't
> care whats going on other nodes.
> It is not clear what events and when other nodes should ack as they doesn't
> participate in query?
> When a primary node for a partition goes away and node with local CQ
> running become a new primary
> then seems, it make no sense to notify CQ with stale events from backup
> queue and only new events make sense (as from new primary).
>
> Instead of using backup queue we can notify CQ on backup events instantly
> and let user filter out backup events if needed in listener.
>
> So, I'd expect local query shouldn't involve remote nodes at all.
> Also, I'm not sure we should filter out backup events. Can we let users to
> do this in their listener code as it shouldn't costs for local CQ?
>
>
>
>
>
> On Thu, Aug 2, 2018 at 3:35 PM Denis Mekhanikov <dm...@gmail.com>
> wrote:
>
> > Igniters,
> >
> > I noticed, that local continuous queries store queues with backup
> entries.
> > And since the queries are local, other nodes never send acknowledges.
> >
> > It has two negative effects:
> > 1) Backup queue may grow indefinitely and cause OOME.
> > 2) When topology changes, the backup queue is flushed, and old
> > notifications are delivered to the listener, even if they happened long
> > time ago.
> >
> > I prepared a fix for it. You can find it in the ticket: IGNITE-9009
> > <https://issues.apache.org/jira/browse/IGNITE-9009>
> > It makes local continuous queries ignore backup entries and not put them
> > into the backup queues.
> >
> > Nikolay, Semyon,
> > You are the people, who implemented this piece of functionality.
> > Could you review these changes and share your opinion?
> >
> > I can think of another way to fix it: we could register
> > *CacheContinuousQueryHandlers* on all nodes even for a local CQ, and make
> > them send backup notifications to the subscriber node.
> > It will help in cases, when you create local listeners on all nodes and
> > want all nodes to process changes, that happen on their primary
> partitions.
> > In this case backup entries may help not to lose updates. But actually it
> > will leave a time window between sending a backup acknowledge and
> notifying
> > the local listener. And such use-case will generate quadratic number of
> > handlers and pollute the network communication.
> >
> > So, I prefer the first option.
> >
> > What do you think?
> >
> > Denis
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: Backup queue for local continuous query

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi Denis,

As my understanding right, to prevent potential OOM Local CQ either
shouldn't collect anything in backup queue or other nodes should send akcs.

Second way look strange as Local CQ runs on local node only and doesn't
care whats going on other nodes.
It is not clear what events and when other nodes should ack as they doesn't
participate in query?
When a primary node for a partition goes away and node with local CQ
running become a new primary
then seems, it make no sense to notify CQ with stale events from backup
queue and only new events make sense (as from new primary).

Instead of using backup queue we can notify CQ on backup events instantly
and let user filter out backup events if needed in listener.

So, I'd expect local query shouldn't involve remote nodes at all.
Also, I'm not sure we should filter out backup events. Can we let users to
do this in their listener code as it shouldn't costs for local CQ?

On Thu, Aug 2, 2018 at 3:35 PM Denis Mekhanikov <dm...@gmail.com>
wrote:

> Igniters,
>
> I noticed, that local continuous queries store queues with backup entries.
> And since the queries are local, other nodes never send acknowledges.
>
> It has two negative effects:
> 1) Backup queue may grow indefinitely and cause OOME.
> 2) When topology changes, the backup queue is flushed, and old
> notifications are delivered to the listener, even if they happened long
> time ago.
>
> I prepared a fix for it. You can find it in the ticket: IGNITE-9009
> <https://issues.apache.org/jira/browse/IGNITE-9009>
> It makes local continuous queries ignore backup entries and not put them
> into the backup queues.
>
> Nikolay, Semyon,
> You are the people, who implemented this piece of functionality.
> Could you review these changes and share your opinion?
>
> I can think of another way to fix it: we could register
> *CacheContinuousQueryHandlers* on all nodes even for a local CQ, and make
> them send backup notifications to the subscriber node.
> It will help in cases, when you create local listeners on all nodes and
> want all nodes to process changes, that happen on their primary partitions.
> In this case backup entries may help not to lose updates. But actually it
> will leave a time window between sending a backup acknowledge and notifying
> the local listener. And such use-case will generate quadratic number of
> handlers and pollute the network communication.
>
> So, I prefer the first option.
>
> What do you think?
>
> Denis
>

-- 
Best regards,
Andrey V. Mashenkov