Posted to user@ignite.apache.org by ssansoy <s....@cmcmarkets.com> on 2020/10/20 16:37:39 UTC

Workaround for getting ContinuousQuery to support transactions

Following on from:
http://apache-ignite-users.70518.x6.nabble.com/ContinuousQuery-Batch-updates-td34198.html

The takeaway from there is that continuous queries do not honour
transactions: if a writer writes 100 records (e.g. CalculationParameters)
in a transaction, the continuous query will see the updates before the
entire batch of 100 has been committed.
Unfortunately this is a show-stopper for our adoption of Ignite, so we are
trying to think of some workarounds.

We are considering updating the writer app so that, when writing the 100
CalculationParameter records, it:

1. Writes the 100 CalculationParameter records to the cluster (e.g. with
some incremented version, e.g. versionId=2).
2. Writes a separate entry into a special "Modifications" cache, containing
the number of rows written (numRows) and the version id of those records
(versionId).

Client apps don't subscribe to the CalculationParameter cache. Instead apps
subscribe to the Modifications cache.

The remote filter will:

1. Do a cache.get or a scan query on all the records in
CalculationParameter, filtering on versionId=2. The query has to keep
repeating until all rows are visible (i.e. all numRows records are seen).
These records are then subjected to another filter (the usual filter
criteria the client app would have applied to CalculationParameter). If
any records match, the filter returns true.

2. The transformer does a similar scan, groups all the numRows records
into a collection, and returns this collection to the localListen in the
client. The client then has access to all the records it is interested
in, in one batch. A sketch of both pieces follows this list.
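
Roughly what we have in mind, as a sketch only. The Modification and
CalculationParameter value classes, their getters, the cache names and the
naive sleep/retry loop are all illustrative assumptions rather than an
established Ignite pattern:

// Remote transformer for the Modifications continuous query. Runs on the
// server side and polls until the whole batch for a version is visible.
import java.util.List;
import java.util.stream.Collectors;
import javax.cache.Cache;
import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.lang.IgniteClosure;

public class ModificationTransformer implements
    IgniteClosure<CacheEntryEvent<? extends Integer, ? extends Modification>,
                  List<CalculationParameter>> {

    @Override public List<CalculationParameter> apply(
        CacheEntryEvent<? extends Integer, ? extends Modification> evt) {

        Modification mod = evt.getValue();
        IgniteCache<String, CalculationParameter> params =
            Ignition.localIgnite().cache("CalculationParameter");

        // Re-run the scan until all numRows rows of this version are
        // visible (naive retry; a real version would need a timeout).
        while (true) {
            List<Cache.Entry<String, CalculationParameter>> rows =
                params.query(new ScanQuery<String, CalculationParameter>(
                    (k, v) -> v.getVersionId() == mod.getVersionId())).getAll();

            if (rows.size() >= mod.getNumRows())
                return rows.stream()
                    .map(Cache.Entry::getValue)
                    .collect(Collectors.toList());

            try {
                Thread.sleep(10);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}

The client would then subscribe with ContinuousQueryWithTransformer so that
localListen receives the whole batch in one call (processBatch is our own
hypothetical handler):

ContinuousQueryWithTransformer<Integer, Modification,
    List<CalculationParameter>> qry = new ContinuousQueryWithTransformer<>();
qry.setRemoteTransformerFactory(
    javax.cache.configuration.FactoryBuilder.factoryOf(ModificationTransformer.class));
qry.setLocalListener(batches -> batches.forEach(batch -> processBatch(batch)));
modificationsCache.query(qry);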

We would have liked to avoid waiting/retrying for all the
CalculationParameter records to appear after the Modifications update is
seen, but since this is happening on the cluster and not on the client we
can probably live with it.

Do any Ignite developers here see any fundamental flaws with this
approach? Ultimately we just want the localListen to be called with a
collection of records that were all updated at the same time - so if there
is anything else worth trying, please shout.
Thanks!
Sham




Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I think it's OK to try.

Regards,
-- 
Ilya Kasnacheev



Re: Workaround for getting ContinuousQuery to support transactions

Posted by ssansoy <s....@cmcmarkets.com>.
Interesting! Might just work. We will try it out.
E.g. a cache of 500 V's. V has fields a, b, c (b=foo on all records), and
some client app wants to run a continuous query on all V where b=foo, or
where b was foo but no longer is following the update.

The writer updates 100 V's, setting b=bar on all records along with some
incrementing version int N.
The data streamer transformer mutates V by adding a new field called
"changes", which contains b=foo to denote that only field b was changed
and that its old value was foo (i.e. a set of {fieldName, oldValue}
pairs) - see the sketch below.
The writer then updates the V_signal cache to denote that a change was
made, with version N.
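
The transformer step could look something like this sketch. The V class,
its getB()/setChanges() accessors, and the updatedVs batch are illustrative
assumptions, and our reading of the docs is that a custom stream receiver
such as StreamTransformer needs allowOverwrite(true):

import java.util.HashMap;
import java.util.Map;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.stream.StreamTransformer;

// 'ignite' is the Ignite instance; 'updatedVs' is our batch of 100 updates.
try (IgniteDataStreamer<String, V> stmr = ignite.dataStreamer("V")) {
    stmr.allowOverwrite(true); // custom receivers need this, as we read it

    // Before overwriting, copy any old field values that differ into the
    // new value's "changes" map ({fieldName -> oldValue}).
    stmr.receiver(StreamTransformer.from((entry, args) -> {
        V oldVal = entry.getValue();
        V newVal = (V)args[0];

        if (oldVal != null) {
            Map<String, Object> changes = new HashMap<>();
            if (!oldVal.getB().equals(newVal.getB()))
                changes.put("b", oldVal.getB());
            newVal.setChanges(changes);
        }

        entry.setValue(newVal);
        return null;
    }));

    for (Map.Entry<String, V> e : updatedVs.entrySet())
        stmr.addData(e.getKey(), e.getValue());
}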

The client continuous query listens to the V_signal cache. When it
receives an update (denoting that V updates have occurred), it does a
scan query on V in the transformer (the scan query filters for the
records that were updated as part of version N, where either the fields
we care about match our predicate, or the "changes" field contains one of
the fields we are interested in and matches the predicate).

These are batched up as a collection and returned to the client. Does this
seem like a reasonable approach?





Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

You can have a transformer on your data streamer to do something to the
old value (e.g. keep some fields of it in the new value V).

Regards,
-- 
Ilya Kasnacheev



Re: Workaround for getting ContinuousQuery to support transactions

Posted by ssansoy <s....@cmcmarkets.com>.
Thanks for this,

We are considering this approach: writing all the entries to some table
V, and then updating a separate token cache T with a signal, picked up by
the continuous query, which then filters the underlying V records,
transforms them and sends them to the client.

However, one problem we ran into is that we lose the "old" values from the
underlying table V. Normally the continuous query has access to both the
new value and the old value, so the client app can detect which entries no
longer match the remote filter. With this technique, however, the
continuous query remote transformer has the old and new values of T, but
is ultimately doing a ScanQuery on V to get all the "current" values.

Do you have any advice on how we can still achieve that?




Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

After you flush the data streamer, you may put a token entry into a small
cache to be picked up by the continuous query. When the query handler is
triggered, all the data will already be available from the caches; a
sketch follows below.

The difference from transactional behavior is that transactions promise
(and fail to deliver) an "at the same time" guarantee, whilst the data
streamer delivers an "after" guarantee.
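
For example, on the writer side (a sketch; the cache names, the Token
class and the batch map are illustrative):

import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

void writeBatch(Ignite ignite, Map<String, CalculationParameter> batch,
                int versionId) {
    try (IgniteDataStreamer<String, CalculationParameter> stmr =
             ignite.dataStreamer("CalculationParameter")) {
        // Per the advice in this thread; note that with allowOverwrite(false)
        // the streamer will not overwrite entries that already exist.
        stmr.allowOverwrite(false);

        for (Map.Entry<String, CalculationParameter> e : batch.entrySet())
            stmr.addData(e.getKey(), e.getValue());

        stmr.flush(); // returns once all entries are readable from the cache
    }

    // Publish the token last: the continuous query on this small cache
    // fires only after the whole batch is visible.
    ignite.cache("Tokens").put(versionId, new Token(versionId, batch.size()));
}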

Regards,
-- 
Ilya Kasnacheev



Re: Workaround for getting ContinuousQuery to support transactions

Posted by ssansoy <s....@cmcmarkets.com>.
Yeah, the key thing is to be able to be notified when all records have
been updated in the cache.
We've also tried using IgniteLock for this, by the way (e.g. the writer
locks, writes the records, unlocks) - sketched below.

The client app then internally queues all updates as they arrive from the
continuous query. If it can acquire the lock, it knows all updates have
arrived (because the writer has finished). However, this doesn't work
either: even though the writer has unlocked, the written records are still
in transit to the continuous query (e.g. they don't exist in the internal
client-side queue yet).
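
For reference, the shape of what we tried (a sketch; the lock name, the
'ignite'/'cache'/'batch' variables and the drain() helper are illustrative):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.IgniteLock;

// Writer: lock, write the batch, unlock.
IgniteLock lock = ignite.reentrantLock("batchLock", true, false, true);
lock.lock();
try {
    batch.forEach(cache::put);
}
finally {
    lock.unlock(); // events may still be in flight to CQ listeners here
}

// Client: buffer continuous query events; treat "lock acquirable" as
// "batch complete".
Queue<CacheEntryEvent<?, ?>> buffer = new ConcurrentLinkedQueue<>();
// ... the continuous query's local listener adds every event to 'buffer' ...
if (lock.tryLock()) {
    try {
        drain(buffer); // RACE: the unlock can beat the events to the client
    }
    finally {
        lock.unlock();
    }
}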




Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

It may handle your use case (doing something when all records are in cache).
But it will not fix the tool that you're using for it (continuous query
with expectation of batched handling).

Regards,
-- 
Ilya Kasnacheev



Re: Workaround for getting ContinuousQuery to support transactions

Posted by ssansoy <s....@cmcmarkets.com>.
Ah ok so this wouldn't help solve our problem?




Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

No, it does not mean anything about the continuous query listener.

But it means that once a data streamer is flushed, all data is available in
caches.

Regards,
-- 
Ilya Kasnacheev



Re: Workaround for getting ContinuousQuery to support transactions

Posted by ssansoy <s....@cmcmarkets.com>.
Thanks,
How is this different to multiple puts inside a transaction?

By using the data streamer to write the records, does that mean the
continuous query will receive all 10,000 records in one go in the local
listen?




Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

You may actually use our data streamer (with allowOverwrite false).

Once you call flush() on it and it returns, you should be confident that
all 10,000 entries are readable from the cache. Of course, they all have
to be in one cache.

Regards,
-- 
Ilya Kasnacheev



Re: Workaround for getting ContinuousQuery to support transactions

Posted by ssansoy <s....@cmcmarkets.com>.
Hi, thanks for the reply. Appreciate the suggestion - and if we were
creating a new solution around this, we would likely take that tack.
Unfortunately the entire platform we are looking to migrate over to Ignite
has dependencies in places on updates coming in as a complete batch (e.g.
whatever was in an update transaction).

We've experimented with putting a queue in the client as you say, with a
timeout which gathers all sequentially arriving updates from the
continuous query and groups them together after a quiet period of e.g.
50ms. However, this is quite fragile/timing-sensitive and not something we
can comfortably put into production.

Are there any locking or signalling mechanisms (or anything else really)
that might help us here? E.g. we buffer the updates in the client and
await some signal that the updates are complete. This signal would need to
be fired after the continuous query has seen all the updates. E.g. the
writer will:

Write 10,000 records to the cache
Notify something

The client app will:
Receive 10,000 updates, 1 at a time, queueing them up locally
Upon that notification, drain this queue and process the records (see the
sketch below).
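
To make that concrete, a sketch of the client side we have in mind (the
signal cache, the Token class and processBatch() are illustrative
assumptions):

import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.ContinuousQuery;

void subscribe(IgniteCache<String, CalculationParameter> data,
               IgniteCache<Integer, Token> signals) {
    Queue<CacheEntryEvent<?, ?>> buffer = new ConcurrentLinkedQueue<>();

    // CQ #1: queue every data update as it arrives, one at a time.
    ContinuousQuery<String, CalculationParameter> dataQry = new ContinuousQuery<>();
    dataQry.setLocalListener(evts -> evts.forEach(buffer::add));
    data.query(dataQry);

    // CQ #2: the "notify something" signal - drain the queue when it fires.
    // Caveat: the signal can outrun the data events (the very problem in
    // this thread), so a real version would wait until the buffer holds
    // token.getNumRows() entries before processing.
    ContinuousQuery<Integer, Token> signalQry = new ContinuousQuery<>();
    signalQry.setLocalListener(evts -> {
        List<CacheEntryEvent<?, ?>> drained = new ArrayList<>();
        CacheEntryEvent<?, ?> e;
        while ((e = buffer.poll()) != null)
            drained.add(e);
        processBatch(drained);
    });
    signals.query(signalQry);
}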




Re: Workaround for getting ContinuousQuery to support transactions

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Nothing obviously wrong, but it still seems to me that you're stretching
it too far.

Maybe you need to rethink your cache entry granularity (put more stuff in
the entry to make it self-contained) or use some kind of message queue.
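
As an illustration of the granularity point (a sketch; the cache name and
types are assumptions): if the batch itself is the cache entry, the
continuous query fires exactly once per batch, and no cross-entry
atomicity is needed.

import java.util.ArrayList;
import java.util.List;
import org.apache.ignite.IgniteCache;

// One self-contained entry per batch, keyed by version.
IgniteCache<Integer, List<CalculationParameter>> batches = ignite.cache("Batches");
batches.put(versionId, new ArrayList<>(batch.values()));
// A continuous query on "Batches" then sees the whole batch atomically.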

Regards,
-- 
Ilya Kasnacheev

