Posted to user@ignite.apache.org by ssansoy <s....@cmcmarkets.com> on 2020/07/03 08:17:01 UTC

Block until partition map exchange is complete

Hi Ignite users,

I have 3 nodes running, with a cache with the following configuration:

cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
cacheConfiguration.setBackups(1);
cacheConfiguration.setRebalanceMode(CacheRebalanceMode.SYNC);
cacheConfiguration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cacheConfiguration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

I.e. a partitioned cache with 1 backup, so if one of the three nodes goes
down, all the data is still available across the remaining 2 nodes.

I also have some custom code that runs on the current "leader", i.e. the
server code runs some tasks if it is the leader node, defined as the
"oldest node" in the cluster.
The code running on each server registers a listener for 

{EventType.EVT_NODE_SEGMENTED,
EventType.EVT_NODE_FAILED,EventType.EVT_NODE_LEFT}

And if it discovers that it is now the new leader, the tasks restart on the
new "oldest node".

This works fine. The issue I am having is that one of these tasks that
runs on the leader needs to issue a cache query to do some work.

I am finding that if one of my three nodes drops off, when one of the
remaining two nodes becomes the leader and resumes the work, the records it
gets back from the cache are incomplete. E.g. there may be 400 entries in
the cache, but when node 1 drops off and node 2 takes over, it only sees
250, or some other number less than 400. A little later, this does
correctly return to 400 - I expect this is because the exchange process has
completed behind the scenes and the node now has all the data it needs.

I am a little surprised by this, however, because I am using
CacheRebalanceMode.SYNC, which the docs here:
https://apacheignite.readme.io/docs/rebalancing describe as follows:

"This means that any call to cache public API will be blocked until
rebalancing is finished."

E.g. if I call ignite.cache("MYCACHE").size() (a public cache method), it
should not return an incomplete number, but rather block until the
underlying rebalance has completed and only then return 400.

Does anyone have any pointers to what I might be doing wrong here? Thanks!



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I think that you need to start a new isolated transaction from callback
(possibly in another thread) to get proper isolation.
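
Something like this (just a sketch - the executor, timeouts, and isolation
level are up to you):

import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

// Sketch: run the reads from your own executor, in a fresh transaction,
// instead of doing them directly on the callback thread.
executor.submit(() -> {
    try (Transaction tx = ignite.transactions().txStart(
            TransactionConcurrency.PESSIMISTIC,
            TransactionIsolation.REPEATABLE_READ)) {
        // do the reads that currently happen in the callback here
        tx.commit();
    }
});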

Regards,
-- 
Ilya Kasnacheev


Fri, Aug 21, 2020 at 18:34, ssansoy <s....@cmcmarkets.com>:


Re: Block until partition map exchange is complete

Posted by ssansoy <s....@cmcmarkets.com>.
Still seeing the same issue in 2.8.1 unfortunately.

I have a related question however. 

Assuming I perform the following operation on node 1 of my 3 node cluster 
(All caches use CacheRebalanceMode.SYNC,
CacheWriteSynchronizationMode.FULL_SYNC, CacheAtomicityMode.TRANSACTIONAL):


try (Transaction tx = ignite.transactions().txStart(
        TransactionConcurrency.PESSIMISTIC,
        TransactionIsolation.READ_COMMITTED,
        transactionTimeout, igniteTransactionBatchSize)) {

    // write 1 record to cache A
    // write 11 records to cache B

    tx.commit();
}


How should I expect the updated A and B records to appear on another node,
e.g. node 2?
I was expecting them both to become visible at exactly the same time. I am
using CacheMode.REPLICATED.

On node 2, I am performing a scan query on A, and in the local listen for
A I am fetching those 11 B records (using a SqlFieldsQuery) that were
updated in the same transaction. However, they don't always seem to be
visible until some time after the local listen for A has been entered. If
I put a sleep in there and try again, I do get all the B's.
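
For reference, the listener on node 2 looks roughly like this (a heavily
simplified sketch - the real key/value types, cache names and exact SQL
differ; cacheA and cacheB are the IgniteCache instances for A and B):

import java.util.List;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryUpdatedListener;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.lang.IgniteAsyncCallback;

@IgniteAsyncCallback
class AUpdatedListener implements CacheEntryUpdatedListener<String, Object> {
    @Override public void onUpdated(
        Iterable<CacheEntryEvent<? extends String, ? extends Object>> evts) {
        for (CacheEntryEvent<? extends String, ? extends Object> evt : evts) {
            // fetch the B rows written in the same transaction as this A update
            List<List<?>> rows = cacheB.query(
                new SqlFieldsQuery("SELECT * FROM B WHERE A_FK = ?")
                    .setArgs(evt.getKey())).getAll();
            // ... expect 11 rows here, but sometimes see fewer
        }
    }
}

ContinuousQuery<String, Object> qry = new ContinuousQuery<>();
qry.setInitialQuery(new ScanQuery<>());
qry.setLocalListener(new AUpdatedListener());
cacheA.query(qry); // keep the returned cursor open while listening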

2020-08-21 16:25:05,484 [callback-#192] DEBUG x.TableDataSelector [] -
Executing SQL query SqlFieldsQuery [sql=SELECT * FROM B WHERE A_FK =
'TEST4', args=null, collocated=false, timeout=-1, enforceJoinOrder=false,
distributedJoins=false, replicatedOnly=false, lazy=false, schema=null,
updateBatchSize=1]
2020-08-21 16:25:05,486 [callback-#192] DEBUG x.TableDataSelector [] -
Received 3 results
2020-08-21 16:25:05,486 [callback-#192] DEBUG x.TableDataSelector [] -
Trying again in 5 seconds
2020-08-21 16:25:10,486 [callback-#192] DEBUG x.TableDataSelector [] -
Received 11 results


My local listen for A is annotated with @IgniteAsyncCallback in case that
matters. Is there anything obviously wrong here?
My requirement is that node 2 has access to A and all the updated B's that
were written on node 1.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

It is supposed to be fixed in 2.8. Did you check that?

Thanks.
-- 
Ilya Kasnacheev


Wed, Jul 22, 2020 at 12:24, ssansoy <s....@cmcmarkets.com>:


Re: Block until partition map exchange is complete

Posted by ssansoy <s....@cmcmarkets.com>.
Hi, could the behaviour I have observed be captured by this bug:

https://issues.apache.org/jira/browse/IGNITE-9841

"Note, ScanQuery exhibits the same behavior - returns partial results when
some partitions are lost.  Not sure if solution would be related or needs to
be tracked and fixed under a separate ticket."





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I'm not actually sure. Do you have a reproducer where you see a decreased
count() result? What is your PartitionLossPolicy, and have you tried
tweaking it?

I can see a method in our tests for doing that, and it is very raw: it
checks every cache to make sure that all partitions are OWNING. This
is org.apache.ignite.testframework.junits.common.GridCommonAbstractTest#awaitPartitionMapExchange(boolean,
boolean, java.util.Collection<org.apache.ignite.cluster.ClusterNode>,
boolean, java.util.Set<java.lang.String>)
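
If it helps, a rough approximation with the public API looks like this
(sketch only: it assumes cache rebalance events are enabled via
IgniteConfiguration#setIncludeEventTypes, and the event may already have
fired before you subscribe, hence the bounded wait):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

CountDownLatch rebalanced = new CountDownLatch(1);

IgnitePredicate<Event> lsnr = evt -> {
    if ("MYCACHE".equals(((CacheRebalancingEvent)evt).cacheName()))
        rebalanced.countDown();
    return rebalanced.getCount() > 0; // unsubscribe after the first match
};

ignite.events().localListen(lsnr, EventType.EVT_CACHE_REBALANCE_STOPPED);

// Wait a bounded amount of time, then run the query.
rebalanced.await(30, TimeUnit.SECONDS); // throws InterruptedException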

Regards,
-- 
Ilya Kasnacheev


Wed, Jul 15, 2020 at 12:03, ssansoy <s....@cmcmarkets.com>:


Re: Block until partition map exchange is complete

Posted by ssansoy <s....@cmcmarkets.com>.
By the way, just referring back to the original question - is there such a
callback that can be used to wait for the partition exchange to complete,
in any version of Ignite? We are using Ignite 2.7.6 (which I acknowledge is
slightly behind, but we are planning to upgrade).




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by ssansoy <s....@cmcmarkets.com>.
Hi, the following setup should reproduce the issue:

A server class starts up a server node with the config in my original mail
(e.g. 3 servers, partitioned with 1 backup). In that class, at the end, do
something like:

ignite.events(ignite.cluster().forServers()).localListen(ignitePredicate,
            EventType.EVT_NODE_SEGMENTED, EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_LEFT);

In that ignitePredicate, if the current node is the oldest node in the
cluster, do a ScanQuery on some cache MYCACHE and print out the records.

The first server to start up will print out all the records.
If you kill the first server, the next oldest server will perform the scan
query upon receiving the NODE_LEFT event and will not print out all the
records (because the localListen runs before the exchange has happened).

Is that enough information?

The issue has gone away now that I have updated my ignitePredicate to
merely set a flag if this node is the oldest, and have a separate
scheduledExecutor periodically check that flag and, if it is true, do the
scan query. This seems to work - probably because there is a sufficient
delay before performing the scan query. My worry is that we could get
unlucky with scheduling and the scan query could still occur after the flag
is set but before the localListen has returned. Ideally it would seem
sensible to either support @IgniteAsyncCallback in this local listen (which
hopefully takes care of the ordering) or have a callback that can be
executed after the localListen has returned (if that is indeed the cause of
the issue here).
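
For reference, the workaround currently looks roughly like this (sketch -
the 5 second period is arbitrary and runScanQuery() is a placeholder for
the actual ScanQuery on MYCACHE; the predicate is registered via the
localListen call shown above):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.ignite.events.Event;
import org.apache.ignite.lang.IgnitePredicate;

AtomicBoolean leader = new AtomicBoolean();

IgnitePredicate<Event> ignitePredicate = evt -> {
    // only flip the flag here; no cache access on the discovery thread
    leader.set(ignite.cluster().forServers().forOldest().node().isLocal());
    return true;
};

ScheduledExecutorService scheduler =
    Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    if (leader.get())
        runScanQuery(); // placeholder for the ScanQuery on MYCACHE
}, 5, 5, TimeUnit.SECONDS);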





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Can you throw together a reproducer project which shows this behavior? I
would check.

Regards,
-- 
Ilya Kasnacheev


Fri, Jul 3, 2020 at 13:14, ssansoy <s....@cmcmarkets.com>:


Re: Block until partition map exchange is complete

Posted by ssansoy <s....@cmcmarkets.com>.
Thanks - the issue I have now is: how can I confirm that the local listen
has returned before executing my code?
E.g. in the local listen I can set a flag, and then the local listen
returns - but the thread that detects this flag and runs the task could
still be scheduled to run before the local listen has returned.
Is there a callback I can register which is triggered after the local
listen returns, so I can guarantee I am executing in the correct order
(e.g. after whatever needs to be committed has been committed)?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Yes, you need to return from the event listener as soon as you can.

Regards,
-- 
Ilya Kasnacheev


Fri, Jul 3, 2020 at 12:03, ssansoy <s....@cmcmarkets.com>:


Re: Block until partition map exchange is complete

Posted by ssansoy <s....@cmcmarkets.com>.
Hi Ilya, thanks for the quick help!
Within the local listen, I am adding a task to an executor - so the cache
operations happen in a different thread. However, is the key thing here
that the local listen handler method needs to have returned?
E.g. the local listen may not have fully completed by the time the task on
the executor has been started - so perhaps there is a transaction still
open somewhere by the time the cache operation occurs?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Block until partition map exchange is complete

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Do you issue your cache operations from the event listener thread? This
might be unsafe and also not return the expected results. Event listeners
are invoked from internal threads.

Consider issuing a task to the public pool from the event listener, and
then returning. I would expect that the task will run once the rebalance
has already taken place.
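
Something along these lines (just a sketch - any executor of your own
works too; compute on the local node group runs the closure in the public
pool):

import org.apache.ignite.events.Event;
import org.apache.ignite.lang.IgnitePredicate;

IgnitePredicate<Event> lsnr = evt -> {
    // hand the cache work to the public pool on the local node and
    // return from the discovery listener right away
    ignite.compute(ignite.cluster().forLocal()).runAsync(() -> {
        // ScanQuery / leader work goes here
    });
    return true;
};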

Regards,
-- 
Ilya Kasnacheev


Fri, Jul 3, 2020 at 11:17, ssansoy <s....@cmcmarkets.com>:
