Posted to user@ignite.apache.org by Raymond Wilson <ra...@trimble.com> on 2017/07/03 22:00:23 UTC

Possible race condition when starting up Ignite nodes taking part in partition affinity map

Hi,



I have been working on a POC using the Ignite v1.9 C# client and have been
simulating clusters of Ignite nodes by running collections of Ignite
processes on the same physical machine.



As part of this I am running 4 Ignite nodes (each node is a Windows Forms
application for simplicity, so you can see when it is running) that each
create four caches: two are replicated and two are partitioned. The
partitioned caches each use an affinity function driven by a command line
argument containing the index used to map the node into the affinity map
(the argument is placed into a user attribute registered with the call that
starts the Ignite node). The affinity function maps all available nodes
into the returned map, meaning that partial affinity maps are returned
until all 4 Ignite nodes have initialised.
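For illustration, a minimal sketch of such a setup in Ignite.NET might look like the following. The class name, the "node-index" attribute name, and the partition count are assumptions for the example, not the actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Apache.Ignite.Core.Cache.Affinity;
using Apache.Ignite.Core.Cluster;

// Hypothetical affinity function: each partition is assigned to the node
// whose "node-index" user attribute matches (partition % node count).
public class IndexedAffinityFunction : IAffinityFunction
{
    public int Partitions => 128;

    public int GetPartition(object key) => Math.Abs(key.GetHashCode()) % Partitions;

    public void RemoveNode(Guid nodeId)
    {
        // No per-node state to clean up in this sketch.
    }

    public IEnumerable<IEnumerable<IClusterNode>> AssignPartitions(AffinityFunctionContext context)
    {
        // Only nodes that registered the attribute take part in the map, so
        // the map stays partial until all expected nodes have joined.
        var nodes = context.CurrentTopologySnapshot
            .Where(n => n.Attributes.ContainsKey("node-index"))
            .OrderBy(n => n.GetAttribute<int>("node-index"))
            .ToList();

        for (int p = 0; p < Partitions; p++)
            yield return nodes.Count == 0
                ? Enumerable.Empty<IClusterNode>()
                : new[] { nodes[p % nodes.Count] };
    }
}
```

The node-index attribute itself would be registered at startup via `IgniteConfiguration.UserAttributes`, e.g. `new Dictionary<string, object> { ["node-index"] = nodeIndex }` where `nodeIndex` comes from the command line.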



I have a batch file that runs all four nodes simultaneously. Typically,
only one to three of the Windows apps are observed to appear, and often one
of the processes is absent from the Windows Task Manager. If one of the
running apps is closed, the missing service becomes available. From my
logging I often see only one Ignite process reporting calls into
IAffinityFunction::AssignPartitions, which is typically called only twice,
when I expected it to be called four times as each new node comes online.



On rare occasions all four nodes will correctly initialise, but this is the
exception.



I don’t have a small reproducer case yet but I wanted to first ask if this
deadlock or race condition behaviour is known in Ignite with respect to
partitioned caches with affinity maps.



Thanks,

Raymond.

RE: Possible race condition when starting up Ignite nodes taking part in partition affinity map

Posted by Raymond Wilson <ra...@trimble.com>.
Hi Val,

I don't have any tests, sorry. I have a test app that renders thematic
tiles from spatial data held in Ignite. It works by asking certain Ignite
nodes to perform processing across a partitioned cache with affinity
mapping over a set of data stored on 4 nodes. The results are sent as
(possibly many) messages directly to the node which issued the request for
the data, which then renders each piece of information as it comes in. The
volume of data is not large; all the Ignite nodes consume around 1.6 GB of
memory on a computer with 16 GB RAM and 8 cores. There is no SQL DB in this
system; all the cache entries are persisted versions of objects with
significant internal structure, represented as .NET MemoryStream instances
in the cache to minimize serialization/deserialization overhead.
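A minimal sketch of that caching pattern, storing pre-serialized payloads as raw bytes so Ignite's own serializer only sees an opaque byte array; the cache name and helper class are hypothetical:

```csharp
using System.IO;
using Apache.Ignite.Core;

// Sketch: keep the serialized form of a complex object as byte[] in the
// cache, deserializing on demand, so Ignite never walks the object graph.
public static class SpatialCacheExample
{
    public static void Put(IIgnite ignite, long key, MemoryStream serialized)
    {
        var cache = ignite.GetOrCreateCache<long, byte[]>("spatialCache");
        cache.Put(key, serialized.ToArray());
    }

    public static MemoryStream Get(IIgnite ignite, long key)
    {
        var cache = ignite.GetOrCreateCache<long, byte[]>("spatialCache");
        return new MemoryStream(cache.Get(key));
    }
}
```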

I'm not using the Ignite map reduce style queries as I could not see how
to send partial results from a node to the caller other than by using the
Ignite message fabric (i.e. have a node send the results for its share of
the processing back in multiple parcels). Each message contains several KB
of data, and there may be thousands of such messages relayed to the
rendering node.
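Streaming parcels back over the messaging fabric might be sketched as below; the topic name and listener class are assumptions for illustration:

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Messaging;

// Sketch: the requesting node listens on a topic and renders each parcel
// of partial results as it arrives.
public class PartialResultListener : IMessageListener<byte[]>
{
    public bool Invoke(Guid nodeId, byte[] message)
    {
        // Render the parcel; return true to keep listening for more.
        Console.WriteLine($"Received {message.Length} bytes from {nodeId}");
        return true;
    }
}

// On the requesting node:
//   ignite.GetMessaging().LocalListen(new PartialResultListener(), "RenderTopic");
//
// On a processing node, send each parcel directly to the requester:
//   var requester = ignite.GetCluster().ForNodeIds(requesterId);
//   requester.GetMessaging().Send(parcelBytes, "RenderTopic");
```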

It 'feels' slower, but I don't have any hard performance numbers (and have
not had the time to invest in side by side comparisons). When 'zoomed in',
so that relatively little data is being requested and rendered, the v1.9
client is very snappy, but the v2.0 client has small but noticeable delays;
it just doesn't feel as responsive.

Thanks,
Raymond.

-----Original Message-----
From: vkulichenko [mailto:valentin.kulichenko@gmail.com]
Sent: Thursday, July 6, 2017 8:37 AM
To: user@ignite.apache.org
Subject: RE: Possible race condition when starting up Ignite nodes taking
part in partition affinity map

Hi Raymond,

What exactly is slower? Can you provide any tests?

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Possible-race-condition-when-starting-up-Ignite-nodes-taking-part-in-partition-affinity-map-tp14293p14349.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Possible race condition when starting up Ignite nodes taking part in partition affinity map

Posted by vkulichenko <va...@gmail.com>.
Hi Raymond,

Got it. Feel free to let us know in case you have more information. It would
be great to have more insight and improve performance if needed.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Possible-race-condition-when-starting-up-Ignite-nodes-taking-part-in-partition-affinity-map-tp14293p14357.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Possible race condition when starting up Ignite nodes taking part in partition affinity map

Posted by vkulichenko <va...@gmail.com>.
Hi Raymond,

What exactly is slower? Can you provide any tests?

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Possible-race-condition-when-starting-up-Ignite-nodes-taking-part-in-partition-affinity-map-tp14293p14349.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Possible race condition when starting up Ignite nodes taking part in partition affinity map

Posted by Raymond Wilson <ra...@trimble.com>.
I have upgraded to 2.0 and things seem much better now (though with some
other oddities I am chasing down). I may also have been mixing release and
debug builds (fun with batch files), which probably was not helping.



Just anecdotally, 2.0 feels slower than 1.9 in terms of the speed of
responses to my client test app with the same data set.





*From:* Raymond Wilson [mailto:raymond_wilson@trimble.com]
*Sent:* Tuesday, July 4, 2017 10:00 AM
*To:* user@ignite.apache.org
*Subject:* Possible race condition when starting up Ignite nodes taking
part in partition affinity map



Hi,



I have been working on a POC using the Ignite v1.9 C# client and have been
simulating clusters of Ignite nodes by running collections of Ignite
processes on the same physical machine.



As part of this I am running 4 Ignite nodes (each node is a Windows Forms
application for simplicity, so you can see when it is running) that each
create four caches: two are replicated and two are partitioned. The
partitioned caches each use an affinity function driven by a command line
argument containing the index used to map the node into the affinity map
(the argument is placed into a user attribute registered with the call that
starts the Ignite node). The affinity function maps all available nodes
into the returned map, meaning that partial affinity maps are returned
until all 4 Ignite nodes have initialised.



I have a batch file that runs all four nodes simultaneously. Typically,
only one to three of the Windows apps are observed to appear, and often one
of the processes is absent from the Windows Task Manager. If one of the
running apps is closed, the missing service becomes available. From my
logging I often see only one Ignite process reporting calls into
IAffinityFunction::AssignPartitions, which is typically called only twice,
when I expected it to be called four times as each new node comes online.



On rare occasions all four nodes will correctly initialise, but this is the
exception.



I don’t have a small reproducer case yet but I wanted to first ask if this
deadlock or race condition behaviour is known in Ignite with respect to
partitioned caches with affinity maps.



Thanks,

Raymond.