Posted to user@ignite.apache.org by Raymond Wilson <ra...@trimble.com> on 2018/04/15 07:55:12 UTC

Efficiently determining if cache keys belong to the local server node

I have a type of query that asks for potentially large numbers of
information elements to be computed. Each element has an affinity key that
maps it to a server node through an IAffinityFunction.



The way the question is asked means that a single query, broadcast to the
compute projection owning the cache that contains the source data for the
request, carries the identities of all the pieces of information that need
to be processed.



Each server node then scans the elements requested and identifies which
ones are its responsibility according to the affinity key.



Calculating the partition ID from the affinity key is simple (I have an
affinity function set up and supplied to the cache configuration, or I
could use IAffinity.GetPartition()), so the question became: how do I know
whether the server node executing the query is responsible for that
partition, and so should process this element? I.e., I need to derive the
vector of primary or backup partitions that this node is responsible for.
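
For reference, a minimal sketch of the key -> partition step, assuming a
hypothetical affinityKey variable (Cache is the application's own cache
wrapper used in the snippets below):

        int partition = Cache.Ignite.GetAffinity(Cache.Name).GetPartition(affinityKey);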



I can query the partition map and build a lookup from it, like this:



        ICacheAffinity affinity = Cache.Ignite.GetAffinity(Cache.Name);

        Dictionary<int, bool> primaryPartitions = affinity
            .GetPrimaryPartitions(Cache.Ignite.GetCluster().GetLocalNode())
            .ToDictionary(k => k, v => true);



This lets me do a dictionary lookup, but it's less efficient than having a
complete partition map with simple array lookup semantics, like this:



            ICacheAffinity affinity = Cache.Ignite.GetAffinity(Cache.Name);

            bool[] partitionMap = new bool[affinity.Partitions];

            // Mark every partition this node owns, as primary or as backup.
            foreach (int partition in
affinity.GetPrimaryPartitions(Cache.Ignite.GetCluster().GetLocalNode()))

                partitionMap[partition] = true;

            foreach (int partition in
affinity.GetBackupPartitions(Cache.Ignite.GetCluster().GetLocalNode()))

                partitionMap[partition] = true;



This gives the query a fast lookup for determining which elements of the
overall request are its responsibility.
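
As a minimal sketch of the intended filtering (request.Keys and the key
type are hypothetical, and System.Linq is assumed):

            // Keep only the elements whose partition this node is responsible for.
            var localKeys = request.Keys.Where(key => partitionMap[affinity.GetPartition(key)]);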



I’m not sure of the performance profile of this approach if I end up doing
it a lot, so I’m considering caching this lookup and invalidating it
whenever an event occurs that could modify the key -> partition map.



Questions:



   1. How big is the penalty when determining the full partition map like
   this?
   2. If I decide to invalidate the cached map, what are all the events I’d
   need to listen to?
      1. Rebalancing events? I found CacheRebalancingEvent, but I’m not
      sure it gives visibility into the points in time when a rebalanced
      partition becomes active on the new node and the partition map changes.
      2. Topology change events? (e.g. adding a new backup node without
      rebalancing, if that is a thing.) I looked for an event like that but
      have not found one so far, though I do know the affinity function can
      respond to this via AssignPartitions(). (A sketch of listening for
      both kinds of event follows this list.)
   3. How do I provide my own affinity key mapper to map keys to partition
   IDs, while letting Ignite map the partitions to nodes? The IAffinityFunction
   implementation requires both steps to be implemented. I’d prefer not to
   have the partition -> server mapping responsibility, as this requires
   persistent configuration on the nodes to ensure a stable mapping. (A
   possible approach to this is sketched below as well.)
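
For question 2, a minimal sketch of wiring up such listeners, assuming the
relevant event types are enabled via IgniteConfiguration.IncludedEventTypes
(the invalidation hook is hypothetical):

    using Apache.Ignite.Core.Events;

    // Drop the cached partition map on topology or rebalancing changes.
    class PartitionMapInvalidator : IEventListener<IEvent>
    {
        public bool Invoke(IEvent evt)
        {
            PartitionMapCache.Invalidate(); // hypothetical cache holder
            return true;                    // true = keep listening
        }
    }

    ignite.GetEvents().LocalListen(new PartitionMapInvalidator(),
        EventType.NodeJoined, EventType.NodeLeft, EventType.NodeFailed,
        EventType.CacheRebalanceStopped);

For question 3, one possible shape, assuming Ignite.NET lets you subclass
RendezvousAffinityFunction so that only the key -> partition step is
overridden while the inherited partition -> node assignment is kept (MyKey
and SpatialHash are hypothetical):

    using Apache.Ignite.Core.Cache.Affinity.Rendezvous;

    class KeyOnlyAffinityFunction : RendezvousAffinityFunction
    {
        // Custom key -> partition mapping; AssignPartitions is inherited.
        public override int GetPartition(object key)
        {
            return Math.Abs(((MyKey)key).SpatialHash) % Partitions;
        }
    }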



Thanks,

Raymond.

Re: Efficiently determining if cache keys belong to the local server node

Posted by Stanislav Lukyanov <st...@gmail.com>.
 > How does Ignite ensure request consistency during rebalancing?
Basically, various sorts of locking and some retry-on-failure handling.
Ignite will make sure that all operations on a partition are finished before
moving/evicting it, and will try to switch to another node if the first
node that it tried to access has failed.
The difference with compute grid APIs (affinityCall/broadcast/execute) is
that they're not locking any caches themselves. affinityCall uses a cache
key to determine where to execute, but it doesn't use that key after that.
Compare it with IgniteCache.Invoke() which does kind of the same thing as
affinityCall, but with semantics of "acquiring" a cache key it is called on
- and therefore being atomic or transactional in relation to other
operations on the same cache.

Stan
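
As a minimal sketch of that contrast in Ignite.NET, with a hypothetical
processor and cache:

    // Runs on the node owning key 42 while the key is "acquired", so it is
    // atomic/transactional relative to other operations on the same key.
    class Increment : ICacheEntryProcessor<int, long, object, long>
    {
        public long Process(IMutableCacheEntry<int, long> entry, object arg)
        {
            entry.Value = entry.Exists ? entry.Value + 1 : 1;
            return entry.Value;
        }
    }

    long updated = cache.Invoke(42, new Increment(), null);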


RE: Efficiently determining if cache keys belong to the local server node

Posted by Raymond Wilson <ra...@trimble.com>.
Agreed on the idempotency comments. Many of the requests are aggregative
summarisations, so there’ll need to be some additional tracking to detect
double and missed computation in these cases.



I understand that Ignite grids respond to requests during rebalancing
operations, where partitions may move between nodes over significant time
periods. How does Ignite ensure request consistency during rebalancing?




RE: Efficiently determining if cache keys belong to the local server node

Posted by Stanislav Lukyanov <st...@gmail.com>.
> Is the failure mode of a node changing primality for a key during an affinity co-located compute function handled by Ignite automatically for other contexts?
Are you asking whether or not affinityCall() would handle that? If so, then no, not really – once the job is sent to a node, it is out. To handle that Ignite would need to be able to stop the job, revert its changes and restart it on another node – which is not possible in general, of course.

> Is there an event or similar facility to hook into to gain a notification that this has occurred (and so re-run the computation to ensure the correct result)?
You could listen to EVT_NODE_LEFT, EVT_NODE_FAILED and EVT_NODE_JOINED to track topology changes, but it seems rather complex and fragile to me.
Instead I would try to make the computations idempotent (i.e. to make sure that processing the same key on two nodes doesn’t lead to inconsistency), and track which keys were processed to be able to restart the computation on the unprocessed ones (if any).

Stan 
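
As a minimal sketch of the tracking idea, with a hypothetical key type and
hooks:

    using System.Collections.Concurrent;
    using System.Linq;

    // Record which keys completed; idempotency makes re-running the
    // leftovers safe.
    var processed = new ConcurrentDictionary<long, bool>();

    void OnKeyProcessed(long key) => processed.TryAdd(key, true);

    IEnumerable<long> Unprocessed(IEnumerable<long> requested) =>
        requested.Where(k => !processed.ContainsKey(k));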



RE: Efficiently determining if cache keys belong to the local server node

Posted by Raymond Wilson <ra...@trimble.com>.
Hi Stan



Thanks for the additional pointers.



Is the failure mode of a node changing primality for a key during an
affinity co-located compute function handled by Ignite automatically for
other contexts? Is there an event or similar facility to hook into to gain
a notification that this has occurred (and so re-run the computation to
ensure the correct result)?



Thanks,

Raymond.

RE: Efficiently determining if cache keys belong to the local server node

Posted by Stanislav Lukyanov <st...@gmail.com>.
Hi Raymond,

OK, I see, batching the requests makes sense.
Have you looked at the ICacheAffinity interface? It provides a way to query Ignite about the key-to-node mappings,
without dealing with partitions yourself.
The call
    ignite.GetAffinity(“cache”).MapKeysToNodes(keys)
is suitable to split the request into batches on the client side.
The call
    ignite.GetAffinity(“cache”).IsPrimary(key, ignite.GetCluster().GetLocalNode())
is suitable to determine if the current node is primary for the key.

This way you don’t need to cache affinity mappings – you just always use the current mappings of the node.
However, you still need to make sure you can handle affinity mappings changing while your jobs are running.
One can imagine situations when two nodes process the same key (because both were primary at different times),
or no nodes processed a key (e.g. because a new node has joined, became primary for the key but didn’t receive the broadcast).

Thanks,
Stan
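
As a minimal sketch of the client-side batching, with the cache name, key
type and compute action hypothetical:

    ICacheAffinity affinity = ignite.GetAffinity("cache");

    // One batch of keys per primary node.
    IDictionary<IClusterNode, IList<long>> batches = affinity.MapKeysToNodes(keys);

    foreach (var batch in batches)
    {
        // Send each batch only to the node that owns those keys.
        ignite.GetCluster().ForNodes(batch.Key).GetCompute()
            .Run(new ProcessBatchAction(batch.Value)); // hypothetical IComputeAction
    }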



RE: Efficiently determining if cache keys belong to the local server node

Posted by Raymond Wilson <ra...@trimble.com>.
Hi Stan,

Your understanding is correct.

I'm aware of the AffinityRun and AffinityCall methods, and their single-key
limitation.

My use case may require 100,000 or more elements of information to be
processed, so I don't want to call AffinityRun/Call that often. Each of
these elements is identified by a key that is very efficiently encoded into
the request (at the ~1 bit per key level).

Further, each of those elements identifies work units that in themselves
could have 100,000 or more different elements to be processed.

One approach would be to explicitly break up the request into smaller ones,
each targeted at a server node. But that requires the requestor to have
intimate knowledge of the composition of the grid resources deployed, which
is not desirable.

The approach I'm looking into here is to have each server node receive the
same request via Cluster.Broadcast(), and for those nodes to determine which
elements in the overall request are theirs via the Key -> Partition affinity
mapping.
The mapping itself is very efficient, and as I noted in my original post
determining the partition -> node map seems simple enough to do.
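
As a minimal sketch of that shape, with the cache name and compute action
hypothetical:

    // Every server node holding the cache's data receives the same request;
    // each node then filters the key set down to its own partitions locally.
    IClusterGroup servers = ignite.GetCluster().ForDataNodes("myCache");
    servers.GetCompute().Broadcast(new ProcessRequestAction(request));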

I'm unsure of the performance of requesting that mapping for every request,
versus caching it and adding watchers for rebalancing and topology change
events to invalidate the cached mapping as needed (and how to wire those
up).

Thanks,
Raymond.


RE: Efficiently determining if cache keys belong to the local server node

Posted by Stanislav Lukyanov <st...@gmail.com>.
// Bcc’ing off dev@ignite list for now as it seems to be rather a user-space discussion.

Hi,

Let me take a step back first. It seems a bit like an XY problem (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem),
so I’d like to clarify the goals before diving into your current solution.

AFAIU you want to process certain entries in your cache locally on the server that caches these entries. Is that correct?
Have you looked at affinityRun and affinityCall (https://apacheignite.readme.io/docs/collocate-compute-and-data)? If yes, why don't they work for you?
One limitation with these methods is that they accept a single key to process. Can you process your keys one by one, or do you need to access multiple keys at once?

Thanks,
Stan 
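
As a minimal sketch of the single-key pattern in Ignite.NET, with the cache
name and action hypothetical:

    // Ships the action to whichever node is primary for affinityKey.
    ignite.GetCompute().AffinityRun("myCache", affinityKey,
        new ProcessOneAction(affinityKey));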


