Posted to user@ignite.apache.org by "anoorag.saxena" <an...@gmail.com> on 2017/03/15 22:45:28 UTC

Understanding Cache Key, Indexes, Partition and Affinity

Hello, 

I am new to Apache Ignite and come from a Data Warehousing background, so
pardon me if I relate to Ignite through DBMS jargon. I have already gone
through some of the posts on the forum, but I am still unclear about some of
the basics.

1.) CacheMode=PARTITIONED
     - When I just declare a cache as partitioned, I understand that data is
       distributed equally across all nodes. Is there an option to provide a
       "partition key" based on which the data would be distributed across
       the nodes (in which case, the skewness of the distribution would
       depend on the choice of Partition Key)?

     - Without an Affinity Key, when I load data (using loadCache()) into a
       partitioned cache, will all source rows be sent to all nodes in my
       cluster?

2.) Affinity
     I understand that the concept of affinity is to use a key that
     distinctly identifies the node on which the data may reside.

     - When I partition the cache and define an affinity key, is the data
       partitioned based on the Affinity Key itself? If not, how does
       affinity differ from partitioning?

     - With an Affinity Key defined, when I load data (using loadCache())
       into a partitioned cache, will the source rows be sent to the node
       they belong to, or to all the nodes in my cluster?

3.) When I create an index for a cache, does it distribute the data
     automatically (without defining any Affinity Key or Partition Key)?

SCENARIO DESCRIPTION
-----------------------------
I want to load data from a persistent layer into a Staging Cache (assume
~2B records) using loadCache(). The cache resides on a 4-node cluster.
a.) Can I load data in such a way that each node has to process only 0.5B
records? Is that done by using the Partitioned cache mode and defining an
Affinity Key?

Then I want to read transactions from the Staging Cache in TRANSACTIONAL
atomicity mode, look up a Target Cache and do some operations.
b.) When I do the lookup on the Target Cache, how can I ensure that the
lookup happens only on the node where the data resides, and not on all the
nodes on which the Target Cache resides? Would that be using the Affinity
Key? If yes, how?

c.) Let's say I wanted to do a lookup on a column other than the Affinity
Key column. Can creating an index on the lookup column help? Would I end up
scanning all nodes in that case?

Staging Cache 
CustomerID 
CustomerEmail 
CustomerPhone       

Target Cache 
Seq_Num 
CustomerID 
CustomerEmail 
CustomerPhone 
StartDate 
EndDate 




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Understanding-Cache-Key-Indexes-Partition-and-Affinity-tp11212.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Understanding Cache Key, Indexes, Partition and Affinity

Posted by "anoorag.saxena" <an...@gmail.com>.
Thank you so much for explaining the concepts, Andrew.
Much appreciated.




Re: Understanding Cache Key, Indexes, Partition and Affinity

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

1. Ignite is a key-value cache, and each cache entry has a key and a value.
The entry key is simply the key of the cache entry.

3. An index resides on the same node as the data. In fact, Ignite keeps
local indexes on each node; there is no distributed index, so creating an
index does not change how the data is partitioned.
It also means that if you run a SELECT query on a unique key, the query will
end up being executed on all cache nodes.
This should be fixed in a future release [1].
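To make the fan-out concrete, here is a minimal Java sketch (plain JDK, no Ignite) in which each node holds a local index over only the entries it stores, with entries placed by CustomerID. The Node class, the hash-modulo placement and the method names are hypothetical simplifications, not Ignite APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: every node indexes only its own entries, so there is
// no global index that says which node holds a given CustomerEmail.
final class LocalIndexSketch {
    static final class Node {
        final Map<String, String> emailToCustomer = new HashMap<>(); // local index
    }

    static List<Node> buildGrid(int nodeCount, Map<String, String> customerToEmail) {
        List<Node> grid = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) grid.add(new Node());
        customerToEmail.forEach((customerId, email) -> {
            // Placement is decided by CustomerID (the affinity key), not by email.
            Node owner = grid.get(Math.floorMod(customerId.hashCode(), nodeCount));
            owner.emailToCustomer.put(email, customerId);
        });
        return grid;
    }

    // A lookup by email must ask every node and merge the local results.
    // Returns how many nodes were contacted.
    static int nodesQueriedForEmail(List<Node> grid, String email, List<String> results) {
        int contacted = 0;
        for (Node n : grid) {
            contacted++;
            String hit = n.emailToCustomer.get(email);
            if (hit != null) results.add(hit);
        }
        return contacted;
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("C-1", "a@x.com");
        data.put("C-2", "b@x.com");
        data.put("C-3", "c@x.com");
        List<Node> grid = buildGrid(4, data);
        List<String> hits = new ArrayList<>();
        int contacted = nodesQueriedForEmail(grid, "b@x.com", hits);
        System.out.println("contacted " + contacted + " nodes, found " + hits); // prints "contacted 4 nodes, found [C-2]"
    }
}
```

Only one node holds the matching entry, but every node has to be asked, because no node knows where a given email lives.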

c. Yes. But you can hint Ignite to execute it in a more optimal way.
Either disable distributedJoins for the query, which makes Ignite run the
query locally on every node and then merge the results,
or run a local query on the affinity node for each key (CustomerID).
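The second option can be sketched the same way. Assuming the Staging and Target entries share CustomerID as the affinity key, the email check for one staging record only needs the single node that owns that CustomerID. In a real grid this would typically combine IgniteCompute.affinityRun with a query marked local (Query.setLocal(true)); the sketch below only simulates that routing with plain maps, and all of its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of affinity-routed local lookups.
final class AffinityLookupSketch {
    static int ownerNode(String customerId, int nodeCount) {
        return Math.floorMod(customerId.hashCode(), nodeCount);
    }

    // Splits Target data (CustomerID -> emails) into per-node local stores,
    // placing each customer on the node that owns its CustomerID.
    static List<Map<String, Set<String>>> distribute(Map<String, Set<String>> target, int nodeCount) {
        List<Map<String, Set<String>>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
        for (Map.Entry<String, Set<String>> e : target.entrySet()) {
            nodes.get(ownerNode(e.getKey(), nodeCount)).put(e.getKey(), new HashSet<>(e.getValue()));
        }
        return nodes;
    }

    // Checks one staging record against the owner node's local data only.
    static boolean emailExists(List<Map<String, Set<String>>> nodes, String customerId, String email) {
        Map<String, Set<String>> local = nodes.get(ownerNode(customerId, nodes.size()));
        return local.getOrDefault(customerId, Set.of()).contains(email);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> target = new HashMap<>();
        target.put("C-1", Set.of("a@x.com"));
        target.put("C-2", Set.of("b@x.com", "b2@x.com"));
        List<Map<String, Set<String>>> nodes = distribute(target, 4);
        System.out.println(emailExists(nodes, "C-2", "b2@x.com")); // prints "true"
        System.out.println(emailExists(nodes, "C-1", "b2@x.com")); // prints "false"
    }
}
```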

BTW, transactions are not supported by the SQL Grid because indexes are not
transactional.


[1] https://issues.apache.org/jira/browse/IGNITE-4509


On Thu, Mar 16, 2017 at 11:45 PM, anoorag.saxena <an...@gmail.com>
wrote:




-- 
Best regards,
Andrey V. Mashenkov

Re: Understanding Cache Key, Indexes, Partition and Affinity

Posted by "anoorag.saxena" <an...@gmail.com>.
Thank you for the quick turnaround, Andrew!

I have some follow-up questions:

1.) //AffinityKey should be a part of entry key and if AffinityKey mapping
is configured then AffinityKey will be used instead of entry key for
entry->partition mapping //

What is meant by Entry Key in this context?


3.) //Indexes always resides on same node with the data.//
    That is my understanding too. Someone explained to me that if I create
an Index on a column in the Cache, the column (Index Key) is also used as
the partition or affinity key. Is that statement true?




c.) //c.) Let's say I wanted to do a lookup on a key other than Affinity Key
column, can creating an index on the lookup column help? Would I end up
scanning all nodes in that case?

Staging Cache
CustomerID
CustomerEmail
CustomerPhone

Target Cache
Seq_Num
CustomerID
CustomerEmail
CustomerPhone
StartDate
EndDate
//


In the example above, where I am using a Staging Cache with 3 columns and a
Target Cache with 6 columns, let's say CustomerID is my Affinity Key and I
have an index on the CustomerID column in both caches.

Now, if I process records from the Staging Cache one by one
(atomicityMode='TRANSACTIONAL') and check whether the CustomerEmail exists in
the Target Cache, will my application scan the Target Cache on all the
nodes?






Re: Understanding Cache Key, Indexes, Partition and Affinity

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

1. Ignite uses an AffinityFunction [1] for data distribution. The affinity
function implements two mappings: key->partition and partition->node.
The key->partition mapping determines which partition an entry belongs to.
It is not concerned with backups, only with data collocation/distribution
over partitions. Usually the entry key (actually, its hash code) is used to
calculate the partition the entry belongs to.
But you can use an AffinityKey [2], which will be used instead, to manage
data collocation. See also the org.apache.ignite.cache.affinity.AffinityKey
javadoc.

The partition->node mapping determines the primary and backup nodes for each
partition. It is not concerned with data collocation, only with backups and
the distribution of partitions among nodes.
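A simplified model of the two mappings may help. The Java sketch below is not Ignite's RendezvousAffinityFunction: it uses the key's hash code modulo a hypothetical partition count for key->partition, and ranks nodes by a hash of (node, partition) for partition->node, taking the top 1 + backups.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of an affinity function's two stages (not Ignite's actual code).
final class AffinitySketch {
    // Stage 1: key -> partition. Uses only the key's hash code and the
    // partition count; it knows nothing about nodes or backups.
    static int partition(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Stage 2: partition -> nodes. Ranks nodes by a hash of (node, partition)
    // and keeps the top 1 + backups: the first is the primary, the rest are
    // backups. It knows nothing about keys.
    static List<String> nodesFor(int partition, List<String> nodes, int backups) {
        List<String> ranked = new ArrayList<>(nodes);
        ranked.sort(Comparator.comparingInt((String n) -> mix(n.hashCode() * 31 + partition)).reversed());
        return ranked.subList(0, Math.min(1 + backups, ranked.size()));
    }

    // Integer bit mixer so neighbouring partitions rank nodes differently.
    private static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        int partitions = 1024; // hypothetical partition count for this sketch
        String customerId = "C-42";
        int p = partition(customerId, partitions);
        System.out.println(customerId + " -> partition " + p + " -> nodes " + nodesFor(p, nodes, 1));
    }
}
```

Keeping the stages separate is what lets the same partition layout survive topology changes: only stage 2 depends on which nodes are in the grid.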

IgniteCache.loadCache just makes all nodes call the localLoadCache method,
which calls CacheStore.loadCache. So each grid node will load all the data
from the cache store and then discard the data that is not local to that
node.
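The loadCache behaviour described above can be modelled as follows: every node scans the whole source and keeps only its own rows. The ownership rule used here (partition modulo node count) is a toy assumption made only for this sketch, not how Ignite assigns partitions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Toy model: each node runs the same load, reads every source row and keeps
// only the rows that map to itself.
final class LoadCacheSketch {
    static int ownerOf(long key, int partitions, int nodeCount) {
        int partition = (int) Math.floorMod(key, (long) partitions);
        return partition % nodeCount; // toy partition -> node rule
    }

    // Returns, per node, {rowsRead, rowsKept}.
    static Map<Integer, long[]> simulate(List<Long> sourceKeys, int partitions, int nodeCount) {
        Map<Integer, long[]> stats = new TreeMap<>();
        for (int node = 0; node < nodeCount; node++) {
            long read = 0, kept = 0;
            for (long key : sourceKeys) {
                read++; // every node scans the whole source...
                if (ownerOf(key, partitions, nodeCount) == node) {
                    kept++; // ...but keeps only its own entries
                }
            }
            stats.put(node, new long[] {read, kept});
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Long> keys = LongStream.range(0, 10_000).boxed().collect(Collectors.toList());
        simulate(keys, 1024, 4).forEach((node, s) ->
                System.out.println("node" + node + ": read " + s[0] + ", kept " + s[1]));
    }
}
```

With 4 nodes the source is read 4 times in total, even though each row is stored only once (ignoring backups).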

2. The same data may reside on several nodes if you use backups. The
AffinityKey should be part of the entry key, and if an AffinityKey mapping is
configured, the AffinityKey will be used instead of the entry key for the
entry->partition mapping, i.e. the AffinityKey is what gets passed to the
AffinityFunction.
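Here is a sketch of what "the AffinityKey is part of the entry key" means, using the Target Cache from this thread: the entry key is (Seq_Num, CustomerID), and only CustomerID decides the partition. In Ignite this is usually declared with the @AffinityKeyMapped annotation on the field, or by using AffinityKey<K>; the TargetKey class and its hand-written affinityKey() method are hypothetical, so the example runs without Ignite.

```java
import java.util.Objects;

// Hypothetical composite key for the Target Cache.
final class TargetKey {
    final long seqNum;       // part of the entry key, makes it unique
    final String customerId; // affinity key: the only field that picks the partition

    TargetKey(long seqNum, String customerId) {
        this.seqNum = seqNum;
        this.customerId = customerId;
    }

    // The affinity key, not the whole entry key, is mapped to a partition.
    Object affinityKey() {
        return customerId;
    }

    static int partition(TargetKey key, int partitions) {
        return Math.floorMod(key.affinityKey().hashCode(), partitions);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof TargetKey)) return false;
        TargetKey k = (TargetKey) o;
        return seqNum == k.seqNum && customerId.equals(k.customerId);
    }

    @Override public int hashCode() {
        return Objects.hash(seqNum, customerId);
    }

    public static void main(String[] args) {
        TargetKey a = new TargetKey(1, "C-42");
        TargetKey b = new TargetKey(2, "C-42");
        // Different entries, same affinity key -> same partition (collocated).
        System.out.println(a.equals(b) + " " + (partition(a, 1024) == partition(b, 1024))); // prints "false true"
    }
}
```

If the Staging Cache is keyed by CustomerID and uses the same affinity function and partition count, a staging entry lands in the same partition as the Target entries for that customer.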

3. Indexes always reside on the same node as the data.

a. To achieve this, you should implement the CacheStore.loadCache method so
that it loads data only for certain partitions. E.g., you can store a
partition ID for each row in the database.
However, if you change the affinity function or the number of partitions,
you must update the partition IDs of the entries in the database as well.
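A minimal sketch of partition-aware loading, under stated assumptions: each row stores a precomputed partition ID, and partitions are assigned to nodes round-robin (a toy rule, not Ignite's). The in-memory filter stands in for a WHERE partition_id IN (...) clause that a real CacheStore.loadCache implementation would send to the database; Row, ownedPartitions and loadFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of loading only the partitions a node owns.
final class PartitionAwareLoad {
    static final class Row {
        final String customerId;
        final int partitionId; // stored with the row when it was written
        Row(String customerId, int partitionId) {
            this.customerId = customerId;
            this.partitionId = partitionId;
        }
    }

    // Partitions owned by a node under a toy round-robin assignment.
    static Set<Integer> ownedPartitions(int node, int nodeCount, int partitions) {
        Set<Integer> owned = new HashSet<>();
        for (int p = node; p < partitions; p += nodeCount) owned.add(p);
        return owned;
    }

    // What one node's loader fetches; the filter plays the role of the SQL WHERE clause.
    static List<Row> loadFor(int node, int nodeCount, int partitions, List<Row> table) {
        Set<Integer> owned = ownedPartitions(node, nodeCount, partitions);
        List<Row> loaded = new ArrayList<>();
        for (Row r : table) {
            if (owned.contains(r.partitionId)) loaded.add(r);
        }
        return loaded;
    }

    public static void main(String[] args) {
        int partitions = 1024, nodeCount = 4;
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String id = "C-" + i;
            table.add(new Row(id, Math.floorMod(id.hashCode(), partitions)));
        }
        for (int node = 0; node < nodeCount; node++) {
            System.out.println("node" + node + " loads " + loadFor(node, nodeCount, partitions, table).size() + " rows");
        }
    }
}
```

Each of the 4 nodes then fetches roughly a quarter of the rows, and every row is fetched by exactly one node. The stored IDs are only valid for one affinity function and partition count, which is the caveat above.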

Another way: if possible, you can load all the data into a single node and
then add the other nodes to the grid. The data will be rebalanced across the
nodes automatically.
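To illustrate why "load into one node, then add nodes" works, the toy rendezvous-style assignment below counts how many partitions change owner as nodes join. It follows the general idea behind Ignite's RendezvousAffinityFunction but is not its code, and it ignores backups. With this scheme a partition only ever moves to the newly joined node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy rendezvous (highest-random-weight) assignment, primaries only.
final class RebalanceSketch {
    static long weight(String node, int partition) {
        long h = node.hashCode() * 0x9E3779B97F4A7C15L + partition;
        h ^= (h >>> 33);
        h *= 0xff51afd7ed558ccdL;
        h ^= (h >>> 33);
        return h;
    }

    // Owner = node with the highest weight; ties broken by node name.
    static String owner(int partition, List<String> nodes) {
        String best = null;
        for (String n : nodes) {
            if (best == null) { best = n; continue; }
            long wn = weight(n, partition), wb = weight(best, partition);
            if (wn > wb || (wn == wb && n.compareTo(best) > 0)) best = n;
        }
        return best;
    }

    // Number of partitions whose owner differs between the two topologies.
    static int moved(List<String> before, List<String> after, int partitions) {
        int count = 0;
        for (int p = 0; p < partitions; p++) {
            if (!owner(p, before).equals(owner(p, after))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int partitions = 1024;
        List<String> grid = new ArrayList<>(List.of("node1"));
        for (String joining : List.of("node2", "node3", "node4")) {
            List<String> next = new ArrayList<>(grid);
            next.add(joining);
            System.out.println(joining + " joins: " + moved(grid, next, partitions) + " partitions move");
            grid = next;
        }
    }
}
```

Each join moves roughly 1/N of the partitions to the new node, so after the fourth node joins each node owns about a quarter of the data.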

b. The AffinityKey is always used, provided it is part of the entry key, as
it should be. So the lookup will always happen on the node where the data
resides.

c. I don't understand the question. Could you please clarify it, if it is
still relevant?



[1] https://apacheignite.readme.io/docs/affinity-collocation#affinity-function
[2] https://apacheignite.readme.io/docs/affinity-collocation#affinity-key-mapper


On Thu, Mar 16, 2017 at 1:45 AM, anoorag.saxena <an...@gmail.com>
wrote:




-- 
Best regards,
Andrey V. Mashenkov