Posted to user@ignite.apache.org by Courtney Robinson <co...@hypi.io> on 2021/08/05 06:40:23 UTC

Best practices on how to approach data centre aware affinity

Hi all,
Our growth with Ignite continues and as we enter the next phase, we need to
support multi-cluster deployments for our platform.
We deploy Ignite and the rest of our stack in Kubernetes and we're in the
early stages of designing what a multi-region deployment should look like.
We are 90% SQL-based when using Ignite; the other 10% includes Ignite
messaging, queues and compute.

In our case we have thousands of tables:

CREATE TABLE IF NOT EXISTS Person (
  id int,
  city_id int,
  name varchar,
  company_id varchar,
  PRIMARY KEY (id, city_id)
) WITH "template=...";

In our case, most tables use a template that looks like this:

partitioned,backups=2,data_region=hypi,cache_group=hypi,write_synchronization_mode=primary_sync,affinity_key=instance_id,atomicity=ATOMIC,cache_name=Person,key_type=PersonKey,value_type=PersonValue

I'm aware of affinity co-location (
https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation)
and in the past, when we used the key-value APIs more than SQL, we also
used a custom affinity function to control placement.

What I don't know is how best to do this with SQL-defined caches.
We will have at least 3 Kubernetes clusters, each in a different data
centre; let's say EU_WEST, EU_EAST and CAN0.

Previously we provided environment variables that our custom affinity
function would use, and we're thinking of providing the data centre name
the same way.

We have 2 backups in all cases plus the primary, and we want the primary in
one DC and each backup in a different DC.

We could find no syntax in the SQL template that enables specifying a
custom affinity function.
The instance_id column we currently use has no common prefix or anything
else that associates it with a DC.

We're thinking of getting the cache for each table and then setting the
affinity function to replace the default RendezvousAffinityFunction, the way
we did before we switched to SQL.
Something like this:

repo.ctx.ignite.cache("Person")
    .getConfiguration(org.apache.ignite.configuration.CacheConfiguration.class)
    .setAffinity(new org.apache.ignite.cache.affinity.AffinityFunction() {
        ...
    });


There are a few things unclear about this:

   1. Is this the right approach?
   2. How do we handle existing data? Changing the affinity function will
   cause Ignite to no longer find existing data, right?
   3. How would you recommend implementing the affinity function to be
   aware of the data centre?
   4. Are there any other caveats we need to be thinking about?

There is a lot of existing data, and we want to avoid a full copy/move
to new tables if possible; that would prove very difficult in
production.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)
https://hypi.io

Re: Best practices on how to approach data centre aware affinity

Posted by Stephen Darlington <st...@gridgain.com>.
You’re right, Ignite does not provide any built-in facilities for this. There are some commercial solutions for replication, or you could write code that listens to changes and pushes them on a queue/message bus.

Whether this is worth doing depends on how reliable and fast your network is, and your RPO/RTO requirements.
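
For example, the listener could be an Ignite continuous query whose local listener forwards every change to your bus. A minimal sketch, where publish() is a placeholder for whatever producer you use:

import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.ContinuousQuery;

public class ChangeForwarder {
    public static void start(Ignite ignite, String cacheName) {
        IgniteCache<Object, Object> cache = ignite.cache(cacheName);

        ContinuousQuery<Object, Object> qry = new ContinuousQuery<>();

        // Invoked on this node for every create/update/remove on the cache.
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<?, ?> e : events)
                publish(cacheName, e.getKey(), e.getValue()); // value may be null on remove
        });

        // Keep the returned cursor open for as long as changes should be forwarded.
        cache.query(qry);
    }

    // Placeholder: swap in your Kafka/JMS/etc. producer here.
    private static void publish(String cache, Object key, Object val) {
    }
}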

Re: Best practices on how to approach data centre aware affinity

Posted by Courtney Robinson <co...@hypi.io>.
Hi Stephen,
We've been considering the points you've raised. The challenge with
having isolated clusters is how to deal with synchronisation issues. Within a
single cluster, Ignite will handle an offline node re-joining the cluster. With
multiple clusters we'd need to detect and replay changes from the
application side, effectively duplicating part of what Ignite already does.

Did I miss anything? If not, how would you suggest handling this in the
case of multiple clusters - one in each data centre?

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)
https://hypi.io


Re: Best practices on how to approach data centre aware affinity

Posted by Stephen Darlington <st...@gridgain.com>.
A word of caution: you’re generally better off replicating your data across clusters than stretching a single cluster across data centres. If the latency is very low it should work, but it could degrade your throughput, and you need to be careful about split-brain and other networking issues.

Regards,
Stephen


Re: Best practices on how to approach data centre aware affinity

Posted by Courtney Robinson <co...@hypi.io>.
Hi Alex,
Thanks for the reply. I'm glad I asked before the team went any further.
So we can achieve this with the built-in affinity function and the backup
filter. The real complexity is going to be in migrating our existing caches.

So to clarify, the steps involved are:

   1. Because Ignite registers all environment variables as node attributes,
   we can set e.g. NODE_DC=<EU_WEST|EU_EAST|CAN0> as an environment variable
   in each k8s cluster
   2. Then set the backup filter's constructor-arg.value to be NODE_DC (see
   the XML sketch after this list). This will tell Ignite that two backups
   cannot be placed on any two nodes with the same NODE_DC value - correct?
   3. When we call CREATE TABLE, we must set template=myTemplateName
   4. Before creating any tables, myTemplateName must be created and must
   include the backup filter with NODE_DC

Have I got that right?
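
For step 2, the shape we're picturing is the Spring XML from the
ClusterNodeAttributeAffinityBackupFilter javadoc Alex linked, roughly as
follows (a sketch; only the NODE_DC attribute name is ours):

<property name="affinity">
    <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
        <property name="affinityBackupFilter">
            <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                <!-- Node attribute to compare; backups only go to nodes whose NODE_DC differs. -->
                <constructor-arg value="NODE_DC"/>
            </bean>
        </property>
    </bean>
</property>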

If so, it seems simple enough. Now the real challenge is where you said
the cache has to be re-created.

I can't see how we do this without major downtime. We have functionality
in place that allows customers to effectively do a "copy from table A to B
and then delete A", but it will be impossible to get all of them to do this
any time soon.
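
In SQL terms, the per-table copy/delete amounts to something like this
(a sketch; Person_v2 and myTemplateName are placeholders, and as far as we
know Ignite has no RENAME TABLE, which is part of what pushes us towards
the virtual catalog idea below):

CREATE TABLE IF NOT EXISTS Person_v2 (
  id int,
  city_id int,
  name varchar,
  company_id varchar,
  PRIMARY KEY (id, city_id)
) WITH "template=myTemplateName";

INSERT INTO Person_v2 (id, city_id, name, company_id)
SELECT id, city_id, name, company_id FROM Person;

DROP TABLE Person;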

Has anyone else had to do something similar? How is the community generally
doing migrations like this?

Side note: the only thing that comes to mind is that we will need to build
a virtual catalog that we maintain, so that there isn't a one-to-one mapping
between customer tables and the actual Ignite table name.
So if a table is currently called A and we add a virtual catalog, then we
keep a mapping that says when the user asks for "A" it should really
go to table "A_v2" or something. This comes with its own challenges and a
massive testing overhead.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)
https://hypi.io



Re: Best practices on how to approach data centre aware affinity

Posted by Alex Plehanov <pl...@gmail.com>.
Hello,

You can create your own cache templates with the affinity function you
require (currently you use a predefined "partitioned" template, which only
sets cache mode to "PARTITIONED"). See [1] for more information about cache
templates.

> Is this the right approach
> How do we handle existing data, changing the affinity function will cause
> Ignite to not be able to find existing data right?
You can't change the cache configuration after cache creation. In your example
these changes will simply be ignored. The only way to change the cache
configuration is to create a new cache and migrate the data.

> How would you recommend implementing the affinity function to be aware of
> the data centre?
It's better to use the standard affinity function with a backup filter for
such cases. There is one shipped with Ignite (see [2]).

[1]:
https://ignite.apache.org/docs/latest/configuring-caches/configuration-overview#cache-templates
[2]:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/ClusterNodeAttributeAffinityBackupFilter.html
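
Putting [1] and [2] together, registering such a template might look roughly
like this (a sketch; "myTemplateName" and the NODE_DC attribute name are
placeholders, and it must run on the cluster before any CREATE TABLE that
references the template):

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class DcAwareTemplate {
    public static void register(Ignite ignite) {
        // Standard rendezvous affinity, constrained so that backup copies land
        // on nodes whose NODE_DC attribute differs from the nodes already chosen.
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
        aff.setAffinityBackupFilter(new ClusterNodeAttributeAffinityBackupFilter("NODE_DC"));

        CacheConfiguration<Object, Object> tpl = new CacheConfiguration<>("myTemplateName");
        tpl.setCacheMode(CacheMode.PARTITIONED);
        tpl.setBackups(2);
        tpl.setAffinity(aff);

        // Registers the configuration as a template; CREATE TABLE ...
        // WITH "template=myTemplateName" then picks it up.
        ignite.addCacheConfiguration(tpl);
    }
}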
