Posted to commits@ignite.apache.org by ag...@apache.org on 2021/03/29 16:59:16 UTC

[ignite-3] branch ignite-14393 updated: IGNITE-14393 Get rid of multi-key update for table creation

This is an automated email from the ASF dual-hosted git repository.

agoncharuk pushed a commit to branch ignite-14393
in repository https://gitbox.apache.org/repos/asf/ignite-3.git


The following commit(s) were added to refs/heads/ignite-14393 by this push:
     new 558a98b  IGNITE-14393 Get rid of multi-key update for table creation
558a98b is described below

commit 558a98b08fc9fe26a28873cb5d00d490c4e126e5
Author: Alexey Goncharuk <al...@gmail.com>
AuthorDate: Mon Mar 29 19:59:09 2021 +0300

    IGNITE-14393 Get rid of multi-key update for table creation
---
 modules/runner/README.md | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/modules/runner/README.md b/modules/runner/README.md
index 7226377..6776168 100644
--- a/modules/runner/README.md
+++ b/modules/runner/README.md
@@ -90,34 +90,32 @@ of the watch, and not processed events are replayed.
 
 ### Example: `CREATE TABLE` flow
 We require that each Ignite table is assigned a globally unique ID (the ID must not repeat even after the table is 
-dropped, so we use a growing `long` counter to assign table IDs).
+dropped, so we use the metastorage key revision to assign table IDs).
 
-When a table is created, Ignite first checks that a table with the given name does not exist, chooses the next available 
-table ID ``idNext`` and attempts to create the following pair of key-value pairs in the metastorage via the conditional 
-multi-update:
+When a table is created, Ignite first checks that a table with the given name does not exist, then attempts to create 
+the following key-value pair in the metastorage via the conditional update ensuring atomic `putIfAbsent` semantics:
 
 ```
-internal.tables.names.<name>=<idNext>
-internal.tables.<idNext>=name
+internal.tables.names.<name>=<name>
 ```  
 
-If the multi-update succeeds, Ignite considers the table created. If the multi-update fails, then either the table with
-the same name was concurrently created (the operation fails in this case) or the ``idNext`` was assigned to another 
-table with a different name (Ignite retries the operation in this case).
+If the update succeeds, Ignite considers the table created and uses the returned key-value pair revision as the table 
+ID. If the update fails, then the table with the same name was concurrently created (the operation fails in this case).
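
A minimal Java sketch of this conditional create, assuming a simplified metastorage facade (the ``MetaStorage`` interface, method names and exception below are illustrative, not the actual Ignite 3 API):

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch: the MetaStorage interface below is a stand-in for the real
 * metastorage facade; names and signatures are assumptions.
 */
class TableCreateFlow {
    /** Minimal metastorage view assumed by this sketch. */
    interface MetaStorage {
        /** Conditional single-key update: writes the value only if the key does not exist. */
        boolean putIfAbsent(String key, byte[] value);

        Entry get(String key);
    }

    /** A read result carrying the revision assigned to the key. */
    interface Entry {
        long revision();
    }

    private final MetaStorage metastorage;

    TableCreateFlow(MetaStorage metastorage) {
        this.metastorage = metastorage;
    }

    /** Atomically registers the table name and returns the key revision as the table ID. */
    long createTable(String tableName) {
        String nameKey = "internal.tables.names." + tableName;

        // putIfAbsent semantics: the update succeeds only if no table with this name
        // was created concurrently.
        if (!metastorage.putIfAbsent(nameKey, tableName.getBytes(StandardCharsets.UTF_8)))
            throw new IllegalStateException("Table already exists: " + tableName);

        // The revision the metastorage assigned to the written key is the globally unique table ID.
        return metastorage.get(nameKey).revision();
    }
}
```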
 
 In order to process affinity calculations and assignments, the affinity manager creates a reliable watch for the 
 following keys on metastorage group members:
 
 ```
-internal.tables.<ID>
+internal.tables.names.*
 internal.baseline
 ``` 
 
 Whenever a watch is fired, the affinity manager checks which key was updated. If the watch is triggered for 
-``internal.tables.<ID>`` key, it calculates a new affinity for the table with the given ``ID``. If the watch is 
-triggered for ``internal.baseline`` key, the manager recalculates affinity for all tables exsiting at the watch revision
-(this can be done using the metastorage ``range(keys, upperBound)`` method providing the watch event revision as the 
-upper bound). The calculated affinity is written to the ``internal.tables.<ID>.affinity`` key.
+``internal.tables.names.<name>`` key, it calculates a new affinity for the table with the given name (using key revision 
+as the table ID). If the watch is triggered for ``internal.baseline`` key, the manager recalculates affinity for all 
+tables existing at the watch revision (this can be done using the metastorage ``range(keys, upperBound)`` method 
+providing the watch event revision as the upper bound). The calculated affinity is written to the 
+``internal.tables.affinity.<ID>`` key.
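
A rough Java sketch of this watch-driven recalculation, again with hypothetical stand-ins for the watch registration, ``range`` and affinity-calculation calls:

```java
import java.util.List;
import java.util.function.Consumer;

/** Illustrative sketch: watch-driven affinity recalculation; all names are assumptions. */
class AffinityWatcher {
    /** Minimal view of a watch event: the updated key and its revision. */
    interface WatchEvent {
        String key();
        long revision();
    }

    /** Minimal metastorage operations assumed by this sketch. */
    interface MetaStorage {
        void watchPrefix(String prefix, Consumer<WatchEvent> listener);
        void watchExact(String key, Consumer<WatchEvent> listener);
        Iterable<WatchEvent> range(String prefix, long upperBoundRevision);
        void put(String key, byte[] value);
    }

    private final MetaStorage metastorage;

    AffinityWatcher(MetaStorage metastorage) {
        this.metastorage = metastorage;
    }

    void register() {
        metastorage.watchPrefix("internal.tables.names.", this::onTableChanged);
        metastorage.watchExact("internal.baseline", baseline -> {
            // Recalculate affinity for every table existing at the watch revision,
            // using the event revision as the upper bound of the range scan.
            for (WatchEvent table : metastorage.range("internal.tables.names.", baseline.revision()))
                onTableChanged(table);
        });
    }

    private void onTableChanged(WatchEvent evt) {
        String name = evt.key().substring("internal.tables.names.".length());
        long tableId = evt.revision(); // the key revision doubles as the table ID

        List<List<String>> assignment = calculateAffinity(name, tableId);

        // Publish the result for partition managers to pick up.
        metastorage.put("internal.tables.affinity." + tableId, serialize(assignment));
    }

    // Stand-ins for the actual affinity calculation and serialization.
    private List<List<String>> calculateAffinity(String name, long tableId) { throw new UnsupportedOperationException(); }
    private byte[] serialize(List<List<String>> assignment) { throw new UnsupportedOperationException(); }
}
```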
 
 > Note that ideally the watch should only be processed on metastorage group leader, thus eliminating unnecessary network
 > trips. Theoretically, we could have embedded this logic to the state machine, but this would enormously complicate 
@@ -127,7 +125,7 @@ To handle partition assignments, partition manager creates a reliable watch for
 nodes:
 
 ```
-internal.tables.<ID>.affinity
+internal.tables.affinity.<ID>
 ```
 
 Whenever a watch is fired, the node checks whether there exist new partitions assigned to the local node, and if there 
@@ -135,8 +133,8 @@ are, the node bootstraps corresponding Raft partition servers (i.e. allocates pa
 The allocation information is written to projected vault keys:
 
 ```
-local.tables.<ID>.<PARTITION_ID>.logpath=/path/to/raft/log
-local.tables.<ID>.<PARTITION_ID>.storagepath=/path/to/storage/file
+local.tables.partition.<ID>.<PARTITION_ID>.logpath=/path/to/raft/log
+local.tables.partition.<ID>.<PARTITION_ID>.storagepath=/path/to/storage/file
 ``` 
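
A small Java sketch of writing these projected keys, assuming a simple key-value ``Vault`` facade (names are illustrative):

```java
/** Illustrative sketch: persisting per-partition allocation info to projected vault keys. */
class PartitionAllocator {
    /** Minimal local vault view assumed by this sketch. */
    interface Vault {
        void put(String key, String value);
    }

    private final Vault vault;

    PartitionAllocator(Vault vault) {
        this.vault = vault;
    }

    /** Records where the Raft log and the storage of a locally assigned partition will live. */
    void allocate(long tableId, int partitionId, String logPath, String storagePath) {
        String prefix = "local.tables.partition." + tableId + "." + partitionId;

        vault.put(prefix + ".logpath", logPath);
        vault.put(prefix + ".storagepath", storagePath);

        // Once these projected keys are synced to the vault, the partition manager can
        // bootstrap the corresponding Raft partition server.
    }
}
```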
 
 Once the projected keys are synced to the vault, the partition manager can create partition Raft servers (initialize