Posted to issues@ignite.apache.org by "Sergey Chugunov (Jira)" <ji...@apache.org> on 2023/05/17 14:45:00 UTC

[jira] [Updated] (IGNITE-18535) Define new classes for versioned tables/indexes schemas

     [ https://issues.apache.org/jira/browse/IGNITE-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Chugunov updated IGNITE-18535:
-------------------------------------
    Epic Link: IGNITE-19502  (was: IGNITE-17766)

> Define new classes for versioned tables/indexes schemas
> -------------------------------------------------------
>
>                 Key: IGNITE-18535
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18535
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Ivan Bessonov
>            Assignee: Andrey Mashenkov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The current approach to schema management is faulty and can't support indexes. On top of that, it doesn't allow us to truly have multi-versioned historical data. Once a table is removed, it's removed for good, meaning that "current" RO transactions will not be able to finish. This is not acceptable.
> h3. Schema definitions
> What we need to have is the following:
> {code:java}
> SchemaDefinitions = map {version -> SchemaDefinition}
> SchemaDefinition = {timestamp, set {TableDefinition}, set{IndexDefinition}}
> TableDefinition = {name, id, array[ColumnDefinition], ...}
> IndexDefinition = {name, id, tableId, state, array[IdxColumnDefinition], ...}{code}
> The schema must be versioned; that's the first point. It's already versioned in "main", but here I mean global versioning that ties everything to transactions and to the management of SQL indexes.
> Each definition corresponds to a time period during which it represents the "actual" state of things. It must be used for RO queries, for example. RW transactions always use the LATEST schema, obviously.
> Now, the meaning of defined values:
>  * version - a simple auto-incrementing integer value;
>  * "timestamp" - the schema is considered valid from this timestamp until the timestamp of the "next" version (or "infinity" if the next version doesn't exist yet);
>  * most table and index properties are self-explanatory;
>  * index state - RO or RW. We should differentiate indexes that are not yet built from indexes that are fully available.
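The version/timestamp rules in the list above can be sketched in Java as follows. This is only an illustration under assumptions: `SchemaHistory`, `append`, `at`, and `latest` are hypothetical names, not existing Ignite classes or methods, and the table and index sets are left out to keep the lookup logic visible.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch; names and types are illustrative, not Ignite API. */
final class SchemaHistory {
    /** One schema version; tables and indexes are omitted for brevity. */
    static final class SchemaDefinition {
        final int version;    // simple auto-incrementing integer
        final long timestamp; // valid from this timestamp, inclusive

        SchemaDefinition(int version, long timestamp) {
            this.version = version;
            this.timestamp = timestamp;
        }
    }

    // Versions keyed by activation timestamp, in ascending order.
    private final NavigableMap<Long, SchemaDefinition> byTimestamp = new TreeMap<>();
    private int lastVersion = 0;

    /** Registers the next version; its number is the previous one plus one. */
    SchemaDefinition append(long timestamp) {
        if (!byTimestamp.isEmpty() && timestamp <= byTimestamp.lastKey()) {
            throw new IllegalArgumentException("timestamps must strictly increase");
        }
        SchemaDefinition def = new SchemaDefinition(++lastVersion, timestamp);
        byTimestamp.put(timestamp, def);
        return def;
    }

    /** Schema for an RO read at the given timestamp, or null if none existed yet. */
    SchemaDefinition at(long readTimestamp) {
        // floorEntry picks the version whose validity period [timestamp, next) covers the read.
        Map.Entry<Long, SchemaDefinition> e = byTimestamp.floorEntry(readTimestamp);
        return e == null ? null : e.getValue();
    }

    /** Schema for RW transactions: always the latest version. */
    SchemaDefinition latest() {
        return byTimestamp.isEmpty() ? null : byTimestamp.lastEntry().getValue();
    }
}
```

With versions appended at timestamps 100 and 200, a read at 199 resolves to version 1 and a read at 200 resolves to version 2, which matches the "valid until the timestamp of the next version" rule.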
> Currently, it's not clear where to store this structure. The problem lies in the realm of metadata synchronization, which is not yet designed. What matters is that all nodes must eventually have an up-to-date state, and every data/index update must be consistent with the schema version that corresponds to the current operation's timestamp.
> There are two likely candidates - Meta-Storage or Configuration. We'll figure it out later.
> h3. Serialization / storage
> It would be convenient to store only the oldest version plus a collection of diffs. Every node would unpack that locally, and we would save a lot of storage space in the meta-storage when a user has a lot of tables/indexes.
> This approach would also be beneficial for another reason: we need to know what has changed between versions. That may be hard to calculate if all we have are the definitions themselves.
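A minimal sketch of the "oldest version + diffs" idea, assuming a schema is reduced to a set of table names and a diff is a pair of added/removed names. `DiffedSchemas`, `SchemaDiff`, `tablesAt`, and `changesIn` are hypothetical names chosen for illustration; a real diff would also have to cover columns and indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: stores a base version and replays diffs on demand. */
final class DiffedSchemas {
    /** Change between two consecutive versions (table names only, for brevity). */
    static final class SchemaDiff {
        final Set<String> added;
        final Set<String> removed;

        SchemaDiff(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }
    }

    private final int baseVersion;
    private final Set<String> baseTables;
    private final List<SchemaDiff> diffs = new ArrayList<>();

    DiffedSchemas(int baseVersion, Set<String> baseTables) {
        this.baseVersion = baseVersion;
        this.baseTables = new TreeSet<>(baseTables);
    }

    /** Appends a diff and returns the number of the version it produces. */
    int append(SchemaDiff diff) {
        diffs.add(diff);
        return baseVersion + diffs.size();
    }

    /** "Unpacks" a version locally by replaying diffs on top of the base. */
    Set<String> tablesAt(int version) {
        if (version < baseVersion || version > baseVersion + diffs.size()) {
            throw new IllegalArgumentException("unknown version: " + version);
        }
        Set<String> result = new TreeSet<>(baseTables);
        for (int i = 0; i < version - baseVersion; i++) {
            SchemaDiff d = diffs.get(i);
            result.removeAll(d.removed);
            result.addAll(d.added);
        }
        return result;
    }

    /** What changed when moving to the given version; no comparison needed. */
    SchemaDiff changesIn(int version) {
        if (version <= baseVersion || version > baseVersion + diffs.size()) {
            throw new IllegalArgumentException("no diff for version: " + version);
        }
        return diffs.get(version - baseVersion - 1);
    }
}
```

Storing diffs makes the "what changed" question a direct lookup (`changesIn`), at the cost of replaying diffs whenever a full definition is needed.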
> h3. General thoughts
> This may be a good place to start using integer tableId and indexId more often. UUIDs are too much. What's good is that the "serializability" of schemas gives us an easy way of generating integer ids, just like it's done right now with configuration.
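One way to read the remark about integer ids: if every node applies schema changes in the same serialized order, a plain counter yields the same ids everywhere without coordination. The sketch below assumes exactly that; `IdAllocator` is a hypothetical name and does not describe how configuration currently generates ids.

```java
/** Hypothetical sketch: deterministic integer ids from an ordered change log. */
final class IdAllocator {
    private int nextId;

    /** Starts after the last id recorded in the previous schema version. */
    IdAllocator(int lastAssignedId) {
        this.nextId = lastAssignedId + 1;
    }

    /** Returns the id for the next created table or index. */
    int allocate() {
        return nextId++;
    }

    /** Value to persist alongside the schema so that allocation can resume. */
    int lastAssignedId() {
        return nextId - 1;
    }
}
```

Two nodes that replay the same sequence of "create" operations from the same starting point end up with identical ids.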



--
This message was sent by Atlassian Jira
(v8.20.10#820010)