Posted to commits@ignite.apache.org by ag...@apache.org on 2021/03/11 17:48:27 UTC

[ignite-3] branch ignite-13618 updated: IGNITE-13618 Corrected a few checks for bytecode module, moved pieces of IEP-54 to module README.md

This is an automated email from the ASF dual-hosted git repository.

agoncharuk pushed a commit to branch ignite-13618
in repository https://gitbox.apache.org/repos/asf/ignite-3.git


The following commit(s) were added to refs/heads/ignite-13618 by this push:
     new 021f826  IGNITE-13618 Corrected a few checks for bytecode module, moved pieces of IEP-54 to module README.md
021f826 is described below

commit 021f82695b4faa722133416599510bd071d75df7
Author: Alexey Goncharuk <al...@gmail.com>
AuthorDate: Thu Mar 11 20:48:21 2021 +0300

    IGNITE-13618 Corrected a few checks for bytecode module, moved pieces of IEP-54 to module README.md
---
 modules/bytecode/README.md                         |  4 +-
 .../facebook/presto/bytecode/MethodDefinition.java |  8 +-
 .../presto/bytecode/MethodGenerationContext.java   |  2 +-
 modules/schema/README.md                           | 50 ++++++++++++-
 .../org/apache/ignite/internal/schema/README.md    | 87 ++++++++++++++++++++++
 .../ignite/internal/schema/package-info.java       | 46 ------------
 6 files changed, 139 insertions(+), 58 deletions(-)

diff --git a/modules/bytecode/README.md b/modules/bytecode/README.md
index 0135e01..3a178c8 100644
--- a/modules/bytecode/README.md
+++ b/modules/bytecode/README.md
@@ -1,4 +1,6 @@
 # Apache Ignite Bytecode module
-Fork of PrestoDB Bytecode module (ver 0.243).
+Fork of [PrestoDB Bytecode module (ver 0.243)](https://github.com/prestodb/presto/tree/0.243/presto-bytecode).
 * Removed unnecessary guava dependency.
 * Tests migrated from TestNG to JUnit 5.
+
+This module provides a convenient thin wrapper around [ASM](https://asm.ow2.io/) library to generate classes at runtime.
\ No newline at end of file
diff --git a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java
index 0297d9b..1405765 100644
--- a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java
+++ b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodDefinition.java
@@ -11,6 +11,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package com.facebook.presto.bytecode;
 
 import java.util.ArrayList;
@@ -67,12 +68,7 @@ public class MethodDefinition {
 
         this.access = access;
         this.name = name;
-        if (returnType != null) {
-            this.returnType = returnType;
-        }
-        else {
-            this.returnType = type(void.class);
-        }
+        this.returnType = returnType != null ? returnType : type(void.class);
         this.parameters = List.copyOf(parameters);
         this.parameterTypes = parameters.stream().map(Parameter::getType).collect(Collectors.toList());
         this.parameterAnnotations = parameters.stream().map(p -> new ArrayList<AnnotationDefinition>()).collect(Collectors.toList());
diff --git a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java
index ec2a65a..e62da9d 100644
--- a/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java
+++ b/modules/bytecode/src/main/java/com/facebook/presto/bytecode/MethodGenerationContext.java
@@ -91,7 +91,7 @@ public class MethodGenerationContext {
         return true;
     }
 
-    private final class ScopeContext {
+    private static final class ScopeContext {
         private final Scope scope;
         private final List<Variable> variables;
 
diff --git a/modules/schema/README.md b/modules/schema/README.md
index e742dc5..e640252 100644
--- a/modules/schema/README.md
+++ b/modules/schema/README.md
@@ -1,7 +1,49 @@
 # Schema module
 
-This module provides implementations for schema configuration API and schema management components.
+This module provides API and implementation for schema management components:
 
-* Schema configuration public API implementation.
-* Distributed schema management for processing schema change events at runtime.
-* Schema version management for transparent upgrade stored data purposes according to life-schema concept.
\ No newline at end of file
+* Public API for schema definition and evolution
+* Schema manager component that implements the necessary machinery to translate schema management commands into the
+  corresponding metastorage modifications, as well as schema modification event processing logic
+* Logic to build and upgrade tuples: rows of a specific schema that encode user data in a schema-defined format.
+
+## Schema-aware tables
+We require that at any moment in time an Ignite table has only one most recent relevant schema. Upon schema 
+modification, we assign a monotonically growing identifier to each version of the table schema. The ordering guarantees 
+are provided by the underlying distributed metastorage. The history of schema versions must be kept in the metastorage 
+for a long enough period of time to allow upgrade of all existing data stored in a given table.
+              
+Given a schema evolution history, a tuple migration from version `N-k` to version `N` is a straightforward operation. 
+We identify fields that were dropped during the last k schema operations and fields that were added (taking into account
+default field values) and update the tuple based on the field modifications. Afterward, the updated tuple is written in 
+the schema version `N` layout format. The tuple upgrade may happen on read with an optional writeback or on next update. 
+Additionally, tuple upgrade in background is possible.
+              
+Since the tuple key hashcode is inlined into the tuple data for quick key lookups, we require that the set of key columns 
+does not change during the schema evolution. In the future, we may remove this restriction, but this will require careful 
+hashcode calculation adjustments. Removing a column from the key columns does not seem to be possible since it may 
+produce duplicates, and we assume PK has no duplicates.
+              
+In addition to adding and removing columns, it may be possible to allow for column type migrations when the type change 
+is unambiguous (a type upcast, e.g. Int8 → Int16, or by means of a certain expression, e.g. Int8 → String using 
+the `CAST` expression).
+
+### Dynamic schema expansion (live schema)
+Ignite can operate in two modes that provide different levels of flexibility and restrictions with respect to object-to-schema mapping:
+ * Strict mode. When a user attempts to insert an object into a table or update it, Ignite checks that the object does not 
+ contain any extra columns that are not present in the current table schema. If such columns are detected, Ignite will
+ fail the operation, requiring the user to manually update the schema before working with the added columns.
+ * Live mode. When an object is inserted into a table, we attempt to 'fit' object fields to the schema columns. If the 
+ object has some extra fields which are not present in the current schema, the schema is automatically updated to store 
+ additional extra fields that are present in the object. If there are two concurrent live schema modifications, they can 
+ either merge together if modifications are non-conflicting (e.g. adding disjoint sets of columns or adding columns with
+ the same definition), or one of the modifications will fail (e.g. two columns with the same name, but conflicting type
+ are being inserted). Live schema will try to automatically expand the schema even if there was an explicit drop column
+ command executed right before the live schema expansion. **Live schema never drops columns during automatic schema 
+ evolution.** If a schema has columns that were not populated by object fields, they will be either kept `null` or 
+ populated with defaults when provided, or the update will fail with an exception.
+ 
+### Data Layout
+Data layout documentation can be found [here](src/main/java/org/apache/ignite/internal/schema/README.md).
+
+## Object-to-schema mapping
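
The tuple upgrade from version `N-k` to version `N` described under "Schema-aware tables" in `modules/schema/README.md` could be sketched roughly as follows. This is only an illustration under stated assumptions: `TupleMigrationSketch`, `ColumnChange`, and `upgrade` are hypothetical names that do not exist in the module, and a row is modelled as a plain map rather than the binary tuple format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: upgrades a row from schema version N-k to version N. */
final class TupleMigrationSketch {
    /** One recorded schema modification: a column that was added (with a default) or dropped. */
    static final class ColumnChange {
        final String column;
        final boolean isAdd;
        final Object defaultValue;

        private ColumnChange(String column, boolean isAdd, Object defaultValue) {
            this.column = column;
            this.isAdd = isAdd;
            this.defaultValue = defaultValue;
        }

        static ColumnChange added(String column, Object defaultValue) {
            return new ColumnChange(column, true, defaultValue);
        }

        static ColumnChange dropped(String column) {
            return new ColumnChange(column, false, null);
        }
    }

    /**
     * Replays the last k schema modifications (oldest first) over a copy of the row.
     * Assumption: values live in a map; a real implementation would rewrite the
     * result in the version N binary layout.
     */
    static Map<String, Object> upgrade(Map<String, Object> row, List<ColumnChange> history) {
        Map<String, Object> upgraded = new LinkedHashMap<>(row);
        for (ColumnChange change : history) {
            if (change.isAdd) {
                // An added column starts from its default value (possibly null).
                upgraded.put(change.column, change.defaultValue);
            } else {
                // A dropped column disappears from the upgraded row.
                upgraded.remove(change.column);
            }
        }
        return upgraded;
    }
}
```

Replaying the changes in order means a column that was dropped and later re-added ends up with its default value rather than the stale one, which matches the idea of identifying dropped and added fields across the last k operations.
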
diff --git a/modules/schema/src/main/java/org/apache/ignite/internal/schema/README.md b/modules/schema/src/main/java/org/apache/ignite/internal/schema/README.md
new file mode 100644
index 0000000..435c7be
--- /dev/null
+++ b/modules/schema/src/main/java/org/apache/ignite/internal/schema/README.md
@@ -0,0 +1,87 @@
+This package provides the infrastructure needed to create and read schema-defined tuples and to convert them
+to and from POJO classes.
+
+### Schema definition
+
+Schema is defined as a set of columns which are split into key columns chunk and value columns chunk.
+Each column is defined by a name, a nullability flag, and a `org.apache.ignite.internal.schema.NativeType`.
+Type is a thin wrapper over the `org.apache.ignite.internal.schema.NativeTypeSpec` to provide differentiation
+between types of one kind with different sizes (an example of such differentiation is bitmask(n) or number(n)).
+`org.apache.ignite.internal.schema.NativeTypeSpec` provides necessary indirection to read a column as a
+`java.lang.Object` without needing to switch over the column type.
+
+`NativeType` defines one of the following types: 
+
+Type | Size | Description
+---- | ---- | -----------
+Bitmask(n)|⌈n/8⌉ bytes|A fixed-length bitmask of n bits
+Int8|1 byte|1-byte signed integer
+Uint8|1 byte|1-byte unsigned integer
+Int16|2 bytes|2-byte signed integer
+Uint16|2 bytes|2-byte unsigned integer
+Int32|4 bytes|4-byte signed integer
+Uint32|4 bytes|4-byte unsigned integer
+Int64|8 bytes|8-byte signed integer
+Uint64|8 bytes|8-byte unsigned integer
+Float|4 bytes|4-byte floating-point number
+Double|8 bytes|8-byte floating-point number
+Number([n])|Variable|Variable-length number (optionally bound by n bytes in size)
+Decimal|Variable|Variable-length floating-point number
+UUID|16 bytes|UUID
+String|Variable|A string encoded with a given Charset
+Date|3 bytes|A timezone-free date encoded as a year (15 bits), month (4 bits), day (5 bits)
+Time|4 bytes|A timezone-free time encoded as padding (5 bits), hour (5 bits), minute (6 bits), second (6 bits), millisecond (10 bits)
+Datetime|7 bytes|A timezone-free datetime encoded as (date, time)
+Timestamp|8 bytes|Number of milliseconds since Jan 1, 1970 00:00:00.000 (with no timezone)
+Binary|Variable|Variable-size byte array
+
+Arbitrary nested object serialization at this point is not supported, but can be provided in the future by either 
+explicit inlining, or by providing an upper-level serialization primitive that will be mapped to a `Binary` column.
+
+### Tuple layout
+A tuple itself does not contain any type metadata and only contains necessary information required for fast column 
+lookup. In a tuple, key columns and value columns are separated and written to chunks with identical structure 
+(so that each chunk is self-sufficient and, provided with the column types, can be read independently).
+
+Tuple structure has the following format:
+
+    ┌─────────────────────────────┬─────────────────────┐
+    │           Header            │        Data         │
+    ├─────────┬─────────┬─────────┼──────────┬──────────┤
+    │ Schema  │ Flags   │ Key     │ Key      │ Value    │
+    │ Version │         │ Hash    │ Chunk    │ Chunk    │
+    ├─────────┼─────────┼─────────┼──────────┼──────────┤
+    │ 2 Bytes │ 2 Bytes │ 4 Bytes │ Variable │ Variable │
+    └─────────┴─────────┴─────────┴──────────┴──────────┘
+
+
+Each chunk section has the following structure:
+
+                                                 ┌──────────────────────────────────────────────────┐
+                                                 │                                                  │
+    ┌─────────┬─────────────────────────┬────────┴────────┬─────────────────────────┬──────────┬────⌄─────┐
+    │ Full    │ Varsize Columns Offsets │ Varsize Columns │ Null-Defaults           │ Fixsize  │ Varsize  │
+    │ Size    │ Table Size              │ Offsets Table   │ Map                     │ Columns  │ Columns  │
+    ├─────────┼─────────────────────────┼─────────────────┼─────────────────────────┼──────────┼──────────┤
+    │ 4 Bytes │ 2 Bytes                 │ Variable        │ ⌈Number of columns / 8⌉ │ Variable │ Variable │
+    └─────────┴─────────────────────────┴─────────────────┴─────────────────────────┴──────────┴──────────┘
+All columns within a chunk are split into groups of fixed-size columns and variable-size columns. Within the group of 
+fixsize columns, the columns are sorted by size, then by column name. Within the group of varsize columns, the columns 
+are sorted by column name. Inside a tuple, default values and nulls are omitted and encoded in the null-defaults map 
+(essentially, a bitset). The size of the varsize columns offsets table is equal to the number of non-null non-default 
+varsize columns multiplied by 2 (a single entry in the offsets table is 2 bytes). The offset stored in the offsets table 
+is calculated from the beginning of the chunk.
+
+### Tuple construction and access
+To assemble a tuple with some schema, an instance of `org.apache.ignite.internal.schema.TupleAssembler`
+must be used which provides the low-level API for building tuples. When using the tuple assembler, the
+columns must be passed to the assembler in the internal schema sort order. Additionally, when constructing
+the instance of the assembler, the user should pre-calculate the size of the tuple to avoid extra array copies,
+and the number of non-null varlen columns for key and value chunks. Less restrictive building techniques
+are provided by class (de)serializers and tuple builder, which take care of sizing and column order.
+
+To read column values of a tuple, one needs to construct a subclass of
+`org.apache.ignite.internal.schema.Tuple` which provides necessary logic to read arbitrary columns with
+type checking. For primitive types, `org.apache.ignite.internal.schema.Tuple` provides boxed and non-boxed
+value methods to avoid boxing in scenarios where boxing can be avoided (deserialization of non-null columns to
+POJO primitives, for example).
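
Since the assembler expects the tuple size to be pre-calculated, the sizing arithmetic implied by the layout tables above can be sketched as below. `TupleSizeSketch` and its methods are hypothetical helpers, not part of the module; the sketch assumes that the chunk size counts the whole chunk including its own prefix fields, which the README does not state explicitly.

```java
/** Hypothetical sketch of the size arithmetic implied by the tuple and chunk layout tables. */
final class TupleSizeSketch {
    /** Tuple header: schema version (2 bytes) + flags (2 bytes) + key hash (4 bytes). */
    static final int TUPLE_HEADER_SIZE = 2 + 2 + 4;

    /** A single entry in the varsize columns offsets table takes 2 bytes. */
    static final int OFFSET_ENTRY_SIZE = 2;

    /** Null-defaults map: one bit per column, rounded up to whole bytes. */
    static int nullDefaultsMapSize(int columns) {
        return (columns + 7) / 8;
    }

    /**
     * Total chunk size in bytes.
     *
     * @param columns         number of columns in the chunk
     * @param varsizeEntries  number of non-null, non-default varsize columns
     * @param fixsizeBytes    bytes occupied by the written fixsize column values
     * @param varsizeBytes    bytes occupied by the written varsize column values
     */
    static int chunkSize(int columns, int varsizeEntries, int fixsizeBytes, int varsizeBytes) {
        return 4                                   // full size field
            + 2                                    // varsize columns offsets table size field
            + OFFSET_ENTRY_SIZE * varsizeEntries   // varsize columns offsets table
            + nullDefaultsMapSize(columns)         // null-defaults map
            + fixsizeBytes
            + varsizeBytes;
    }

    /** Whole tuple: header followed by the key chunk and the value chunk. */
    static int tupleSize(int keyChunkSize, int valueChunkSize) {
        return TUPLE_HEADER_SIZE + keyChunkSize + valueChunkSize;
    }
}
```

For instance, a chunk with 3 columns, one non-null varsize column, 12 bytes of fixsize data, and 5 bytes of varsize data comes to 4 + 2 + 2 + 1 + 12 + 5 = 26 bytes under these assumptions.
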
diff --git a/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java b/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java
index 5aaa359..fe5f0e3 100644
--- a/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java
+++ b/modules/schema/src/main/java/org/apache/ignite/internal/schema/package-info.java
@@ -18,51 +18,5 @@
 /**
  * <!-- Package description. -->
  * Contains schema description, tuple assembly and field accessor classes.
- * <p>
- * This package provides necessary infrastructure to create, read, convert to and from POJO classes
- * schema-defined tuples.
- * <p>
- * Schema is defined as a set of columns which are split into key columns chunk and value columns chunk.
- * Each column defined by a name, nullability flag, and a {@link org.apache.ignite.internal.schema.NativeType}.
- * Type is a thin wrapper over the {@link org.apache.ignite.internal.schema.NativeTypeSpec} to provide differentiation
- * between types of one kind with different size (an example of such differentiation is bitmask(n) or number(n)).
- * {@link org.apache.ignite.internal.schema.NativeTypeSpec} provides necessary indirection to read a column as a
- * {@code java.lang.Object} without needing to switch over the column type.
- * <p>
- * A tuple itself does not contain any type metadata and only contains necessary
- * information required for fast column lookup. In a tuple, key columns and value columns are separated
- * and written to chunks with identical structure (so that chunk is self-sufficient, and, provided with
- * the column types can be read independently).
- * Tuple structure has the following format:
- *
- * <pre>
- * +---------+----------+----------+-------------+
- * |  Schema |    Key  | Key chunk | Value chunk |
- * | Version |   Hash  | Bytes     | Bytes       |
- * +---------+------ --+-----------+-------------+
- * | 2 bytes | 4 bytes |                         |
- * +---------+---------+-------------------------+
- * </pre>
- * Each bytes section has the following structure:
- * <pre>
- * +---------+----------+---------+------+--------+--------+
- * |   Total | Vartable |  Varlen | Null | Fixlen | Varlen |
- * |  Length |   Length | Offsets |  Map |  Bytes |  Bytes |
- * +---------+----------+---------+------+--------+--------+
- * | 4 bytes |  2 bytes |                                  |
- * +---------+---------------------------------------------+
- * </pre>
- * To assemble a tuple with some schema, an instance of {@link org.apache.ignite.internal.schema.TupleAssembler}
- * must be used which provides the low-level API for building tuples. When using the tuple assembler, the
- * columns must be passed to the assembler in the internal schema sort order. Additionally, when constructing
- * the instance of the assembler, the user should pre-calculate the size of the tuple to avoid extra array copies,
- * and the number of non-null varlen columns for key and value chunks. Less restrictive building techniques
- * are provided by class (de)serializers and tuple builder, which take care of sizing and column order.
- * <p>
- * To read column values of a tuple, one needs to construct a subclass of
- * {@link org.apache.ignite.internal.schema.Tuple} which provides necessary logic to read arbitrary columns with
- * type checking. For primitive types, {@link org.apache.ignite.internal.schema.Tuple} provides boxed and non-boxed
- * value methods to avoid boxing in scenarios where boxing can be avoided (deserialization of non-null columns to
- * POJO primitives, for example).
  */
 package org.apache.ignite.internal.schema;
\ No newline at end of file
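
The header diagram removed from this Javadoc listed only the schema version and key hash, while the README version adds a 2-byte flags field between them. A minimal sketch of writing and reading that 8-byte header with `java.nio.ByteBuffer` follows. `TupleHeaderSketch` is a hypothetical name, and both the little-endian byte order and the unsigned reading of the 2-byte fields are assumptions, since the commit does not specify them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of the 8-byte tuple header: schema version, flags, key hash. */
final class TupleHeaderSketch {
    static final int SCHEMA_VERSION_OFFSET = 0; // 2 bytes
    static final int FLAGS_OFFSET = 2;          // 2 bytes
    static final int KEY_HASH_OFFSET = 4;       // 4 bytes
    static final int HEADER_SIZE = 8;

    /** Assumption: little-endian; the actual byte order is not given in the source. */
    private static final ByteOrder ORDER = ByteOrder.LITTLE_ENDIAN;

    static byte[] writeHeader(int schemaVersion, int flags, int keyHash) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ORDER);
        buf.putShort((short) schemaVersion); // only the low 16 bits are stored
        buf.putShort((short) flags);
        buf.putInt(keyHash);
        return buf.array();
    }

    static int readSchemaVersion(byte[] tuple) {
        // Read as an unsigned 16-bit value (an assumption of this sketch).
        return Short.toUnsignedInt(ByteBuffer.wrap(tuple).order(ORDER).getShort(SCHEMA_VERSION_OFFSET));
    }

    static int readFlags(byte[] tuple) {
        return Short.toUnsignedInt(ByteBuffer.wrap(tuple).order(ORDER).getShort(FLAGS_OFFSET));
    }

    static int readKeyHash(byte[] tuple) {
        return ByteBuffer.wrap(tuple).order(ORDER).getInt(KEY_HASH_OFFSET);
    }
}
```

Because the key hash sits at a fixed offset, it can be read for key lookups without parsing either chunk.
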