Posted to commits@quickstep.apache.org by ra...@apache.org on 2016/08/05 22:52:42 UTC

[18/30] incubator-quickstep git commit: Added README for types module.

Added README for types module.


Project: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/commit/33554c3e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/tree/33554c3e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-quickstep/diff/33554c3e

Branch: refs/heads/quickstep-28-29
Commit: 33554c3edcac6becb84bfcdcdb8a60b9dd6a3f0b
Parents: 7415ee8
Author: Craig Chasseur <sp...@gmail.com>
Authored: Wed Jul 27 20:01:53 2016 -0700
Committer: Craig Chasseur <sp...@gmail.com>
Committed: Wed Jul 27 20:01:53 2016 -0700

----------------------------------------------------------------------
 types/README.md | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-quickstep/blob/33554c3e/types/README.md
----------------------------------------------------------------------
diff --git a/types/README.md b/types/README.md
new file mode 100644
index 0000000..baf01aa
--- /dev/null
+++ b/types/README.md
@@ -0,0 +1,102 @@
+# The Quickstep Type System
+
+The types module is used across Quickstep. It handles how data values are
+stored and represented and how they are parsed from and printed to
+human-readable text, and it provides the low-level operations on values that
+form the building blocks for more complex [expressions](../expressions).
+
+## The Type Class
+
+Every distinct concrete type in Quickstep is represented by a single object of
+a class derived from the base `quickstep::Type` class. All types have some
+common properties, including the following:
+
+  * A `TypeID` - an enum identifying the type, e.g. `kInt` for 32-bit integers,
+    or `kVarChar` for variable-length strings.
+  * Nullability - whether the type allows NULL values. All types have both a
+    nullable and a non-nullable flavor, except for `NullType`, a special type
+    that can ONLY store NULLs and has no non-nullable version.
+  * Storage size - minimum and maximum byte length. For fixed-length types like
+    basic numeric types and fixed length `CHAR(X)` strings, these lengths are
+    the same. For variable-length types like `VARCHAR(X)`, they can be
+    different (and the `Type` class has a method `estimateAverageByteLength()`
+    that can be used to make educated guesses when allocating storage). Note
+    that storage requirements really only apply to uncompressed, non-NULL
+    values. The actual bytes needed to store the values in the
+    [storage system](../storage) may be different if
+    [compression](../compression) is used, and some storage formats might store
+    NULLs differently.
+
+Some categories of types have additional properties (e.g. `CharType` and
+`VarCharType` also have a length parameter that indicates the maximum length of
+string that can be stored).
+
+### Getting a Type
+
+Each distinct, concrete `Type` is represented by a single object in the entire
+Quickstep process. To get a reference to a usable `Type`, most code goes
+through the `TypeFactory`, which provides static methods to access specific
+types by `TypeID` and other parameters. It can also deserialize a type from
+its protobuf representation (a `quickstep::serialization::Type` message).
+Finally, it provides methods that determine a common `Type` that two different
+types can both be cast to.
+
+### More on the `Type` Interface
+
+In addition to methods that allow inspection of a type's properties (e.g. those
+listed above), the `Type` class defines an interface with useful functionality
+common to all types:
+
+  * Serialization (of the type itself) - the `getProto()` method produces a
+    protobuf message that can be serialized and deserialized and later passed to
+    the `TypeFactory` to get back the same type.
+  * Relationship to other types - `equals()` determines if two types are exactly
+    the same, while `isCoercibleFrom()` determines if it is possible to convert
+    from another type to a given type (e.g. with a `CAST`), and
+    `isSafelyCoercibleFrom()` determines if such a conversion can always be done
+    without loss of precision.
+  * Printing to human-readable format - `printValueToString()` and
+    `printValueToFile()` can print out values of a type (see `TypedValue` below)
+    in human-readable format.
+  * Parsing from human-readable format - Similarly, `parseValueFromString()`
+    produces a `TypedValue` that is parsed from a string in human-readable
+    format.
+  * Making values - `makeValue()` creates a `TypedValue` from a bare pointer to
+    a value's representation in storage. For nullable types, `makeNullValue()`
+    makes a NULL value, and for numeric types, `makeZeroValue()` makes a zero
+    of that type.
+  * Coercing values - `coerceValue()` takes a value of another type and converts
+    it to the given type (e.g. as part of a `CAST`).
+
+## The TypedValue Class
+
+An individual typed value in Quickstep is represented by an instance of the
+`TypedValue` class. TypedValues can be created by methods of the `Type` class,
+by operation and expression classes that operate on values, or simply by calling
+one of several constructors provided in the class itself for convenience.
+TypedValues have C++ value semantics (i.e. they are copyable, assignable, and
+movable). A TypedValue may own its own data, or it may be a lightweight
+reference to data that is stored elsewhere in memory (this can be checked with
+`isReference()`, and any reference can be upgraded to own its own data copy by
+calling `ensureNotReference()`).
+
+Here are some of the things you can do with a TypedValue:
+
+  * NULL checks - calling `isNull()` determines if the TypedValue represents a
+    NULL. Several methods of TypedValue are usable only for non-NULL values, so
+    it is often important to check this first if in doubt.
+  * Access to underlying data - `getDataPtr()` returns an untyped `void*`
+    pointer to the underlying data, and `getDataSize()` returns the size of the
+    underlying data in bytes. Depending on the type of the value, the templated
+    method `getLiteral()` can be used to get the underlying data as a literal
+    scalar, or `getAsciiStringLength()` can be used to get the string length of
+    a `CHAR(X)` or `VARCHAR(X)` without counting null-terminators.
+  * Hashing - `getHash()` returns a hash of the value, which is suitable for
+    use in the HashTables of the [storage system](../storage), or in generic
+    hash-based C++ containers. `fastEqualCheck()` is provided to quickly check
+    whether two TypedValues of the same type (e.g. in the same hash table) are
+    actually equal.
+  * Serialization/Deserialization - `getProto()` serializes a TypedValue to a
+    `serialization::TypedValue` protobuf. The static method `ProtoIsValid()`
+    checks whether a serialized TypedValue is valid, and
+    `ReconstructFromProto()` rebuilds a TypedValue from its serialized form.
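
The owning-versus-reference behavior that the README describes for `TypedValue` can be illustrated with a minimal, self-contained sketch. `ToyValue` and its `MakeReference()` and `MakeOwned()` factories are hypothetical names invented for this illustration. Only the method names `isReference()`, `ensureNotReference()`, `getDataPtr()`, `getDataSize()`, and `getLiteral()` echo the README, and the code makes no claim about how Quickstep actually implements them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy class, NOT quickstep::TypedValue. It only mirrors the
// owning-versus-reference behavior described in the README.
class ToyValue {
 public:
  // Builds a lightweight reference to bytes owned by someone else.
  static ToyValue MakeReference(const void *data, std::size_t size) {
    ToyValue v;
    v.external_ = data;
    v.size_ = size;
    return v;
  }

  // Builds a value that owns a private copy of the bytes.
  static ToyValue MakeOwned(const void *data, std::size_t size) {
    ToyValue v = MakeReference(data, size);
    v.ensureNotReference();
    return v;
  }

  bool isReference() const { return external_ != nullptr; }

  // Copies the referenced bytes into owned storage. Does nothing if the
  // value already owns its data.
  void ensureNotReference() {
    if (!isReference()) return;
    const unsigned char *bytes = static_cast<const unsigned char*>(external_);
    owned_.assign(bytes, bytes + size_);
    external_ = nullptr;
  }

  const void* getDataPtr() const {
    return isReference() ? external_
                         : static_cast<const void*>(owned_.data());
  }

  std::size_t getDataSize() const { return size_; }

  // Reads the underlying bytes as a scalar of type T (sizes must match).
  template <typename T>
  T getLiteral() const {
    assert(sizeof(T) == size_);
    T out;
    std::memcpy(&out, getDataPtr(), sizeof(T));
    return out;
  }

 private:
  ToyValue() = default;

  const void *external_ = nullptr;    // Set only while acting as a reference.
  std::vector<unsigned char> owned_;  // Filled only when owning the data.
  std::size_t size_ = 0;
};
```

In this sketch, a value built with `MakeReference()` keeps seeing later changes to the external storage, while a copy on which `ensureNotReference()` has been called keeps the bytes it captured at that moment. That is the practical reason to upgrade a reference before the referenced memory can change or be freed.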