Posted to commits@kudu.apache.org by gr...@apache.org on 2020/04/21 17:02:56 UTC
[kudu] branch branch-1.12.x updated: [docs] Update schema documentation
This is an automated email from the ASF dual-hosted git repository.
granthenke pushed a commit to branch branch-1.12.x
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/branch-1.12.x by this push:
new e345619 [docs] Update schema documentation
e345619 is described below
commit e34561901442f6c73d2477a95435309e64e1aacf
Author: Grant Henke <gr...@apache.org>
AuthorDate: Sun Apr 19 10:39:29 2020 -0500
[docs] Update schema documentation
This patch adds more details on the VARCHAR type to the schema
docs. It also adds the DATE type and includes a small update to
remove the explicit HBase call out.
Change-Id: I681e0af517b08c348420b3b217c393797717d3fc
Reviewed-on: http://gerrit.cloudera.org:8080/15757
Tested-by: Kudu Jenkins
Reviewed-by: Volodymyr Verovkin <ve...@cloudera.com>
Reviewed-by: Hao Hao <ha...@cloudera.com>
(cherry picked from commit 386cc74c1a15d10e85989b67e92cfe3d6b134f44)
Reviewed-on: http://gerrit.cloudera.org:8080/15770
---
docs/schema_design.adoc | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 9b05991..0c1e0a6 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -72,13 +72,14 @@ column types include:
* 16-bit signed integer
* 32-bit signed integer
* 64-bit signed integer
+* date (32-bit days since the Unix epoch)
* unixtime_micros (64-bit microseconds since the Unix epoch)
* single-precision (32-bit) IEEE-754 floating-point number
* double-precision (64-bit) IEEE-754 floating-point number
* decimal (see <<decimal>> for details)
+* varchar (see <<varchar>> for details)
* UTF-8 encoded string (up to 64KB uncompressed)
* binary (up to 64KB uncompressed)
-* VARCHAR type with configurable maximum length (up to 64KB uncompressed)
Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
format to provide efficient encoding and serialization. To make the most of
@@ -90,9 +91,9 @@ be specified on a per-column basis.
[[no_version_column]]
[IMPORTANT]
.No Version or Timestamp Column
-Unlike HBase, Kudu does not provide a version or timestamp column to track changes
-to a row. If version or timestamp information is needed, the schema should include
-an explicit version or timestamp column.
+Kudu does not provide a version or timestamp column to track changes to a row.
+If version or timestamp information is needed, the schema should include an
+explicit version or timestamp column.
[[decimal]]
=== Decimal Type
@@ -136,6 +137,24 @@ Before encoding and compression:
NOTE: The precision and scale of `decimal` columns cannot be changed by altering
the table.
+[[varchar]]
+=== Varchar Type
+
+The `varchar` type is a UTF-8 encoded string (up to 64KB uncompressed) with a
+fixed maximum character length. This type is especially useful when migrating
+from or integrating with legacy systems that support the `varchar` type.
+If a maximum character length is not required the `string` type should be
+used instead.
+
+The `varchar` type is a parameterized type that takes a length attribute.
+
+*Length* represents the maximum number of UTF-8 characters allowed. Values
+with characters greater than the limit will be truncated. This value must
+be between 1 and 65535 and has no default. Note that some other systems
+may represent the length limit in bytes instead of characters. That means
+that Kudu may be able to represent longer values in the case of multi-byte
+UTF-8 characters.
+
[[encoding]]
=== Column Encoding
@@ -145,12 +164,12 @@ of the column.
.Encoding Types
[options="header"]
|===
-| Column Type | Encoding | Default
-| int8, int16, int32 | plain, bitshuffle, run length | bitshuffle
-| int64, unixtime_micros | plain, bitshuffle, run length | bitshuffle
-| float, double, decimal | plain, bitshuffle | bitshuffle
-| bool | plain, run length | run length
-| string, binary, varchar | plain, prefix, dictionary | dictionary
+| Column Type | Encoding | Default
+| int8, int16, int32, int64 | plain, bitshuffle, run length | bitshuffle
+| date, unixtime_micros | plain, bitshuffle, run length | bitshuffle
+| float, double, decimal | plain, bitshuffle | bitshuffle
+| bool | plain, run length | run length
+| string, varchar, binary | plain, prefix, dictionary | dictionary
|===
[[plain]]