Posted to commits@kudu.apache.org by jd...@apache.org on 2017/04/26 17:52:54 UTC
kudu git commit: [docs] Refresh and augment the known issues

Repository: kudu
Updated Branches:
  refs/heads/master 21f5ae0d5 -> df9da6a84


[docs] Refresh and augment the known issues

We've learned a lot about Kudu since people have started using it.
I've gathered in this patch what I think should be the new recommendations
we make to users.

Change-Id: I5d8e817a402f419aeb5ed9d700a8207ad9f91e4d
Reviewed-on: http://gerrit.cloudera.org:8080/6699
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/df9da6a8
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/df9da6a8
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/df9da6a8

Branch: refs/heads/master
Commit: df9da6a84381795e2334bf21d8337f6f72333571
Parents: 21f5ae0
Author: Jean-Daniel Cryans <jd...@apache.org>
Authored: Wed Apr 19 16:50:36 2017 -0700
Committer: Jean-Daniel Cryans <jd...@apache.org>
Committed: Wed Apr 26 17:52:39 2017 +0000

----------------------------------------------------------------------
 docs/developing.adoc              |   9 ++-
 docs/installation.adoc            |   1 +
 docs/known_issues.adoc            | 113 +++++++++++++++++++++++++++------
 docs/kudu_impala_integration.adoc |   5 ++
 4 files changed, 107 insertions(+), 21 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index ba9561b..d1cc0ec 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -162,12 +162,15 @@ kuduContext.deleteTable("unwanted_table")
 - Kudu tables with a name containing upper case or non-ascii characters must be
   assigned an alternate name when registered as a temporary table.
 - Kudu tables with a column name containing upper case or non-ascii characters
-  may not be used with SparkSQL. Columns may be renamed in Kudu to work around 
+  may not be used with SparkSQL. Columns may be renamed in Kudu to work around
   this issue.
-- `NULL`, `NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not pushed to
-  Kudu, and instead will be evaluated by the Spark task.
+- `<>` and `OR` predicates are not pushed to Kudu, and instead will be evaluated
+  by the Spark task. Only `LIKE` predicates with a suffix wildcard are pushed to
+  Kudu, meaning that `LIKE "FOO%"` is pushed down but `LIKE "FOO%BAR"` isn't.
 - Kudu does not support all types supported by Spark SQL, such as `Date`,
   `Decimal` and complex types.
+- Kudu tables may only be registered as temporary tables in SparkSQL.
+  Kudu tables may not be queried using HiveContext.
 
 
 == Kudu Python Client

http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/installation.adoc
----------------------------------------------------------------------
diff --git a/docs/installation.adoc b/docs/installation.adoc
index 8edc79f..2469c12 100644
--- a/docs/installation.adoc
+++ b/docs/installation.adoc
@@ -53,6 +53,7 @@ Linux::
       link:troubleshooting.html#req_hole_punching[troubleshooting hole punching] for more
       information.
     - ntp.
+    - xfs or ext4 formatted drives.
 macOS::
     - OS X 10.10 Yosemite, OS X 10.11 El Capitan, or macOS Sierra.
     - Prebuilt macOS packages are not provided.

http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/known_issues.adoc
----------------------------------------------------------------------
diff --git a/docs/known_issues.adoc b/docs/known_issues.adoc
index ce73a51..8828f4b 100644
--- a/docs/known_issues.adoc
+++ b/docs/known_issues.adoc
@@ -27,15 +27,15 @@
 :sectlinks:
 :experimental:
 
-== Schema and Usage Limitations
-* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
-  a single row contains multiple kilobytes of data.
+== Schema
 
-* The columns which make up the primary key must be listed first in the schema.
+=== Primary keys
 
 * The primary key may not be changed after the table is created.
   You must drop and recreate a table to select a new primary key.
 
+* The columns which make up the primary key must be listed first in the schema.
+
 * The primary key of a row may not be modified using the `UPDATE` functionality.
   To modify a row's primary key, the row must be deleted and re-inserted with
   the modified key. Such a modification is non-atomic.
@@ -44,13 +44,50 @@
   primary key definition. Additionally, all columns that are part of a primary
   key definition must be `NOT NULL`.
 
-* Type and nullability of existing columns cannot be changed by altering the table.
+* Auto-generated primary keys are not supported.
+
+* Cells making up a composite primary key are limited to a total of 16KB after the internal
+  composite-key encoding done by Kudu.
+
+=== Columns
+
+* TIMESTAMP, DECIMAL, CHAR, VARCHAR, DATE, and complex types like ARRAY are not supported.
+
+* Type, nullability, compression, and encoding of existing columns cannot be changed by altering the table.
+
+* Tables can have a maximum of 300 columns.
+
+=== Tables
+
+* Tables must have an odd number of replicas, with a maximum of 7.
+
+* Replication factor (set at table creation time) cannot be changed.
+
+=== Cells (individual values)
+
+* Cells cannot be larger than 64KB.
+
+=== Other usage limitations
+
+* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
+  a single row contains multiple kilobytes of data.
+
+* Secondary indexes are not supported.
+
+* Multi-row transactions are not supported.
+
+* Relational features, like foreign keys, are not supported.
+
+* Identifiers such as column and table names are restricted to be valid UTF-8 strings.
+  Additionally, a maximum length of 256 characters is enforced.
 
 * Dropping a column does not immediately reclaim space. Compaction must run first.
-There is no way to run compaction manually, but dropping the table will reclaim the
-space immediately.
+
+* There is no way to run compaction manually, but dropping the table will reclaim the
+  space immediately.
 
 == Partitioning Limitations
+
 * Tables must be manually pre-split into tablets using simple or compound primary
   keys. Automatic splitting is not yet possible. Range partitions may be added
   or dropped after a table has been created. See
@@ -60,21 +97,61 @@ space immediately.
   create a new table with the new partitioning and insert the contents of the old
   table.
 
-== Replication and Backup Limitations
-* Kudu does not currently include any built-in features for backup and restore.
-  Users are encouraged to use tools such as Spark or Impala to export or import
-  tables as necessary.
+* Tablets that lose a majority of replicas (such as 1 left out of 3) require manual
+  intervention to be repaired.
+
+== Cluster management
+
+* Rack awareness is not supported.
+
+* Multi-datacenter is not supported.
+
+* Rolling restart is not supported.
+
+== Server management
+
+* Production deployments should configure at least 4GB of memory for tablet servers,
+  and ideally more than 10GB.
+
+* Write ahead logs (WAL) can only be stored on one disk.
+
+* Disk failures are not tolerated and tablet servers will crash as soon as one is detected.
+
+* Failed disks with unrecoverable data require the formatting of all the Kudu data for
+  that tablet server before it can be started again.
+
+* Data directories cannot be added/removed; all must be reformatted to change the set
+  of directories.
 
-== Impala Limitations
+* Tablet servers cannot be gracefully decommissioned.
 
-* Updates, inserts, and deletes via Impala are non-transactional. If a query
-  fails part of the way through, its partial effects will not be rolled back.
+* Tablet servers can’t change address/port.
 
-* No timestamp and decimal type support.
+* Kudu has a hard requirement on having up-to-date NTP. Kudu masters and tablet servers
+  will crash when out of sync.
 
-* The maximum parallelism of a single query is limited to the number of tablets
-  in a table. For good analytic performance, aim for 10 or more tablets per host
-  or use large tables.
+* Kudu releases are only tested with NTP. Other time synchronization providers like Chrony
+  may or may not work.
+
+== Scale
+
+* Recommended maximum number of tablet servers is 100.
+
+* Recommended maximum number of masters is 3.
+
+* Recommended maximum amount of stored data, post-replication and post-compression,
+  per tablet server is 4TB.
+
+* Recommended maximum number of tablets per tablet server is 1000, post-replication.
+
+* Maximum number of tablets per table for each tablet server is 60, post-replication,
+  at table-creation time.
+
+== Replication and Backup Limitations
+
+* Kudu does not currently include any built-in features for backup and restore.
+  Users are encouraged to use tools such as Spark or Impala to export or import
+  tables as necessary.
 
 == Security Limitations
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 4faca85..8d2d510 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -743,3 +743,8 @@ The examples above have only explored a fraction of what you can do with Impala
 - `NULL`, `NOT NULL`, `!=`, and `LIKE` predicates are not pushed to Kudu, and
   instead will be evaluated by the Impala scan node. This may decrease performance
   relative to other types of predicates.
+- Updates, inserts, and deletes via Impala are non-transactional. If a query
+  fails part of the way through, its partial effects will not be rolled back.
+- The maximum parallelism of a single query is limited to the number of tablets
+  in a table. For good analytic performance, aim for 10 or more tablets per host
+  or use large tables.
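
Note: several of the limits added to docs/known_issues.adoc in this commit are plain numeric bounds, so they can be checked before a table is ever created. The following is a minimal sketch of such a check, not part of Kudu or any Kudu client. The function names are hypothetical. It also assumes that the 256-character identifier limit counts Unicode characters and that "a majority" means more than half of the replicas.

```python
# Hypothetical pre-flight check; none of these names come from a Kudu client API.
MAX_COLUMNS = 300            # "Tables can have a maximum of 300 columns."
MAX_REPLICAS = 7             # "odd number of replicas, with a maximum of 7"
MAX_IDENTIFIER_CHARS = 256   # assumption: counted in characters, not bytes


def check_table_spec(table_name, column_names, num_replicas):
    """Return a list of human-readable violations (empty if none found)."""
    problems = []
    if num_replicas < 1 or num_replicas % 2 == 0 or num_replicas > MAX_REPLICAS:
        problems.append(
            "replication factor must be odd and at most %d" % MAX_REPLICAS)
    if len(column_names) > MAX_COLUMNS:
        problems.append("too many columns (maximum %d)" % MAX_COLUMNS)
    for ident in [table_name] + list(column_names):
        if len(ident) > MAX_IDENTIFIER_CHARS:
            problems.append("identifier longer than %d characters: %s..."
                            % (MAX_IDENTIFIER_CHARS, ident[:20]))
    return problems


def majority_lost(total_replicas, live_replicas):
    """True when fewer than a majority of replicas remain (e.g. 1 of 3)."""
    majority = total_replicas // 2 + 1
    return live_replicas < majority
```

For example, `check_table_spec("metrics", ["host", "ts", "value"], 3)` returns an empty list, while a replication factor of 4 is reported as a violation. `majority_lost(3, 1)` corresponds to the "1 left out of 3" case that the new text says requires manual intervention.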