Posted to commits@kudu.apache.org by jd...@apache.org on 2017/04/26 17:52:54 UTC
kudu git commit: [docs] Refresh and augment the known issues
Repository: kudu
Updated Branches:
refs/heads/master 21f5ae0d5 -> df9da6a84
[docs] Refresh and augment the known issues
We've learned a lot about Kudu since people have started using it.
I've gathered in this patch what I think should be the new recommendations
we make to users.
Change-Id: I5d8e817a402f419aeb5ed9d700a8207ad9f91e4d
Reviewed-on: http://gerrit.cloudera.org:8080/6699
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/df9da6a8
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/df9da6a8
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/df9da6a8
Branch: refs/heads/master
Commit: df9da6a84381795e2334bf21d8337f6f72333571
Parents: 21f5ae0
Author: Jean-Daniel Cryans <jd...@apache.org>
Authored: Wed Apr 19 16:50:36 2017 -0700
Committer: Jean-Daniel Cryans <jd...@apache.org>
Committed: Wed Apr 26 17:52:39 2017 +0000
----------------------------------------------------------------------
docs/developing.adoc | 9 ++-
docs/installation.adoc | 1 +
docs/known_issues.adoc | 113 +++++++++++++++++++++++++++------
docs/kudu_impala_integration.adoc | 5 ++
4 files changed, 107 insertions(+), 21 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index ba9561b..d1cc0ec 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -162,12 +162,15 @@ kuduContext.deleteTable("unwanted_table")
- Kudu tables with a name containing upper case or non-ascii characters must be
assigned an alternate name when registered as a temporary table.
- Kudu tables with a column name containing upper case or non-ascii characters
- may not be used with SparkSQL. Columns may be renamed in Kudu to work around
+ may not be used with SparkSQL. Columns may be renamed in Kudu to work around
this issue.
-- `NULL`, `NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not pushed to
- Kudu, and instead will be evaluated by the Spark task.
+- `<>` and `OR` predicates are not pushed to Kudu, and instead will be evaluated
+ by the Spark task. Only `LIKE` predicates with a suffix wildcard are pushed to
+ Kudu, meaning that `LIKE "FOO%"` is pushed down but `LIKE "FOO%BAR"` isn't.
- Kudu does not support all types supported by Spark SQL, such as `Date`,
`Decimal` and complex types.
+- Kudu tables may only be registered as temporary tables in SparkSQL.
+ Kudu tables may not be queried using HiveContext.
== Kudu Python Client
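
The `LIKE` pushdown rule added to docs/developing.adoc above can be sketched as a small check. This is an illustration of the rule as worded in the patch, not the Spark or Kudu implementation. The helper name `is_prefix_like` is hypothetical, and the sketch ignores `ESCAPE` handling.

```python
def is_prefix_like(pattern: str) -> bool:
    """Return True if a SQL LIKE pattern is a plain prefix match such as 'FOO%'.

    Hypothetical helper: it mirrors the documented rule only and does not
    handle escaped wildcards.
    """
    # The pattern must end with a trailing '%' wildcard ...
    if not pattern.endswith("%"):
        return False
    prefix = pattern[:-1]
    # ... and must contain no other wildcard ('%' or '_') before it.
    return "%" not in prefix and "_" not in prefix


if __name__ == "__main__":
    for p in ["FOO%", "FOO%BAR", "%FOO", "F_O%"]:
        print(p, "pushed down" if is_prefix_like(p) else "evaluated by Spark")
```

Under this reading, only the first pattern in the usage loop qualifies for pushdown; the other three would be evaluated by the Spark task.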
http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/installation.adoc
----------------------------------------------------------------------
diff --git a/docs/installation.adoc b/docs/installation.adoc
index 8edc79f..2469c12 100644
--- a/docs/installation.adoc
+++ b/docs/installation.adoc
@@ -53,6 +53,7 @@ Linux::
link:troubleshooting.html#req_hole_punching[troubleshooting hole punching] for more
information.
- ntp.
+ - xfs or ext4 formatted drives.
macOS::
- OS X 10.10 Yosemite, OS X 10.11 El Capitan, or macOS Sierra.
- Prebuilt macOS packages are not provided.
http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/known_issues.adoc
----------------------------------------------------------------------
diff --git a/docs/known_issues.adoc b/docs/known_issues.adoc
index ce73a51..8828f4b 100644
--- a/docs/known_issues.adoc
+++ b/docs/known_issues.adoc
@@ -27,15 +27,15 @@
:sectlinks:
:experimental:
-== Schema and Usage Limitations
-* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
- a single row contains multiple kilobytes of data.
+== Schema
-* The columns which make up the primary key must be listed first in the schema.
+=== Primary keys
* The primary key may not be changed after the table is created.
You must drop and recreate a table to select a new primary key.
+* The columns which make up the primary key must be listed first in the schema.
+
* The primary key of a row may not be modified using the `UPDATE` functionality.
To modify a row's primary key, the row must be deleted and re-inserted with
the modified key. Such a modification is non-atomic.
@@ -44,13 +44,50 @@
primary key definition. Additionally, all columns that are part of a primary
key definition must be `NOT NULL`.
-* Type and nullability of existing columns cannot be changed by altering the table.
+* Auto-generated primary keys are not supported.
+
+* Cells making up a composite primary key are limited to a total of 16KB after the internal
+ composite-key encoding done by Kudu.
+
+=== Columns
+
+* TIMESTAMP, DECIMAL, CHAR, VARCHAR, DATE, and complex types like ARRAY are not supported.
+
+* Type, nullability, compression, and encoding of existing columns cannot be changed by altering the table.
+
+* Tables can have a maximum of 300 columns.
+
+=== Tables
+
+* Tables must have an odd number of replicas, with a maximum of 7.
+
+* Replication factor (set at table creation time) cannot be changed.
+
+=== Cells (individual values)
+
+* Cells cannot be larger than 64KB.
+
+=== Other usage limitations
+
+* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
+ a single row contains multiple kilobytes of data.
+
+* Secondary indexes are not supported.
+
+* Multi-row transactions are not supported.
+
+* Relational features, like foreign keys, are not supported.
+
+* Identifiers such as column and table names are restricted to be valid UTF-8 strings.
+ Additionally, a maximum length of 256 characters is enforced.
* Dropping a column does not immediately reclaim space. Compaction must run first.
-There is no way to run compaction manually, but dropping the table will reclaim the
-space immediately.
+
+* There is no way to run compaction manually, but dropping the table will reclaim the
+ space immediately.
== Partitioning Limitations
+
* Tables must be manually pre-split into tablets using simple or compound primary
keys. Automatic splitting is not yet possible. Range partitions may be added
or dropped after a table has been created. See
@@ -60,21 +97,61 @@ space immediately.
create a new table with the new partitioning and insert the contents of the old
table.
-== Replication and Backup Limitations
-* Kudu does not currently include any built-in features for backup and restore.
- Users are encouraged to use tools such as Spark or Impala to export or import
- tables as necessary.
+* Tablets that lose a majority of replicas (such as 1 left out of 3) require manual
+ intervention to be repaired.
+
+== Cluster management
+
+* Rack awareness is not supported.
+
+* Multi-datacenter is not supported.
+
+* Rolling restart is not supported.
+
+== Server management
+
+* Production deployments should configure at least 4GB of memory for tablet servers,
+ and ideally more than 10GB.
+
+* Write ahead logs (WAL) can only be stored on one disk.
+
+* Disk failures are not tolerated and tablet servers will crash as soon as one is detected.
+
+* Failed disks with unrecoverable data require the formatting of all the Kudu data for
+ that tablet server before it can be started again.
+
+* Data directories cannot be added/removed; all must be reformatted to change the set
+ of directories.
-== Impala Limitations
+* Tablet servers cannot be gracefully decommissioned.
-* Updates, inserts, and deletes via Impala are non-transactional. If a query
- fails part of the way through, its partial effects will not be rolled back.
+* Tablet servers can’t change address/port.
-* No timestamp and decimal type support.
+* Kudu has a hard requirement on having up-to-date NTP. Kudu masters and tablet servers
+ will crash when out of sync.
-* The maximum parallelism of a single query is limited to the number of tablets
- in a table. For good analytic performance, aim for 10 or more tablets per host
- or use large tables.
+* Kudu releases are only tested with NTP. Other time synchronization providers like Chrony
+ may or may not work.
+
+== Scale
+
+* Recommended maximum number of tablet servers is 100.
+
+* Recommended maximum number of masters is 3.
+
+* Recommended maximum amount of stored data, post-replication and post-compression,
+ per tablet server is 4TB.
+
+* Recommended maximum number of tablets per tablet server is 1000, post-replication.
+
+* Maximum number of tablets per table for each tablet server is 60, post-replication,
+ at table-creation time.
+
+== Replication and Backup Limitations
+
+* Kudu does not currently include any built-in features for backup and restore.
+ Users are encouraged to use tools such as Spark or Impala to export or import
+ tables as necessary.
== Security Limitations
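
Several of the schema limits added to docs/known_issues.adoc above are simple enough to check before creating a table. The following is a minimal sketch under stated assumptions: `check_table_design` and its arguments are hypothetical and not part of any Kudu client API, the constants are copied from this patch, and only `ARRAY` is listed for complex types because the patch does not enumerate the others.

```python
# Limits as stated in this patch's known_issues.adoc changes.
MAX_COLUMNS = 300
MAX_REPLICAS = 7
MAX_IDENTIFIER_CHARS = 256
UNSUPPORTED_TYPES = {"TIMESTAMP", "DECIMAL", "CHAR", "VARCHAR", "DATE", "ARRAY"}


def check_table_design(table_name, columns, num_replicas):
    """Return a list of problems for a proposed table (hypothetical helper).

    columns is a list of (column_name, type_name) tuples.
    """
    problems = []
    if len(columns) > MAX_COLUMNS:
        problems.append(f"{len(columns)} columns exceeds the maximum of {MAX_COLUMNS}")
    # Replica count must be odd and no greater than 7.
    if num_replicas < 1 or num_replicas % 2 == 0 or num_replicas > MAX_REPLICAS:
        problems.append(f"replica count {num_replicas} must be odd and at most {MAX_REPLICAS}")
    # Table and column names share the same identifier length limit.
    for name in [table_name] + [name for name, _ in columns]:
        if len(name) > MAX_IDENTIFIER_CHARS:
            problems.append(f"identifier {name[:20]!r}... is longer than {MAX_IDENTIFIER_CHARS} characters")
    for name, type_name in columns:
        if type_name.upper() in UNSUPPORTED_TYPES:
            problems.append(f"column {name!r} uses unsupported type {type_name}")
    return problems


if __name__ == "__main__":
    for problem in check_table_design(
        "metrics", [("host", "STRING"), ("ts", "TIMESTAMP")], num_replicas=4
    ):
        print(problem)
```

The usage example reports two problems: the even replica count and the `TIMESTAMP` column. Limits that depend on encoded sizes (the 16KB composite key and 64KB cell limits) are left out because they cannot be checked from the schema alone.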
http://git-wip-us.apache.org/repos/asf/kudu/blob/df9da6a8/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 4faca85..8d2d510 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -743,3 +743,8 @@ The examples above have only explored a fraction of what you can do with Impala
- `NULL`, `NOT NULL`, `!=`, and `LIKE` predicates are not pushed to Kudu, and
instead will be evaluated by the Impala scan node. This may decrease performance
relative to other types of predicates.
+- Updates, inserts, and deletes via Impala are non-transactional. If a query
+ fails part of the way through, its partial effects will not be rolled back.
+- The maximum parallelism of a single query is limited to the number of tablets
+ in a table. For good analytic performance, aim for 10 or more tablets per host
+ or use large tables.
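
The tablet-count guidance in this patch (at most 60 tablets per table per tablet server at creation time, post-replication, and 10 or more tablets per host for Impala parallelism) comes down to simple arithmetic. The sketch below assumes replicas are spread evenly across tablet servers, which Kudu does not guarantee. The function names are hypothetical.

```python
import math

# Limit from the known_issues.adoc changes in this patch.
MAX_TABLETS_PER_TABLE_PER_TSERVER = 60  # post-replication, at table-creation time


def replicas_per_tserver(num_partitions, num_replicas, num_tservers):
    """Estimate tablet replicas of one table hosted per tablet server.

    Assumes even placement (an assumption of this sketch) and rounds up.
    """
    return math.ceil(num_partitions * num_replicas / num_tservers)


def fits_creation_limit(num_partitions, num_replicas, num_tservers):
    """True if the estimate stays within the documented creation-time limit."""
    estimate = replicas_per_tserver(num_partitions, num_replicas, num_tservers)
    return estimate <= MAX_TABLETS_PER_TABLE_PER_TSERVER


if __name__ == "__main__":
    # 64 partitions x 3 replicas over 5 tablet servers: ceil(192 / 5) = 39 per server.
    print(replicas_per_tserver(64, 3, 5), fits_creation_limit(64, 3, 5))
    # 120 partitions x 3 replicas over 5 tablet servers: 72 per server, over the limit.
    print(replicas_per_tserver(120, 3, 5), fits_creation_limit(120, 3, 5))
```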