Posted to commits@kudu.apache.org by al...@apache.org on 2016/09/26 15:59:39 UTC

[5/6] kudu git commit: Document Impala and Spark integration known issues & limitations

Document Impala and Spark integration known issues & limitations

Change-Id: I993a09a00f5ab0049fec95e967abc1740b44dc8d
Reviewed-on: http://gerrit.cloudera.org:8080/4443
Tested-by: Dan Burkert <da...@cloudera.com>
Reviewed-by: Jean-Daniel Cryans <jd...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/92f7c191
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/92f7c191
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/92f7c191

Branch: refs/heads/master
Commit: 92f7c1914ab29061d324a9a38aa5bb05ca598d47
Parents: 4824f64
Author: Dan Burkert <da...@cloudera.com>
Authored: Fri Sep 16 14:16:36 2016 -0700
Committer: Will Berkeley <wd...@gmail.com>
Committed: Sat Sep 24 16:14:40 2016 +0000

----------------------------------------------------------------------
 docs/developing.adoc              | 14 ++++++++++++++
 docs/kudu_impala_integration.adoc | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/92f7c191/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index b4d8604..8833369 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -151,6 +151,20 @@ kuduContext.tableExists("another_table")
 kuduContext.deleteTable("unwanted_table")
 ----
 
+=== Spark Integration Known Issues and Limitations
+
+- The Kudu Spark integration is tested and developed against Spark 1.6 and Scala
+  2.10.
+- Kudu tables with a name containing uppercase or non-ASCII characters must be
+  assigned an alternate name when registered as a temporary table.
+- Kudu tables with a column name containing uppercase or non-ASCII characters
+  may not be used with Spark SQL. Non-primary-key columns may be renamed in Kudu
+  to work around this issue.
+- `IS NULL`, `IS NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not
+  pushed down to Kudu; instead, they are evaluated by the Spark task.
+- Kudu does not support every type that Spark SQL supports, for example
+  `Date`, `Decimal`, and complex types.
+
 == Integration with MapReduce, YARN, and Other Frameworks
 
 Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in

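The Spark table-naming limitation above can be illustrated with a small sketch. `KuduTempTableNames` and its two methods are hypothetical helpers written for this example; they are not part of the Kudu Spark integration. The rule used to derive the alternate name (lowercase ASCII letters and digits, `_` for everything else) is an assumption, one of many possible choices.

```scala
// Hypothetical helper (not part of the Kudu Spark API): decides whether a Kudu
// table name needs an alternate name before it is registered as a Spark
// temporary table, and derives one.
object KuduTempTableNames {
  // True if the name contains an uppercase letter or any non-ASCII character,
  // the two cases called out in the known issues list.
  def needsAlternateName(name: String): Boolean =
    name.exists(c => c.toInt > 127 || c.isUpper)

  // Assumed naming rule: keep ASCII letters (lowercased), digits and '_';
  // replace every other character, including non-ASCII ones, with '_'.
  def alternateName(name: String): String =
    name.map { c =>
      if (c.toInt <= 127 && (c.isLetterOrDigit || c == '_')) c.toLower else '_'
    }
}
```

With Spark 1.6, the derived name would then be passed to `registerTempTable` on the `DataFrame` loaded from the Kudu table. The sketch does not check whether the derived name collides with an existing temporary table.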
http://git-wip-us.apache.org/repos/asf/kudu/blob/92f7c191/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index e2fe89c..ec86c18 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -1083,3 +1083,25 @@ The examples above have only explored a fraction of what you can do with Impala
 - View the link:http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_langref.html[Impala SQL reference].
 - Read about Impala internals or learn how to contribute to Impala on the link:https://github.com/cloudera/Impala/wiki[Impala Wiki].
 - Read about the native link:installation.html#view_api[Kudu APIs].
+
+=== Known Issues and Limitations
+
+- Kudu tables with a name containing uppercase or non-ASCII characters must be
+  assigned an alternate name when used as an external table in Impala.
+- Kudu tables with a column name containing uppercase or non-ASCII characters
+  may not be used as an external table in Impala. Non-primary-key columns may be
+  renamed in Kudu to work around this issue.
+- When creating a Kudu table, the `CREATE TABLE` statement must include the
+  primary key columns before other columns, in primary key order.
+- Kudu tables containing `UNIXTIME_MICROS`-typed columns may not be used as an
+  external table in Impala.
+- Impala cannot create Kudu tables with `TIMESTAMP` or nested-typed columns.
+- Impala cannot update values in primary key columns.
+- `IS NULL`, `IS NOT NULL`, `!=`, and `IN` predicates are not pushed down to
+  Kudu; instead, they are evaluated by the Impala scan node.
+- Impala cannot specify column encoding or compression during Kudu table
+  creation, or alter a column's encoding or compression.
+- Impala cannot create Kudu tables with bounded range partitions, and cannot
+  alter a table to add or remove range partitions.
+- When bulk writing to a Kudu table, performance may be improved by setting the
+  `batch_size` option (see <<kudu_impala_insert_bulk>>).
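The `CREATE TABLE` column-ordering rule listed above can be expressed as a simple check. `KuduCreateTableCheck.primaryKeyFirst` is a hypothetical helper for illustration only; neither Impala nor Kudu exposes it. It assumes column names compare case-insensitively, as Impala identifiers do.

```scala
// Hypothetical helper (not part of Impala or Kudu): checks that a CREATE TABLE
// column list begins with the primary key columns, in primary key order.
object KuduCreateTableCheck {
  def primaryKeyFirst(columns: Seq[String], primaryKey: Seq[String]): Boolean = {
    // Impala identifiers are case-insensitive, so compare lowercased names.
    val cols = columns.map(_.toLowerCase)
    val key = primaryKey.map(_.toLowerCase)
    // A Kudu table needs a primary key, and the first key.length columns must
    // match the key columns one for one, in the same order.
    key.nonEmpty && cols.take(key.length) == key
  }
}
```

For example, with a primary key of `(host, ts)`, the column list `host, ts, value` passes, while `ts, host, value` and `host, value, ts` fail because the key columns are out of order or not all at the front.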