Posted to commits@spark.apache.org by rx...@apache.org on 2016/11/30 04:06:42 UTC

spark git commit: [SPARK-18145] Update documentation for hive partition management in 2.1

Repository: spark
Updated Branches:
  refs/heads/master af9789a4f -> 489845f3a


[SPARK-18145] Update documentation for hive partition management in 2.1

## What changes were proposed in this pull request?

This documents the partition handling changes for Spark 2.1 and how to migrate existing tables.

## How was this patch tested?

Built docs locally.

rxin

Author: Eric Liang <ek...@databricks.com>

Closes #16074 from ericl/spark-18145.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/489845f3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/489845f3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/489845f3

Branch: refs/heads/master
Commit: 489845f3a0e2a3555b96b6f3dbb984c783b20d97
Parents: af9789a
Author: Eric Liang <ek...@databricks.com>
Authored: Tue Nov 29 20:06:39 2016 -0800
Committer: Reynold Xin <rx...@databricks.com>
Committed: Tue Nov 29 20:06:39 2016 -0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/489845f3/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3adbe23..c7ad06c 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1331,6 +1331,15 @@ options.
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.0 to 2.1
+
+ - Datasource tables now store partition metadata in the Hive metastore. This means that Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
+    - Legacy datasource tables can be migrated to this format via the `MSCK REPAIR TABLE` command. Migrating legacy tables is recommended to take advantage of Hive DDL support and improved planning performance.
+    - To determine if a table has been migrated, look for the `PartitionProvider: Catalog` attribute when issuing `DESCRIBE FORMATTED` on the table.
+ - Changes to `INSERT OVERWRITE TABLE ... PARTITION ...` behavior for Datasource tables.
+    - In prior Spark versions `INSERT OVERWRITE` overwrote the entire Datasource table, even when given a partition specification. Now only partitions matching the specification are overwritten.
+    - Note that this still differs from the behavior of Hive tables, which is to overwrite only partitions overlapping with newly inserted data.
+
 ## Upgrading From Spark SQL 1.6 to 2.0
 
  - `SparkSession` is now the new entry point of Spark that replaces the old `SQLContext` and
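
Illustrative note (not part of the commit): the three `INSERT OVERWRITE ... PARTITION ...` behaviors described in the added 2.1 section can be compared with a small toy model. This is a minimal sketch in plain Python, not Spark or Hive code. The partition columns `a` and `b`, the function names, and the sample data are all hypothetical, and the model assumes every inserted row falls inside the partition specification.

```python
from collections import defaultdict

# Hypothetical partition columns, used only for this illustration.
PARTITION_COLUMNS = ("a", "b")


def group_by_partition(rows):
    """Group (a, b, value) rows into {(a, b): [value, ...]}."""
    grouped = defaultdict(list)
    for a, b, value in rows:
        grouped[(a, b)].append(value)
    return dict(grouped)


def matches(partition, spec):
    """True if the partition agrees with every column fixed in the spec."""
    named = dict(zip(PARTITION_COLUMNS, partition))
    return all(named[col] == val for col, val in spec.items())


def overwrite_pre_2_1(table, spec, rows):
    """Datasource tables before 2.1: the whole table is replaced."""
    return group_by_partition(rows)


def overwrite_2_1(table, spec, rows):
    """Datasource tables in 2.1: every partition matching the spec is replaced."""
    kept = {p: v for p, v in table.items() if not matches(p, spec)}
    kept.update(group_by_partition(rows))
    return kept


def overwrite_hive(table, spec, rows):
    """Hive tables: only partitions that receive new data are replaced."""
    result = dict(table)
    result.update(group_by_partition(rows))
    return result


# Three existing partitions; the insert targets PARTITION (a = 1, b)
# and supplies a single row for partition (a = 1, b = 1).
table = {(1, 1): ["old"], (1, 2): ["old"], (2, 1): ["old"]}
spec = {"a": 1}
rows = [(1, 1, "new")]

before_2_1 = overwrite_pre_2_1(table, spec, rows)
datasource_2_1 = overwrite_2_1(table, spec, rows)
hive = overwrite_hive(table, spec, rows)
```

In this model the pre-2.1 result keeps only partition `(1, 1)`, the 2.1 result also keeps the untouched `(2, 1)` but drops `(1, 2)` because it matches `a = 1`, and the Hive result keeps all three partitions with only `(1, 1)` rewritten.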

