Posted to commits@iceberg.apache.org by pv...@apache.org on 2022/12/06 11:59:36 UTC

[iceberg] branch master updated: Docs: Update Iceberg Hive documentation (#6337)

This is an automated email from the ASF dual-hosted git repository.

pvary pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git


The following commit(s) were added to refs/heads/master by this push:
     new 5b15053784 Docs: Update Iceberg Hive documentation (#6337)
5b15053784 is described below

commit 5b1505378414295476a73e07aca65fdb1a29da7d
Author: InvisibleProgrammer <zs...@gmail.com>
AuthorDate: Tue Dec 6 12:59:29 2022 +0100

    Docs: Update Iceberg Hive documentation (#6337)
---
 docs/hive.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/docs/hive.md b/docs/hive.md
index 7e42c22a70..2e9eb21748 100644
--- a/docs/hive.md
+++ b/docs/hive.md
@@ -38,6 +38,16 @@ Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following feat
 DML operations work only with MapReduce execution engine.
 {{< /hint >}}
 
+With Hive version 4.0.0-alpha-2 and above,
+the Iceberg integration when using HiveCatalog supports the following additional features:
+
+* Expiring snapshots with `ALTER TABLE`
+* Creating a table like an existing table (CTLT)
+* Setting the Parquet compression type via table properties ([Compression types](https://spark.apache.org/docs/2.4.3/sql-data-sources-parquet.html#configuration))
+* Altering a table's metadata location
+* Rolling back a table
+* Honouring sort orders on existing tables when writing to a table ([Sort orders specification](https://iceberg.apache.org/spec/#sort-orders))
+
 With Hive version 4.0.0-alpha-1 and above,
 the Iceberg integration when using HiveCatalog supports the following additional features:
 
@@ -243,7 +253,7 @@ The result is:
 | j                                  | IDENTITY       | NULL
 
 You can create Iceberg partitions using the following Iceberg partition specification syntax
-(supported only in Hive 4.0.0-alpha-1):
+(supported only from Hive 4.0.0-alpha-1 onwards):
 
 ```sql
 CREATE TABLE x (i int, ts timestamp) PARTITIONED BY SPEC (month(ts), bucket(2, i)) STORED AS ICEBERG;
@@ -286,6 +296,12 @@ CREATE TABLE target PARTITIONED BY SPEC (year(year_field), identity_field) STORE
     SELECT * FROM source;
 ```
 
+### CREATE TABLE LIKE TABLE
+
+```sql
+CREATE TABLE target LIKE source STORED BY ICEBERG;
+```
+ 
 ### CREATE EXTERNAL TABLE overlaying an existing Iceberg table
 
 The `CREATE EXTERNAL TABLE` command is used to overlay a Hive table "on top of" an existing Iceberg table. Iceberg
@@ -432,6 +448,15 @@ Tables can be dropped using the `DROP TABLE` command:
 DROP TABLE [IF EXISTS] table_name [PURGE];
 ```
 
+### METADATA LOCATION
+
+The metadata location (snapshot location) can be changed only if the new path contains exactly the same metadata JSON.
+This can be done only after the table has been migrated to Iceberg; the two operations cannot be performed in one step.
+
+```sql
+ALTER TABLE t SET TBLPROPERTIES ('metadata_location'='<path>/hivemetadata/00003-a1ada2b8-fc86-4b5b-8c91-400b6b46d0f2.metadata.json');
+```
+
 ## DML Commands
 
 ### SELECT
@@ -508,7 +533,15 @@ SELECT * FROM table_a FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57';
 SELECT * FROM table_a FOR SYSTEM_VERSION AS OF 1234567;
 ```
 
-## Type compatibility
+You can expire snapshots of an Iceberg table using an `ALTER TABLE` query from Hive. You should periodically expire snapshots to delete data files that are no longer needed and to reduce the size of the table metadata.
+
+Each write to an Iceberg table from Hive creates a new snapshot, or version, of the table. Snapshots can be used for time-travel queries, or the table can be rolled back to any valid snapshot. Snapshots accumulate until they are expired by the `expire_snapshots` operation.
+For example, the following query expires snapshots older than the timestamp `2021-12-09 05:39:18.689000000`:
+```sql
+ALTER TABLE test_table EXECUTE expire_snapshots('2021-12-09 05:39:18.689000000');
+```
+
+### Type compatibility
 
 Hive and Iceberg support different set of types. Iceberg can perform type conversion automatically, but not for all
 combinations, so you may want to understand the type conversion in Iceberg in prior to design the types of columns in
@@ -546,3 +579,18 @@ creating Iceberg table and writing to Iceberg table via Hive.
 | list             | list                    |       |
 | map              | map                     |       |
 | union            |                         | not supported |
+
+### Table rollback
+
+Roll back an Iceberg table's data to its state at an older table snapshot.
+
+Roll back to the last snapshot before a specific timestamp:
+
+```sql
+ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00');
+```
+
+Roll back to a specific snapshot ID:
+```sql
+ALTER TABLE ice_t EXECUTE ROLLBACK(1111);
+```