Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/01 09:09:35 UTC

[GitHub] [iceberg] szlta commented on a diff in pull request #4915: [DOC] Update Hive doc page with the 4.0.0-alpha-1 features

szlta commented on code in PR #4915:
URL: https://github.com/apache/iceberg/pull/4915#discussion_r886565431


##########
docs/hive/_index.md:
##########
@@ -24,19 +24,44 @@ weight: 400
 # Hive
 
 Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
-Here is the current compatibility matrix for Iceberg Hive support: 
 
-| Feature                  | Hive 2.x               | Hive 3.1.2             |
-| ------------------------ | ---------------------- | ---------------------- |
-| CREATE EXTERNAL TABLE    | ✔️                     | ✔️                     |
-| CREATE TABLE             | ✔️                     | ✔️                     |
-| DROP TABLE               | ✔️                     | ✔️                     |
-| SELECT                   | ✔️ (MapReduce and Tez) | ✔️ (MapReduce and Tez) |
-| INSERT INTO              | ✔️ (MapReduce only)️    | ✔️ (MapReduce only)    |
+## Feature support
+Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:
+
+* Creating a table
+* Dropping a table
+* Reading a table
+* Inserting data by appending it to a table (MapReduce only)

Review Comment:
   nit: sounds a bit strange to me; perhaps:
   - Inserting data into a table (will append rows) (MapReduce only)
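   For context, the append-only insert this bullet describes would look like the following in HiveQL (a sketch; the `orders` table name and its columns are illustrative, not from the doc):

   ```sql
   -- Appends rows to the Iceberg table; existing data is untouched.
   -- On Hive 2.x / 3.1.2 this runs on the MapReduce execution engine only.
   INSERT INTO orders VALUES (1, 'apple', 3, 1.20);
   ```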



##########
docs/hive/_index.md:
##########
@@ -24,19 +24,44 @@ weight: 400
 # Hive
 
 Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
-Here is the current compatibility matrix for Iceberg Hive support: 
 
-| Feature                  | Hive 2.x               | Hive 3.1.2             |
-| ------------------------ | ---------------------- | ---------------------- |
-| CREATE EXTERNAL TABLE    | ✔️                     | ✔️                     |
-| CREATE TABLE             | ✔️                     | ✔️                     |
-| DROP TABLE               | ✔️                     | ✔️                     |
-| SELECT                   | ✔️ (MapReduce and Tez) | ✔️ (MapReduce and Tez) |
-| INSERT INTO              | ✔️ (MapReduce only)️    | ✔️ (MapReduce only)    |
+## Feature support
+Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:
+
+* Creating a table
+* Dropping a table
+* Reading a table
+* Inserting data by appending it to a table (MapReduce only)
+
+Iceberg compatibility with Hive 4.0.0-alpha-1
+(using HiveCatalog) supports the following additional features:
+
+* Creating an Iceberg identity-partitioned table
+* Creating an Iceberg identity-partitioned table from a spec

Review Comment:
   Maybe:
   - Creating an Iceberg table with any partition spec (including the various transforms Iceberg supports)
   
   ...as I think the feature is there in alpha-1: https://github.com/apache/hive/commit/eedcd82bc2d61861a27205f925ba0ffab9b6bca8
   
   Also, see the "Partitioned tables" part.
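   If the feature works as the linked commit suggests, a table with a non-identity partition spec would be declared roughly like this (a sketch; the table and column names are made up, and the `PARTITIONED BY SPEC` syntax should be verified against the Hive 4 docs):

   ```sql
   -- Partition spec using Iceberg transforms rather than identity columns.
   CREATE TABLE sales (id bigint, ts timestamp, category string)
   PARTITIONED BY SPEC (month(ts), bucket(16, category))
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
   ```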



##########
docs/hive/_index.md:
##########
@@ -24,19 +24,44 @@ weight: 400
 # Hive
 
 Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
-Here is the current compatibility matrix for Iceberg Hive support: 
 
-| Feature                  | Hive 2.x               | Hive 3.1.2             |
-| ------------------------ | ---------------------- | ---------------------- |
-| CREATE EXTERNAL TABLE    | ✔️                     | ✔️                     |
-| CREATE TABLE             | ✔️                     | ✔️                     |
-| DROP TABLE               | ✔️                     | ✔️                     |
-| SELECT                   | ✔️ (MapReduce and Tez) | ✔️ (MapReduce and Tez) |
-| INSERT INTO              | ✔️ (MapReduce only)️    | ✔️ (MapReduce only)    |
+## Feature support
+Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:
+
+* Creating a table
+* Dropping a table
+* Reading a table
+* Inserting data by appending it to a table (MapReduce only)
+
+Iceberg compatibility with Hive 4.0.0-alpha-1
+(using HiveCatalog) supports the following additional features:
+
+* Creating an Iceberg identity-partitioned table
+* Creating an Iceberg identity-partitioned table from a spec
+* Creating a table from an existing table (CTAS table)
+* Altering a table while keeping Iceberg and Hive schemas in sync
+* Altering the partition schema (updating columns)
+* Altering the partition schema by specifying partition transforms
+* Truncating a table
+* Migrating tables in Avro, Parquet, or ORC format to Iceberg
+* Reading the schema of a table
+* Time travel applications
+* Inserting data by appending it into a table (Tez only)
+* Inserting data overwriting existing data

Review Comment:
   nit: similarly:
   - Inserting into a table (will append rows) (Tez only)
   - Insert overwrite an existing table
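   For illustration, the two suggested bullets map to statements like these (a sketch; the `orders` and `staged_orders` names are hypothetical):

   ```sql
   -- Appends rows to the table (Tez only in 4.0.0-alpha-1).
   INSERT INTO orders VALUES (1, 'apple', 3, 1.20);
   -- Replaces existing data with the query result.
   INSERT OVERWRITE TABLE orders SELECT * FROM staged_orders;
   ```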



##########
docs/hive/_index.md:
##########
@@ -24,19 +24,44 @@ weight: 400
 # Hive
 
 Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
-Here is the current compatibility matrix for Iceberg Hive support: 
 
-| Feature                  | Hive 2.x               | Hive 3.1.2             |
-| ------------------------ | ---------------------- | ---------------------- |
-| CREATE EXTERNAL TABLE    | ✔️                     | ✔️                     |
-| CREATE TABLE             | ✔️                     | ✔️                     |
-| DROP TABLE               | ✔️                     | ✔️                     |
-| SELECT                   | ✔️ (MapReduce and Tez) | ✔️ (MapReduce and Tez) |
-| INSERT INTO              | ✔️ (MapReduce only)️    | ✔️ (MapReduce only)    |
+## Feature support
+Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:
+
+* Creating a table
+* Dropping a table
+* Reading a table
+* Inserting data by appending it to a table (MapReduce only)
+
+Iceberg compatibility with Hive 4.0.0-alpha-1
+(using HiveCatalog) supports the following additional features:
+
+* Creating an Iceberg identity-partitioned table
+* Creating an Iceberg identity-partitioned table from a spec
+* Creating a table from an existing table (CTAS table)
+* Altering a table while keeping Iceberg and Hive schemas in sync
+* Altering the partition schema (updating columns)
+* Altering the partition schema by specifying partition transforms
+* Truncating a table
+* Migrating tables in Avro, Parquet, or ORC format to Iceberg

Review Comment:
   Should we point out in ORC's case that migration is supported for only non-ACID ORC tables?
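   For reference, the in-place migration this bullet refers to is done by swapping the table's storage handler, e.g. (a sketch; `legacy_orders` is a hypothetical name, and the caveat would be that only non-ACID source tables qualify):

   ```sql
   -- Migrates an existing Avro/Parquet/non-ACID ORC table to Iceberg in place.
   ALTER TABLE legacy_orders
   SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
   ```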



##########
docs/hive/_index.md:
##########
@@ -234,89 +337,118 @@ to the shared data files in case of accidental drop table commands from the Hive
 which would unintentionally remove all the data in the table.
 {{< /hint >}}
 
-### DROP TABLE
-
-Tables can be dropped using the `DROP TABLE` command:
+### ALTER TABLE
+#### Table properties
+For HiveCatalog tables the Iceberg table properties and the Hive table properties stored in HMS are kept in sync.
 
+{{< hint info >}}
+IMPORTANT: This feature is not available for other Catalog implementations.
+{{< /hint >}}
 ```sql
-DROP TABLE [IF EXISTS] table_name [PURGE];
+ALTER TABLE t SET TBLPROPERTIES('...'='...');
 ```
 
-You can configure purge behavior through global Hadoop configuration or Hive metastore table properties:
-
-| Config key                  | Default                    | Description                                                     |
-| ----------------------------| ---------------------------| --------------------------------------------------------------- |
-| external.table.purge        | true                       | if all data and metadata should be purged in a table by default |
-
-Each Iceberg table's default purge behavior can also be configured through Iceberg table properties:
-
-| Property                    | Default                    | Description                                                       |
-| ----------------------------| ---------------------------| ----------------------------------------------------------------- |
-| gc.enabled                  | true                       | if all data and metadata should be purged in the table by default |
+#### Schema evolution
+The Hive table schema is kept in sync with the Iceberg table. If an outside source (Impala/Spark/Java API/etc)
+changes the schema, the Hive table immediately reflects the changes. You alter the table schema using Hive commands:
 
-When changing `gc.enabled` on the Iceberg table via `UpdateProperties`, `external.table.purge` is also updated on HMS table accordingly.
-When setting `external.table.purge` as a table prop during Hive `CREATE TABLE`, `gc.enabled` is pushed down accordingly to the Iceberg table properties.
-This makes sure that the 2 properties are always consistent at table level between Hive and Iceberg.
-
-{{< hint danger >}}
-Changing `external.table.purge` via Hive `ALTER TABLE SET TBLPROPERTIES` does not update `gc.enabled` on the Iceberg table. 
-This is a limitation on Hive 3.1.2 because the `HiveMetaHook` doesn't have all the hooks for alter tables yet.
-{{< /hint >}}
-
-## Querying with SQL
-
-Here are the features highlights for Iceberg Hive read support:
-
-1. **Predicate pushdown**: Pushdown of the Hive SQL `WHERE` clause has been implemented so that these filters are used at the Iceberg `TableScan` level as well as by the Parquet and ORC Readers.
-2. **Column projection**: Columns from the Hive SQL `SELECT` clause are projected down to the Iceberg readers to reduce the number of columns read.
-3. **Hive query engines**: Both the MapReduce and Tez query execution engines are supported.
-
-### Configurations
-
-Here are the Hadoop configurations that one can adjust for the Hive reader:
+* Add a column
+```sql
+ALTER TABLE orders ADD COLUMNS (nickname string);
+```
+* Rename a column
+```sql
+ALTER TABLE orders CHANGE COLUMN item fruit string;
+```
+* Reorder columns
+```sql
+ALTER TABLE orders CHANGE COLUMN quantity quantity int AFTER price;
+```
+* Change a column type - only if Iceberg defines the column type change as safe
+```sql
+ALTER TABLE orders CHANGE COLUMN price price long;
+```
+* Drop column by using CHANGE COLUMN to remove the old column

Review Comment:
    Not CHANGE COLUMN; it's REPLACE COLUMNS.
   Also note that dropping columns is the only thing REPLACE COLUMNS can be used for, i.e. if columns are specified out of order, an error will be thrown signalling this limitation.
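   To make the suggested fix concrete (a sketch; the `orders` schema of id/item/quantity/price is assumed, not from the doc):

   ```sql
   -- Drops the 'nickname' column by listing only the remaining columns,
   -- in their original order; REPLACE COLUMNS here can only drop columns.
   ALTER TABLE orders REPLACE COLUMNS (id int, item string, quantity int, price decimal(10,2));
   ```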



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

