Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/25 13:39:47 UTC

[GitHub] [hive] lcspinter opened a new pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

lcspinter opened a new pull request #2317:
URL: https://github.com/apache/hive/pull/2317


   
   ### What changes were proposed in this pull request?
   This PR introduces a syntactic shortcut that allows Iceberg tables to be created without providing the storage handler class name:
   `CREATE TABLE ice_t (a int) STORED BY ICEBERG [WITH SERDEPROPERTIES (...)]`
   
   
   ### Why are the changes needed?
   Currently, creating an Iceberg table requires the fully qualified Iceberg storage handler class name:
   `CREATE TABLE ice_t (a int) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' [WITH SERDEPROPERTIES (...)]`
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   Unit test, q test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] lcspinter merged pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
lcspinter merged pull request #2317:
URL: https://github.com/apache/hive/pull/2317


   




[GitHub] [hive] pvary commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639554024



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -587,6 +618,7 @@ public void testIcebergAndHmsTableProperties() throws Exception {
     expectedIcebergProperties.put("custom_property", "initial_val");
     expectedIcebergProperties.put("EXTERNAL", "TRUE");
     expectedIcebergProperties.put("storage_handler", HiveIcebergStorageHandler.class.getName());
+    expectedIcebergProperties.put(serdeConstants.SERIALIZATION_FORMAT, "1");

Review comment:
       Why is this change needed?






[GitHub] [hive] marton-bod commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r640381488



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##########
@@ -93,6 +106,14 @@ public boolean fillStorageFormat(ASTNode child) throws SemanticException {
     return true;
   }
 
+  private String processStorageHandler(String name) throws SemanticException {
+    if (StorageHandlerTypes.ICEBERG.name().equalsIgnoreCase(name)) {

Review comment:
       I think the code change to make it generic would be very small, probably a 4-line for loop, so we might as well add it into this PR, but I don't feel strongly, just a suggestion.






[GitHub] [hive] pvary commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639680280



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -239,12 +240,42 @@ public void testCreateDropTableNonDefaultCatalog() throws TException, Interrupte
     );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    String query = String.format("CREATE EXTERNAL TABLE customers (customer_id BIGINT, first_name STRING, last_name " +
+        "STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+        "iceBeRg",

Review comment:
       Probably we should not use `%s` here, just substitute the string by hand

##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -239,12 +240,42 @@ public void testCreateDropTableNonDefaultCatalog() throws TException, Interrupte
     );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    String query = String.format("CREATE EXTERNAL TABLE customers (customer_id BIGINT, first_name STRING, last_name " +
+        "STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+        "iceBeRg",
+        testTables.locationForCreateTableSQL(identifier),
+        InputFormatConfig.CATALOG_NAME,
+        Catalogs.ICEBERG_DEFAULT_CATALOG_NAME);
+    shell.executeStatement(query);
+    Assert.assertNotNull(testTables.loadTable(identifier));
+  }
+
+  @Test
+  public void testCreateTableStoredByIcebergWithSerdeProperties() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    String query = String.format("CREATE EXTERNAL TABLE customers (customer_id BIGINT, first_name STRING, last_name " +
+            "STRING) STORED BY %s WITH SERDEPROPERTIES('%s'='%s') %s TBLPROPERTIES ('%s'='%s')",
+        "iceberg",

Review comment:
       Probably we should not use `%s` here, just substitute the string by hand






[GitHub] [hive] lcspinter commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r641963384



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##########
@@ -93,6 +106,14 @@ public boolean fillStorageFormat(ASTNode child) throws SemanticException {
     return true;
   }
 
+  private String processStorageHandler(String name) throws SemanticException {
+    if (StorageHandlerTypes.ICEBERG.name().equalsIgnoreCase(name)) {

Review comment:
       Done.






[GitHub] [hive] marton-bod commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r640374426



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##########
@@ -93,6 +106,14 @@ public boolean fillStorageFormat(ASTNode child) throws SemanticException {
     return true;
   }
 
+  private String processStorageHandler(String name) throws SemanticException {
+    if (StorageHandlerTypes.ICEBERG.name().equalsIgnoreCase(name)) {

Review comment:
       Shall we make this generic so it works for future `StorageHandlerTypes` values? That is, iterate through the enum values, and if `name` matches any of them, reset `name` with `enum.name()` accordingly.
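
A minimal sketch of the generic lookup suggested here, under stated assumptions: `HandlerAlias` is a hypothetical stand-in for `StorageHandlerTypes`, and it is assumed that each constant carries the class name of its storage handler. The enum actually used in the PR may be shaped differently.

```java
public class StorageHandlerAliasSketch {

  // Hypothetical stand-in for StorageHandlerTypes. Assumption: each constant
  // knows the fully qualified class name of its storage handler.
  public enum HandlerAlias {
    ICEBERG("org.apache.iceberg.mr.hive.HiveIcebergStorageHandler");

    private final String className;

    HandlerAlias(String className) {
      this.className = className;
    }

    public String className() {
      return className;
    }
  }

  // Returns the handler class for a known alias (case-insensitive match).
  // Any other input is returned unchanged, so fully qualified class names
  // passed in the DDL keep working.
  public static String resolve(String name) {
    for (HandlerAlias alias : HandlerAlias.values()) {
      if (alias.name().equalsIgnoreCase(name)) {
        return alias.className();
      }
    }
    return name;
  }

  public static void main(String[] args) {
    System.out.println(resolve("iceBeRg"));
    System.out.println(resolve("com.example.CustomHandler"));
  }
}
```

With this shape, supporting another handler would only need a new enum constant; the loop itself stays untouched.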






[GitHub] [hive] lcspinter commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r641960742



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -462,7 +493,7 @@ public void testCreateTableWithNotSupportedTypes() {
           "Unsupported Hive type", () -> {
             shell.executeStatement("CREATE EXTERNAL TABLE not_supported_types " +
                 "(not_supported " + notSupportedType + ") " +
-                "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' " +
+                "STORED BY ICEBERG " +

Review comment:
       Yes, we have plenty of other test cases (the ones calling `TestTables#createTable()`) using the storage handler syntax.






[GitHub] [hive] pvary commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639683528



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -587,6 +618,7 @@ public void testIcebergAndHmsTableProperties() throws Exception {
     expectedIcebergProperties.put("custom_property", "initial_val");
     expectedIcebergProperties.put("EXTERNAL", "TRUE");
     expectedIcebergProperties.put("storage_handler", HiveIcebergStorageHandler.class.getName());
+    expectedIcebergProperties.put(serdeConstants.SERIALIZATION_FORMAT, "1");

Review comment:
       Thanks for the explanation!






[GitHub] [hive] lcspinter commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639674283



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -239,6 +240,36 @@ public void testCreateDropTableNonDefaultCatalog() throws TException, Interrupte
     );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    String query = String.format("CREATE EXTERNAL TABLE customers (customer_id BIGINT, first_name STRING, last_name " +
+        "STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+        "ICEBERG",

Review comment:
       Done.






[GitHub] [hive] pvary commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r640378772



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##########
@@ -93,6 +106,14 @@ public boolean fillStorageFormat(ASTNode child) throws SemanticException {
     return true;
   }
 
+  private String processStorageHandler(String name) throws SemanticException {
+    if (StorageHandlerTypes.ICEBERG.name().equalsIgnoreCase(name)) {

Review comment:
       I was thinking about HBase / Kudu / Druid, but that probably merits another PR 😄 






[GitHub] [hive] marton-bod commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r640368683



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -333,6 +334,12 @@ private static Properties getCatalogProperties(org.apache.hadoop.hive.metastore.
       properties.put(Catalogs.NAME, TableIdentifier.of(hmsTable.getDbName(), hmsTable.getTableName()).toString());
     }
 
+    hmsTable.getSd().getSerdeInfo().getParameters().entrySet().stream()

Review comment:
       `SerdeInfo::getParameters` has a `@Nullable` annotation. Shall we guard against that case to be safe?
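
One way such a guard could look, as a sketch only: the helper below is hypothetical and works on a plain `Map` rather than on the metastore `SerdeInfo` object, so it shows the null handling without claiming to be the code added in the PR.

```java
import java.util.Map;
import java.util.Properties;

public class SerdeParamsNullGuardSketch {

  // Hypothetical helper: copies serde parameters into the target properties.
  // A null parameter map is treated as "nothing to copy" instead of
  // triggering a NullPointerException.
  public static void copySerdeParams(Map<String, String> serdeParams, Properties target) {
    if (serdeParams == null) {
      return;
    }
    for (Map.Entry<String, String> entry : serdeParams.entrySet()) {
      target.setProperty(entry.getKey(), entry.getValue());
    }
  }
}
```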






[GitHub] [hive] marton-bod commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r640370818



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -462,7 +493,7 @@ public void testCreateTableWithNotSupportedTypes() {
           "Unsupported Hive type", () -> {
             shell.executeStatement("CREATE EXTERNAL TABLE not_supported_types " +
                 "(not_supported " + notSupportedType + ") " +
-                "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' " +
+                "STORED BY ICEBERG " +

Review comment:
       Did we leave some test cases with the original syntax `STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'` to make sure it's still working?






[GitHub] [hive] lcspinter commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r641960607



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -333,6 +334,12 @@ private static Properties getCatalogProperties(org.apache.hadoop.hive.metastore.
       properties.put(Catalogs.NAME, TableIdentifier.of(hmsTable.getDbName(), hmsTable.getTableName()).toString());
     }
 
+    hmsTable.getSd().getSerdeInfo().getParameters().entrySet().stream()

Review comment:
       Good point. Added null check.






[GitHub] [hive] pvary commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639553457



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -239,6 +240,36 @@ public void testCreateDropTableNonDefaultCatalog() throws TException, Interrupte
     );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+    TableIdentifier identifier = TableIdentifier.of("default", "customers");
+    String query = String.format("CREATE EXTERNAL TABLE customers (customer_id BIGINT, first_name STRING, last_name " +
+        "STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+        "ICEBERG",

Review comment:
       I would change most of the tests to `STORED BY ICEBERG` and keep only a few with the original class name.
   I would also like to try out `stored by iCeBerG`.






[GitHub] [hive] lcspinter commented on a change in pull request #2317: HIVE-25162: Add support for CREATE TABLE ... STORED BY ICEBERG statements

Posted by GitBox <gi...@apache.org>.
lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639681461



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -587,6 +618,7 @@ public void testIcebergAndHmsTableProperties() throws Exception {
     expectedIcebergProperties.put("custom_property", "initial_val");
     expectedIcebergProperties.put("EXTERNAL", "TRUE");
     expectedIcebergProperties.put("storage_handler", HiveIcebergStorageHandler.class.getName());
+    expectedIcebergProperties.put(serdeConstants.SERIALIZATION_FORMAT, "1");

Review comment:
       In this PR, I also added new logic to the `HiveIcebergMetaHook` to copy the serdeproperties from the HMS table to the catalog properties. This change was required because the Hive syntax allows additional serdeproperties to be provided when creating non-native tables, for example:
   `CREATE TABLE .... STORED BY ICEBERG WITH SERDEPROPERTIES('my_key'='my_value')`
   
   When we create a table, Hive automatically adds `SERIALIZATION_FORMAT` with the value `1` to the serdeproperties (unless it is overridden in the DDL).
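
The effect on the expected properties can be modelled with a small sketch. It is an illustration under assumptions, not Hive's implementation: `effectiveSerdeProperties` is a hypothetical helper, and the constant below mirrors the string value of `serdeConstants.SERIALIZATION_FORMAT`.

```java
import java.util.HashMap;
import java.util.Map;

public class SerdePropertiesSketch {

  // Mirrors the string value of serdeConstants.SERIALIZATION_FORMAT.
  static final String SERIALIZATION_FORMAT = "serialization.format";

  // Hypothetical model of the serde properties stored for a new table:
  // the entries given in WITH SERDEPROPERTIES, plus the default
  // serialization format "1" when the DDL did not set one.
  public static Map<String, String> effectiveSerdeProperties(Map<String, String> fromDdl) {
    Map<String, String> result = new HashMap<>(fromDdl);
    result.putIfAbsent(SERIALIZATION_FORMAT, "1");
    return result;
  }
}
```

Because these entries are then copied to the catalog properties, a test that compares the full property map has to expect the extra `serialization.format` entry as well.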



