You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/03/23 10:10:55 UTC

[GitHub] [hive] pvary opened a new pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

pvary opened a new pull request #3129:
URL: https://github.com/apache/hive/pull/3129


   ### What changes were proposed in this pull request?
   Add a new method for StorageHandler to decide if the default comment is needed for a table or not.
   
   ### Why are the changes needed?
   Iceberg table fields should match the Hive table fields exactly and the `from deserializer` is causing usability issues
   
   ### Does this PR introduce _any_ user-facing change?
   Iceberg table columns will not have this extra comment after the PR
   
   ### How was this patch tested?
   Unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #3129:
URL: https://github.com/apache/hive/pull/3129#discussion_r835253494



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -809,8 +810,14 @@ public void testCreateTableWithColumnComments() {
     for (int i = 0; i < icebergTable.schema().columns().size(); i++) {
       Types.NestedField field = icebergTable.schema().columns().get(i);
       Assert.assertArrayEquals(new Object[] {field.name(), HiveSchemaUtil.convert(field.type()).getTypeName(),
-          field.doc() != null ? field.doc() : "from deserializer"}, rows.get(i));
+          field.doc() != null ? field.doc() : ""}, rows.get(i));
     }
+
+    // Check the columns directly
+    List<FieldSchema> cols = shell.metastore()
+        .run(client -> client.getTable(new GetTableRequest("default", "comment_table")))

Review comment:
       Shall we refactor`shell.metastore().getTable()` to use this implementation too, if this is superior?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #3129:
URL: https://github.com/apache/hive/pull/3129#discussion_r835280974



##########
File path: itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapVectorArrowBatch.java
##########
@@ -320,7 +320,7 @@
 
   // test with legacy avro files
   // similar to ql/src/test/queries/clientpositive/avro_legacy_mixed_timestamp.q
-  @Test public void testAvroLegacyMixedTimestamps() throws Exception {
+  @Test(timeout = 12000000) public void testAvroLegacyMixedTimestamps() throws Exception {

Review comment:
       Reverted the change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #3129:
URL: https://github.com/apache/hive/pull/3129#discussion_r835254267



##########
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##########
@@ -1172,8 +1179,11 @@ public void testAlterTableReplaceColumns() throws TException, InterruptedExcepti
     );
     testTables.createTable(shell, identifier.name(), schema, SPEC, FileFormat.PARQUET, ImmutableList.of());
 
+    // Run some alter commands so the

Review comment:
       nit: unfinished sentence




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #3129:
URL: https://github.com/apache/hive/pull/3129#discussion_r835262573



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
##########
@@ -304,6 +304,10 @@ public static HiveStorageHandler getStorageHandler(
     }
   }
 
+  public static String getDefaultComment(HiveStorageHandler storageHandler) {
+    return storageHandler != null ? storageHandler.getDefaultColumnComment() : HiveStorageHandler.FROM_SERIALIZER;

Review comment:
       Doesn't this add "from deserializer" to native tables (where storage handler is null)? I thought this default comment thing was only related to non-native tables




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #3129:
URL: https://github.com/apache/hive/pull/3129#discussion_r835720938



##########
File path: metastore/src/java/org/apache/hadoop/hive/metastore/SerDeStorageSchemaReader.java
##########
@@ -45,7 +45,7 @@
       }
 
       Deserializer s = HiveMetaStoreUtils.getDeserializer(conf, tbl, null, false);
-      return HiveMetaStoreUtils.getFieldsFromDeserializer(tbl.getTableName(), s);
+      return HiveMetaStoreUtils.getFieldsFromDeserializer(tbl.getTableName(), s, null);

Review comment:
       Changed the whole approach since we do not have enough info here to decide on the default comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #3129:
URL: https://github.com/apache/hive/pull/3129#discussion_r835283057



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
##########
@@ -304,6 +304,10 @@ public static HiveStorageHandler getStorageHandler(
     }
   }
 
+  public static String getDefaultComment(HiveStorageHandler storageHandler) {
+    return storageHandler != null ? storageHandler.getDefaultColumnComment() : HiveStorageHandler.FROM_SERIALIZER;

Review comment:
       Default storage handler still could have a `from deserializer` field. For example AVRO files can contain the schema and they can dictate the columns of the table. This change makes sure that we do not get an error when the storage handler is null. The other default is handled by the default implementation for the `getDefaultColumnComment()` method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary merged pull request #3129: HIVE-26061: Do not add 'from deserializer' comment upon alter commands for Iceberg tables

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #3129:
URL: https://github.com/apache/hive/pull/3129


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org