Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/02 13:27:00 UTC

[jira] [Work logged] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

     [ https://issues.apache.org/jira/browse/HIVE-25328?focusedWorklogId=632342&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-632342 ]

ASF GitHub Bot logged work on HIVE-25328:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Aug/21 13:26
            Start Date: 02/Aug/21 13:26
    Worklog Time Spent: 10m 
      Work Description: szlta commented on a change in pull request #2475:
URL: https://github.com/apache/hive/pull/2475#discussion_r680969086



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -507,35 +506,38 @@ private void handleAddColumns(org.apache.hadoop.hive.metastore.api.Table hmsTabl
   }
 
   private void handleReplaceColumns(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws MetaException {
-    HiveSchemaUtil.SchemaDifference schemaDifference = HiveSchemaUtil.getSchemaDiff(hmsTable.getSd().getCols(),
-        HiveSchemaUtil.convert(icebergTable.schema()), true);
-    if (!schemaDifference.isEmpty()) {
-      updateSchema = icebergTable.updateSchema();
-    } else {
-      // we should get here if the user restated the exactly the existing columns in the REPLACE COLUMNS command
-      LOG.info("Found no difference between new and old schema for ALTER TABLE REPLACE COLUMNS for" +
-          " table: {}. There will be no Iceberg commit.", hmsTable.getTableName());
-      return;
-    }
+    List<FieldSchema> hmsCols = hmsTable.getSd().getCols();
+    List<FieldSchema> icebergCols = HiveSchemaUtil.convert(icebergTable.schema());
+    HiveSchemaUtil.SchemaDifference schemaDifference = HiveSchemaUtil.getSchemaDiff(hmsCols, icebergCols, true);
 
-    for (FieldSchema droppedCol : schemaDifference.getMissingFromFirst()) {
-      updateSchema.deleteColumn(droppedCol.getName());
+    // if there are columns dropped, let's remove them from the iceberg schema as well so we can compare the order
+    if (!schemaDifference.getMissingFromFirst().isEmpty()) {
+      schemaDifference.getMissingFromFirst().forEach(icebergCols::remove);
     }
 
-    for (FieldSchema addedCol : schemaDifference.getMissingFromSecond()) {
-      updateSchema.addColumn(
-          addedCol.getName(),
-          HiveSchemaUtil.convert(TypeInfoUtils.getTypeInfoFromTypeString(addedCol.getType())),
-          addedCol.getComment()
-      );
-    }
+    Pair<String, Optional<String>> outOfOrder = HiveSchemaUtil.getFirstOutOfOrderColumn(
+        hmsCols, icebergCols, ImmutableMap.of());
 
-    for (FieldSchema updatedCol : schemaDifference.getTypeChanged()) {
-      updateSchema.updateColumn(updatedCol.getName(), getPrimitiveTypeOrThrow(updatedCol), updatedCol.getComment());
+    // limit the scope of this operation to only dropping columns
+    if (!schemaDifference.getMissingFromSecond().isEmpty() || !schemaDifference.getTypeChanged().isEmpty() ||
+        !schemaDifference.getCommentChanged().isEmpty() || outOfOrder != null) {
+      throw new MetaException("Unsupported operation to use REPLACE COLUMNS for adding a column, changing a " +
+          "column type, column comment or reordering columns. Only use REPLACE COLUMNS for dropping columns. " +
+          "For the other operations, consider using the ADD COLUMNS or CHANGE COLUMN commands.");
     }
 
-    for (FieldSchema updatedCol : schemaDifference.getCommentChanged()) {
-      updateSchema.updateColumnDoc(updatedCol.getName(), updatedCol.getComment());
+    // check if there were any column drops
+    if (!schemaDifference.getMissingFromFirst().isEmpty()) {

Review comment:
       This condition is already checked at line 518. Also, if we ever get to this point, shouldn't that already mean columns were dropped?






Issue Time Tracking
-------------------

    Worklog Id:     (was: 632342)
    Time Spent: 20m  (was: 10m)

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -------------------------------------------------
>
>                 Key: HIVE-25328
>                 URL: https://issues.apache.org/jira/browse/HIVE-25328
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> REPLACE COLUMNS is a catch-all operation that can make heavyweight schema changes. For Iceberg tables, we want to allow it only for dropping columns. For other changes (adding columns, renaming, type promotion, etc.), the ADD COLUMNS or CHANGE COLUMN commands should be used instead.
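
The drop-only rule described above can be illustrated with a small stand-alone sketch. This is not the code from the pull request: the patch relies on HiveSchemaUtil.getSchemaDiff and HiveSchemaUtil.getFirstOutOfOrderColumn, whereas the class, method, and field names below (ReplaceColumnsCheck, Column, droppedColumns) are hypothetical and exist only for illustration. The sketch assumes a column is "unchanged" when its name, type, and comment all match, and it accepts a requested column list only if it equals the existing list with zero or more columns removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-alone sketch; none of these names come from Hive or Iceberg. */
final class ReplaceColumnsCheck {

  /** Minimal stand-in for a column definition (name, type, optional comment). */
  static final class Column {
    final String name;
    final String type;
    final String comment;

    Column(String name, String type, String comment) {
      this.name = name;
      this.type = type;
      this.comment = comment;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Column)) {
        return false;
      }
      Column c = (Column) o;
      return name.equals(c.name) && type.equals(c.type) && Objects.equals(comment, c.comment);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type, comment);
    }
  }

  private ReplaceColumnsCheck() {
  }

  /**
   * Returns the names of the columns to drop, or throws if the requested list
   * is anything other than the existing list with some columns removed.
   */
  static List<String> droppedColumns(List<Column> existing, List<Column> requested) {
    List<String> dropped = new ArrayList<>();
    int next = 0; // index of the next requested column still waiting for a match
    for (Column col : existing) {
      if (next < requested.size() && requested.get(next).equals(col)) {
        next++; // kept: unchanged and in its original relative position
      } else {
        dropped.add(col.name); // no match at this position, so treated as dropped
      }
    }
    if (next != requested.size()) {
      // a requested column was added, changed (type or comment), or reordered
      throw new IllegalArgumentException(
          "REPLACE COLUMNS may only drop columns; unsupported change at: " + requested.get(next).name);
    }
    return dropped;
  }
}
```

In this sketch, restating exactly the existing columns passes the validation and yields an empty drop list, so a caller would still need to decide whether an empty result should trigger a commit at all.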


