Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/11/15 10:09:17 UTC

[GitHub] [hive] marton-bod commented on a change in pull request #2779: HIVE-25690: Fix column reorder detection for Iceberg schema evolution

marton-bod commented on a change in pull request #2779:
URL: https://github.com/apache/hive/pull/2779#discussion_r749174577



##########
File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -178,28 +179,57 @@ public static SchemaDifference getSchemaDiff(Collection<FieldSchema> minuendColl
   }
 
   /**
-   * Compares a list of columns to another list, by name, to find an out of order column.
-   * It iterates through updated one by one, and compares the name of the column to the name of the column in the old
-   * list, in the same position. It returns the first mismatch it finds in updated, if any.
+   * Compares two lists of columns to each other, by name and index, to find the column that was moved by the
+   * schema evolution update (i.e. a column which was either moved to the first position, or moved after some specified
+   * column).

Review comment:
       That's a good point. Will do.
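For illustration, the positional approach described by the removed Javadoc lines can be sketched in standalone Java. This is a simplified sketch, not the Hive code. It assumes the following: columns are plain `List<String>` names instead of `FieldSchema`, the hypothetical `Reorder` class stands in for `Pair<String, Optional<String>>`, and `renameMapping` maps updated names back to the original names. It shows why a first-mismatch scan handles a move to the first position but misreports a move after another column.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class FirstMismatchSketch {

  // Hypothetical result holder: the detected column and the column preceding it in the updated list.
  static final class Reorder {
    final String column;
    final Optional<String> preceding;

    Reorder(String column, Optional<String> preceding) {
      this.column = column;
      this.preceding = preceding;
    }

    @Override
    public String toString() {
      return column + preceding.map(p -> " after " + p).orElse(" first");
    }
  }

  // Positional comparison as described by the old Javadoc: walk both lists index by index and
  // report the first position where the (rename-resolved) updated name differs from the old name.
  static Reorder firstOutOfOrderColumn(List<String> updated, List<String> old, Map<String, String> renameMapping) {
    for (int i = 0; i < updated.size() && i < old.size(); ++i) {
      String updatedCol = renameMapping.getOrDefault(updated.get(i), updated.get(i));
      if (!old.get(i).equals(updatedCol)) {
        Optional<String> previousCol = i > 0 ? Optional.of(updated.get(i - 1)) : Optional.empty();
        return new Reorder(updatedCol, previousCol);
      }
    }
    return null; // every compared position matches: nothing was reordered
  }

  public static void main(String[] args) {
    List<String> old = List.of("a", "b", "c", "d");
    // Moving "d" to the first position is detected correctly...
    System.out.println(firstOutOfOrderColumn(List.of("d", "a", "b", "c"), old, Map.of())); // d first
    // ...but moving "a" after "c" is misreported, because the first mismatch is at index 0 ("b").
    System.out.println(firstOutOfOrderColumn(List.of("b", "c", "a", "d"), old, Map.of())); // b first
  }
}
```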

##########
File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -178,28 +179,57 @@ public static SchemaDifference getSchemaDiff(Collection<FieldSchema> minuendColl
   }
 
   /**
-   * Compares a list of columns to another list, by name, to find an out of order column.
-   * It iterates through updated one by one, and compares the name of the column to the name of the column in the old
-   * list, in the same position. It returns the first mismatch it finds in updated, if any.
+   * Compares two lists of columns to each other, by name and index, to find the column that was moved by the
+   * schema evolution update (i.e. a column which was either moved to the first position, or moved after some specified
+   * column).
    *
-   * @param updated The list of the columns after some updates have taken place
+   * @param updated The list of the columns after some updates have taken place (if any)
    * @param old The list of the original columns
    * @param renameMapping A map of name aliases for the updated columns (e.g. if a column rename occurred)
-   * @return A pair consisting of the first out of order column name, and its preceding column name (if any).
+   * @return A pair consisting of the reordered column's name, and its preceding column's name (if any).
    *         Returns a null in case there are no out of order columns.
    */
-  public static Pair<String, Optional<String>> getFirstOutOfOrderColumn(List<FieldSchema> updated,
+  public static Pair<String, Optional<String>> getReorderedColumn(List<FieldSchema> updated,
                                                                         List<FieldSchema> old,
                                                                         Map<String, String> renameMapping) {
-    for (int i = 0; i < updated.size() && i < old.size(); ++i) {
+    // first collect the updated index for each column
+    Map<String, Integer> nameToNewIndex = Maps.newHashMap();
+    for (int i = 0; i < updated.size(); ++i) {
       String updatedCol = renameMapping.getOrDefault(updated.get(i).getName(), updated.get(i).getName());
-      String oldCol = old.get(i).getName();
-      if (!oldCol.equals(updatedCol)) {
-        Optional<String> previousCol = i > 0 ? Optional.of(updated.get(i - 1).getName()) : Optional.empty();
-        return Pair.of(updatedCol, previousCol);
+      nameToNewIndex.put(updatedCol, i);
+    }
+
+    // find the column which has the highest index difference between its position in the old vs the updated list
+    String reorderedColName = null;
+    int maxIndexDiff = 0;
+    for (int oldIndex = 0; oldIndex < old.size(); ++oldIndex) {
+      String oldName = old.get(oldIndex).getName();
+      Integer newIndex = nameToNewIndex.get(oldName);
+      if (newIndex != null) {
+        if (maxIndexDiff < Math.abs(newIndex - oldIndex)) {

Review comment:
       Sure, will extract.
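The following is a minimal standalone sketch of the max-index-difference idea in the new code. It uses the same simplifications: plain `List<String>` column names, a hypothetical `Reorder` holder, and a `renameMapping` assumed to map updated names back to the original ones. The quoted diff stops mid-method, so two details here are assumptions rather than the PR's final code: ties go to the first column in old order, and the preceding column is read from the updated list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class MaxIndexDiffSketch {

  // Hypothetical result holder: the moved column and the column now preceding it (empty if moved first).
  static final class Reorder {
    final String column;
    final Optional<String> preceding;

    Reorder(String column, Optional<String> preceding) {
      this.column = column;
      this.preceding = preceding;
    }

    @Override
    public String toString() {
      return column + preceding.map(p -> " after " + p).orElse(" first");
    }
  }

  static Reorder reorderedColumn(List<String> updated, List<String> old, Map<String, String> renameMapping) {
    // 1. Index each updated column by its rename-resolved (original) name.
    Map<String, Integer> nameToNewIndex = new HashMap<>();
    for (int i = 0; i < updated.size(); ++i) {
      String name = updated.get(i);
      nameToNewIndex.put(renameMapping.getOrDefault(name, name), i);
    }

    // 2. Keep the column whose position changed the most; on ties the earliest old column wins.
    String reorderedCol = null;
    int reorderedNewIndex = -1;
    int maxIndexDiff = 0;
    for (int oldIndex = 0; oldIndex < old.size(); ++oldIndex) {
      String oldName = old.get(oldIndex);
      Integer newIndex = nameToNewIndex.get(oldName);
      if (newIndex == null) {
        continue; // not present in the updated list (e.g. dropped), so skip it
      }
      int indexDiff = Math.abs(newIndex - oldIndex);
      if (indexDiff > maxIndexDiff) {
        maxIndexDiff = indexDiff;
        reorderedCol = oldName;
        reorderedNewIndex = newIndex;
      }
    }
    if (reorderedCol == null) {
      return null; // every surviving column kept its position
    }

    // 3. The column now directly before it is the "after" target (empty means it was moved first).
    Optional<String> preceding = reorderedNewIndex > 0
        ? Optional.of(updated.get(reorderedNewIndex - 1))
        : Optional.empty();
    return new Reorder(reorderedCol, preceding);
  }

  public static void main(String[] args) {
    List<String> old = List.of("a", "b", "c", "d");
    System.out.println(reorderedColumn(List.of("b", "c", "a", "d"), old, Map.of())); // a after c
    System.out.println(reorderedColumn(List.of("d", "a", "b", "c"), old, Map.of())); // d first
  }
}
```

Moving one column shifts each column in between by exactly one position, while the moved column itself shifts by the full distance. The moved column therefore normally has the largest displacement. For an adjacent swap, both columns tie at 1, and either reading ("a after b" or "b first") describes the same final order.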




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: gitbox-help@hive.apache.org