You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "gaborkaszab (via GitHub)" <gi...@apache.org> on 2023/05/30 13:15:56 UTC

[GitHub] [iceberg] gaborkaszab opened a new pull request, #7738: Data: TableMigrationUtil shouldn't use path to get partition values

gaborkaszab opened a new pull request, #7738:
URL: https://github.com/apache/iceberg/pull/7738

   The existing implementation of TableMigrationUtil.loadPartition() requires the partition values parameter to be in URL-encoded format. Otherwise the '/' character in the partition value would cause a parsing error. But if we give URL-encoded partition values to this functions then the partition values in the result DataFiles will be wrong. For instance the string value "11/12/13" on the partition path is URL-encoded to '11%2F12%2F13'. If you run loadPartition() then the resulting DataFiles will incorrectly have the URL-encoded partition values.
   
   As a solution this patch adds a new implementation of loadPartition() that doesn't require the caller to pass the partition values in a URL-encoded format and creates DataFiles having the correct partition values.
   
   Note, I kept the existing implementation as well and marked it as deprecated so that it can be dropped in a later release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7738: Data: TableMigrationUtil shouldn't use path to get partition values

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7738:
URL: https://github.com/apache/iceberg/pull/7738#discussion_r1212396126


##########
data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java:
##########
@@ -151,6 +256,36 @@ public static List<DataFile> listPartition(
     }
   }
 
+  private static List<FileStatus> fileStatusListInPartition(String partitionUri, Configuration conf)
+      throws IOException {

Review Comment:
   I suppose it is not worthwhile to change code you plan to remove in a later version?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7738: Data: TableMigrationUtil shouldn't use path to get partition values

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7738:
URL: https://github.com/apache/iceberg/pull/7738#discussion_r1212386909


##########
data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java:
##########
@@ -151,6 +256,36 @@ public static List<DataFile> listPartition(
     }
   }
 
+  private static List<FileStatus> fileStatusListInPartition(String partitionUri, Configuration conf)
+      throws IOException {

Review Comment:
   I thought the reason you're defining this helper method is because the same code exists in the existing `listPartitions` that you're retaining but deprecating and the new one you're adding, but you haven't changed the deprecated `listPartitions` to use this helper.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #7738: Data: TableMigrationUtil shouldn't use path to get partition values

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #7738:
URL: https://github.com/apache/iceberg/pull/7738#issuecomment-1571068540

   @RussellSpitzer this is an alternative to https://github.com/apache/iceberg/pull/7744.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Data: TableMigrationUtil shouldn't use path to get partition values [iceberg]

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab closed pull request #7738: Data: TableMigrationUtil shouldn't use path to get partition values
URL: https://github.com/apache/iceberg/pull/7738


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #7738: Data: TableMigrationUtil shouldn't use path to get partition values

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #7738:
URL: https://github.com/apache/iceberg/pull/7738#issuecomment-1570162761

   Hey @aokolnychyi, @szehon-ho, @amogh-jahagirdar,
   I found an issue wrt TableMigrationUtil.listPartition(). For details see https://github.com/apache/iceberg/issues/7612
   I'm tagging you because apparently Spark also has an issue with table migration when there are '/' chars in the partition columns. Could you please take a look at this fix? I added new functions for the fix while deprecated the old ones. Note, there is an extra 'schema' parameter needed for this solution, so I couldn't do the fix in-place.
   Once this fix is in, I can make a change on Spark side to use the new functions instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #7738: Data: TableMigrationUtil shouldn't use path to get partition values

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #7738:
URL: https://github.com/apache/iceberg/pull/7738#discussion_r1212386909


##########
data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java:
##########
@@ -151,6 +256,36 @@ public static List<DataFile> listPartition(
     }
   }
 
+  private static List<FileStatus> fileStatusListInPartition(String partitionUri, Configuration conf)
+      throws IOException {

Review Comment:
   I thought the reason you're defining this helper method is because the same code exists in the existing `listPartition` that you're retaining but deprecating and the new one you're adding, but you haven't changed the deprecated `listPartition` to use this helper.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org