Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/28 23:12:27 UTC

[GitHub] [hudi] umehrot2 commented on a diff in pull request #6237: [HUDI-4495] Fix handling of S3 paths incompatible with java URI stand…

umehrot2 commented on code in PR #6237:
URL: https://github.com/apache/hudi/pull/6237#discussion_r932745731


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java:
##########
@@ -145,8 +145,11 @@ public static Path convertPathWithScheme(Path oldPath, String newScheme) {
     URI oldURI = oldPath.toUri();
     URI newURI;
     try {
-      newURI = new URI(newScheme, oldURI.getUserInfo(), oldURI.getHost(), oldURI.getPort(), oldURI.getPath(),
-          oldURI.getQuery(), oldURI.getFragment());
+      newURI = new URI(newScheme,
+                oldURI.getAuthority(),
+                oldURI.getPath(),
+                oldURI.getQuery(),
+                oldURI.getFragment());

Review Comment:
   Yes. It will stay fully compatible. The issue is introduced by the use of the **getHost()** API. It works fine for HDFS-like URIs that have a host and a port, and for most bucket naming conventions that match the Java URI standard. But for certain patterns, as explained in the Jira, **getHost()** breaks because of the way it tries to parse the hostname. This constructor works better because it uses **getAuthority()**, which correctly extracts the bucket name in the case of S3. I do not see a need to specifically use the **getHost()/getPort()** APIs when **getAuthority()** can essentially do the same.
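   To illustrate, here is a minimal, self-contained sketch of the difference. The underscore-containing bucket name and the target scheme "s3a" are hypothetical placeholders, not taken from the PR or the Jira: for such a bucket, getHost() returns null, so the host/port-based constructor silently drops the authority, while the authority-based constructor keeps the bucket intact.

   import java.net.URI;
   import java.net.URISyntaxException;

   public class S3UriSchemeExample {
     public static void main(String[] args) throws URISyntaxException {
       // Hypothetical S3 path whose bucket name ("my_bucket") is not a valid
       // hostname under java.net.URI's server-based parsing rules.
       URI oldURI = new URI("s3://my_bucket/warehouse/table/part=0/file.parquet");

       System.out.println(oldURI.getHost());      // null -> bucket cannot be parsed as a hostname
       System.out.println(oldURI.getAuthority()); // my_bucket

       // Old approach: host/port constructor; with a null host the authority is dropped.
       URI hostBased = new URI("s3a", oldURI.getUserInfo(), oldURI.getHost(), oldURI.getPort(),
           oldURI.getPath(), oldURI.getQuery(), oldURI.getFragment());
       System.out.println(hostBased);             // s3a:/warehouse/table/part=0/file.parquet

       // New approach: authority constructor; the bucket name is preserved.
       URI authorityBased = new URI("s3a", oldURI.getAuthority(), oldURI.getPath(),
           oldURI.getQuery(), oldURI.getFragment());
       System.out.println(authorityBased);        // s3a://my_bucket/warehouse/table/part=0/file.parquet
     }
   }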


