You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/25 04:12:35 UTC

[GitHub] [iceberg] uncleGen opened a new pull request #2380: Spark: Qualify destination location before check when snapshot

uncleGen opened a new pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#issuecomment-809237209


   cc @RussellSpitzer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r609259338



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),
+    String qualifiedLocation = makeQualifiedLocation(location);
+    Preconditions.checkArgument(!sourceTableLocation().equals(qualifiedLocation),
         "Cannot create snapshot where destination location is the same as the source location." +
             " This would cause a mixing of original table created and snapshot created files.");
     this.destTableLocation = location;
     return this;
   }
+
+  private String makeQualifiedLocation(String location) {
+    try {
+      Path path = new Path(location);
+      Configuration conf = spark().sessionState().newHadoopConf();
+      FileSystem fs = FileSystem.get(path.toUri(), conf);
+      return CatalogUtils.URIToString(fs.makeQualified(path).toUri());
+    } catch (IOException ioe) {
+      LOG.error("Failed to initialize hadoop filesystem.", ioe);
+      return location;

Review comment:
       make sense @RussellSpitzer 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r603234562



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),
+    String qualifiedLocation = makeQualifiedLocation(location);
+    Preconditions.checkArgument(!sourceTableLocation().equals(qualifiedLocation),
         "Cannot create snapshot where destination location is the same as the source location." +
             " This would cause a mixing of original table created and snapshot created files.");
     this.destTableLocation = location;
     return this;
   }
+
+  private String makeQualifiedLocation(String location) {
+    try {
+      Path path = new Path(location);
+      Configuration conf = spark().sessionState().newHadoopConf();
+      FileSystem fs = FileSystem.get(path.toUri(), conf);
+      return CatalogUtils.URIToString(fs.makeQualified(path).toUri());
+    } catch (IOException ioe) {
+      LOG.error("Failed to initialize hadoop filesystem.", ioe);
+      return location;

Review comment:
       We should probably just fail, after all if the location string cannot be resolved isn't it going to fail anyway?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601028952



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       `sourceTableLocation` is converted from URI, while `location` comes from SQL command. We should qualify the `location` before check equals.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601112723



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),
+    String qualifiedLocation = makeQualifiedLocation(location);
+    Preconditions.checkArgument(!sourceTableLocation().equals(qualifiedLocation),
         "Cannot create snapshot where destination location is the same as the source location." +
             " This would cause a mixing of original table created and snapshot created files.");
     this.destTableLocation = location;
     return this;
   }
+
+  private String makeQualifiedLocation(String location) {
+    try {
+      Path path = new Path(location);
+      Configuration conf = spark().sessionState().newHadoopConf();
+      FileSystem fs = FileSystem.get(path.toUri(), conf);
+      return CatalogUtils.URIToString(fs.makeQualified(path).toUri());
+    } catch (IOException ioe) {
+      LOG.error("Failed to initialize hadoop filesystem.", ioe);
+      return location;

Review comment:
       If failed, back to `location`. Any suggestions?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601028952



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       `sourceTableLocation` is converted from URI, while `location` comes from SQL command. We should qualify the `location` before check equals. A example:
   - sourceTableLocation: hdfs://a.b.c.d:9000/source/location
   - location: /snapshot/destination/location




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601028952



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       `sourceTableLocation` is converted from URI, while `location` comes from SQL command. We should qualify the `location` before check equals. A example:
   - sourceTableLocation: hdfs://a.b.c.d:9000/source/location
   - location: /source/location




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601028952



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       `sourceTableLocation` is converted from URI, while `location` comes from SQL command. We should qualify the `location` before check equals. A example:
   - sourceTableLocation: hdfs://a.b.c.d:9000/source/location
   - location: /source/location
   
   If we compare them with previous logic, it was `false`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601112723



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),
+    String qualifiedLocation = makeQualifiedLocation(location);
+    Preconditions.checkArgument(!sourceTableLocation().equals(qualifiedLocation),
         "Cannot create snapshot where destination location is the same as the source location." +
             " This would cause a mixing of original table created and snapshot created files.");
     this.destTableLocation = location;
     return this;
   }
+
+  private String makeQualifiedLocation(String location) {
+    try {
+      Path path = new Path(location);
+      Configuration conf = spark().sessionState().newHadoopConf();
+      FileSystem fs = FileSystem.get(path.toUri(), conf);
+      return CatalogUtils.URIToString(fs.makeQualified(path).toUri());
+    } catch (IOException ioe) {
+      LOG.error("Failed to initialize hadoop filesystem.", ioe);
+      return location;

Review comment:
       If failed, back to location. Any suggestions?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] uncleGen commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
uncleGen commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r601028952



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       `sourceTableLocation` is converted from URI, while `location` comes from SQL command. We should qualify the `location` before check equals. A fault example:
   - sourceTableLocation: hdfs://a.b.c.d:9000/source/location
   - location: /source/location
   
   If we compare them with previous logic, it was `false`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r608210448



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       Does it mean we need to normalize and resolve the locations?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r608210129



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       We should probably also check if two locations overlap. We will corrupt the original table if the source location is within the original location.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r603234562



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),
+    String qualifiedLocation = makeQualifiedLocation(location);
+    Preconditions.checkArgument(!sourceTableLocation().equals(qualifiedLocation),
         "Cannot create snapshot where destination location is the same as the source location." +
             " This would cause a mixing of original table created and snapshot created files.");
     this.destTableLocation = location;
     return this;
   }
+
+  private String makeQualifiedLocation(String location) {
+    try {
+      Path path = new Path(location);
+      Configuration conf = spark().sessionState().newHadoopConf();
+      FileSystem fs = FileSystem.get(path.toUri(), conf);
+      return CatalogUtils.URIToString(fs.makeQualified(path).toUri());
+    } catch (IOException ioe) {
+      LOG.error("Failed to initialize hadoop filesystem.", ioe);
+      return location;

Review comment:
       We should probably just fail, after all if the location string cannot be resolved isn't it going to fail anyway?
   
   This makes me think we should be attempting to make a qualified path before comparing locations, so we can fail with cannot resolve uri before we have a chance to compare.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2380: Spark: Qualify destination location before check when snapshot

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2380:
URL: https://github.com/apache/iceberg/pull/2380#discussion_r603235482



##########
File path: spark3/src/main/java/org/apache/iceberg/actions/Spark3SnapshotAction.java
##########
@@ -135,10 +140,23 @@ protected TableCatalog checkSourceCatalog(CatalogPlugin catalog) {
 
   @Override
   public SnapshotAction withLocation(String location) {
-    Preconditions.checkArgument(!sourceTableLocation().equals(location),

Review comment:
       Good catch!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org