Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/16 00:04:05 UTC

[GitHub] [incubator-iceberg] aokolnychyi opened a new pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction

aokolnychyi opened a new pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction
URL: https://github.com/apache/incubator-iceberg/pull/930
 
 
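   This PR limits the maximum listing depth in `RemoveOrphanFilesAction`: `maxDepth` changes from `Integer.MAX_VALUE` to 2000, and the action now throws if any subdirectories remain unlisted.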

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction
URL: https://github.com/apache/incubator-iceberg/pull/930#discussion_r409756784
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/actions/RemoveOrphanFilesAction.java
 ##########
 @@ -272,13 +272,17 @@ private String metadataTableName(MetadataTableType type) {
 
       Predicate<FileStatus> predicate = file -> file.getModificationTime() < olderThanTimestamp;
 
-      int maxDepth = Integer.MAX_VALUE;
+      int maxDepth = 2000;
       int maxDirectSubDirs = Integer.MAX_VALUE;
 
       dirs.forEachRemaining(dir -> {
         listDirRecursively(dir, predicate, conf.value().value(), maxDepth, maxDirectSubDirs, subDirs, files);
       });
 
+      if (!subDirs.isEmpty()) {
+        throw new RuntimeException("Could not list dirs: " + subDirs);
 
 Review comment:
   Updated. Unfortunately, `tableLocation` is not available here because this code is called for subdirs on executors.
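
For illustration only, here is a minimal sketch of a message that reports the depth limit without `tableLocation`, using just the values visible in the diff (`subDirs` and `maxDepth`). The class name `ListingErrors`, the helper `depthExceededMessage`, and the exact wording are assumptions made for this example, not the text that was merged.

```java
import java.util.List;

// Hypothetical helper, not part of RemoveOrphanFilesAction.
final class ListingErrors {

  private ListingErrors() {
  }

  // Builds the failure message from what is available on executors:
  // the subdirectories that were left unlisted and the depth limit.
  static String depthExceededMessage(List<String> unlistedDirs, int maxDepth) {
    return String.format(
        "Could not list subdirectories %s, reached maximum subdirectory depth %d",
        unlistedDirs, maxDepth);
  }
}
```

In the diff above, such a helper would stand in for the string concatenation inside `new RuntimeException(...)`.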



[GitHub] [incubator-iceberg] rdblue merged pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction
URL: https://github.com/apache/incubator-iceberg/pull/930
 
 
   



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #930: Spark: limit the listing depth in RemoveOrphanFilesAction
URL: https://github.com/apache/incubator-iceberg/pull/930#discussion_r409210943
 
 

 ##########
 File path: spark/src/main/java/org/apache/iceberg/actions/RemoveOrphanFilesAction.java
 ##########
 @@ -272,13 +272,17 @@ private String metadataTableName(MetadataTableType type) {
 
       Predicate<FileStatus> predicate = file -> file.getModificationTime() < olderThanTimestamp;
 
-      int maxDepth = Integer.MAX_VALUE;
+      int maxDepth = 2000;
       int maxDirectSubDirs = Integer.MAX_VALUE;
 
       dirs.forEachRemaining(dir -> {
         listDirRecursively(dir, predicate, conf.value().value(), maxDepth, maxDirectSubDirs, subDirs, files);
       });
 
+      if (!subDirs.isEmpty()) {
+        throw new RuntimeException("Could not list dirs: " + subDirs);
 
 Review comment:
   Can we use a better message here? I think this will only happen when the maximum depth is reached, so we could state that's what happened: `"Could not list location %s, reached maximum subdirectory depth %s", tableLocation, maxDepth`
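
To make the reasoning in this comment concrete, here is a rough sketch of a depth-limited recursive listing. It is an assumption-laden stand-in, not the code under review: it uses `java.nio.file` instead of Hadoop's `FileSystem`, drops the `predicate` and `maxDirectSubDirs` parameters, and the class name `DepthLimitedListing` is made up for the example. In this sketch a directory lands in `remainingSubDirs` only when the depth budget runs out, which is why a non-empty list can be reported as "reached maximum subdirectory depth".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified stand-in for the listing logic discussed above.
final class DepthLimitedListing {

  private DepthLimitedListing() {
  }

  // Collects regular files under dir into files. A directory that cannot be
  // descended into because the depth budget is exhausted is recorded in
  // remainingSubDirs instead of being silently skipped.
  static void listDirRecursively(
      Path dir, int maxDepth, List<Path> remainingSubDirs, List<Path> files) {
    if (maxDepth <= 0) {
      remainingSubDirs.add(dir);
      return;
    }

    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (Files.isDirectory(child)) {
          // Each level of nesting consumes one unit of the depth budget.
          listDirRecursively(child, maxDepth - 1, remainingSubDirs, files);
        } else {
          files.add(child);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would check `remainingSubDirs.isEmpty()` after the traversal and fail with a message that names the depth limit, as suggested in the comment.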
