You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/05 14:26:41 UTC

[GitHub] [druid] gianm opened a new pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

gianm opened a new pull request #11879:
URL: https://github.com/apache/druid/pull/11879


   The changes:
   
   - Add `void mkdirp(File directory) throws IOException` to FileUtils, which has semantics like Unix `mkdir -p`. It creates a directory if it does not exist, returns quietly if it already exists, and generates an error if it could not be created.
   - Migrate usages of File.mkdirs to the new function and add File.mkdirs to the forbidden functions list.
   
   The new method is better for two reasons:
   
   First, File.mkdirs makes it easy to ignore errors, since it returns a boolean. Unchecked usage is common and leads to confusing errors later, like "Cannot create file [dir/file]", instead of more-specific errors upfront like "Cannot create directory [dir]".
   
   Second, File.mkdirs is not concurrency safe. "Natural" patterns like this with File.mkdirs are prone to race conditions, because two `dir.mkdirs()` that run concurrently cannot both succeed:
   
   ```
   if (!dir.exists() && !dir.mkdirs()) {
     throw new IOException();
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743985094



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       BTW, my comment is not a blocker. I just don't want to give callers of this method an impression that the race is OK.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743987838



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Ah, I see. Well: to avoid reporting an error when the directory already exists, you have to do either `isDirectory() || mkdirs()` or `mkdirs() || isDirectory()`. The first one is prone to races and the second is race-proof. So, since you have to pick one, we might as well do the second.
   
   One place where this would matter is the LocalDataSegmentPusher, which needs to create a directory for a datasource when pushing a new segment. But what happens if two threads (or even two task JVMs on the same server) need to push a segment for the same datasource at the same time? It needs a race-proof approach for that to work properly. It's OK today, because it uses `org.apache.commons.io.FileUtils.forceMkdir` which is race-proof.
   
   But, it'd be good for it to be able to safely use our FileUtils. Actually, if you don't mind I'd like to migrate it in this PR and add `org.apache.commons.io.FileUtils.forceMkdir` to the forbidden-apis list. Two reasons for that: (1) even though the commons-io version is race-proof, it's not guaranteed by the javadoc, so it may not be safe to rely on it; (2), it's annoying to use both FileUtils, because you can't import both in the same file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743985094



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       BTW, my comment is not a blocker. I just don't want to give callers of this method an impression that the race is OK. Maybe we can clarify it in the javadoc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r744004597



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Updated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743873821



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       This method seems similar to `org.apache.commons.io.FileUtils.forceMkdir()` besides this handling for the race, and so perhaps we can fix the anti-pattern if we have and use it instead of adding a new method?

##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Do we have this issue today? Or is this a future-proof? The race in creating directory sounds like an anti-pattern because.. why not using a single thread to prepare the directory and then letting other threads write their files without race?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743970973



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       For the potential race, I don't know if we have this issue today. I did notice it when goofing off with some new code in a branch, but I'm not sure if any production code today has a problem here.
   
   I still thought the patch was a good idea for two reasons:
   
   1. There are a bunch of places that don't check the return value of `File.mkdirs()`, and switching to this new method would make them fail faster if the directory cannot be created, which I thought would generally be good. (Otherwise, the errors will happen later and more indirectly, perhaps being harder to debug.)
   2. The fix to the race is really simple, so I didn't see why _not_ include it. (The fix is to do `mkdirs() || isDirectory()` instead of just `mkdirs()`).
   
   As to `org.apache.commons.io.FileUtils.forceMkdir()`, I didn't notice that method until now, so thanks for pointing it out. It looks almost identical to the one in this PR, except it skips the initial `isDirectory()` check. But it's still race-safe because it does have the final `isDirectory()` check. And actually, I think the initial `isDirectory()` check is pointless anyway, since `mkdirs()` will exit `false()` quickly if the directory already exists. So I just pushed a new commit that removes it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743987838



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Ah, I see. Well: to avoid reporting an error when the directory already exists, you have to do either `isDirectory() || mkdirs()` or `mkdirs() || isDirectory()`. The first one is prone to races and the second is race-proof. So, I'd rather do the second.
   
   One place where this would matter is the LocalDataSegmentPusher, which needs to create a directory for a datasource when pushing a new segment. But what happens if two threads (or even two task JVMs on the same server) need to push a segment for the same datasource at the same time? It needs a race-proof approach for that to work properly. It's OK today, because it uses `org.apache.commons.io.FileUtils.forceMkdir` which is race-proof.
   
   But, it'd be good for it to be able to safely use our FileUtils. Actually, if you don't mind I'd like to migrate it in this PR and add `org.apache.commons.io.FileUtils.forceMkdir` to the forbidden-apis list. Two reasons for that: (1) even though the commons-io version is race-proof, it's not guaranteed by the javadoc, so it may not be safe to rely on it; (2), it's annoying to use both FileUtils, because you can't import both in the same file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm merged pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
gianm merged pull request #11879:
URL: https://github.com/apache/druid/pull/11879


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743990346



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       > But, it'd be good for it to be able to safely use our FileUtils. Actually, if you don't mind I'd like to migrate it in this PR and add `org.apache.commons.io.FileUtils.forceMkdir` to the forbidden-apis list. Two reasons for that: (1) even though the commons-io version is race-proof, it's not guaranteed by the javadoc, so it may not be safe to rely on it; (2), it's annoying to use both FileUtils, because you can't import both in the same file.
   
   Sounds good to me :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743977179



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Oh, let me clarify myself. I do agree with discouraging use of `File.mkdirs()`. I'm OK even with adding this new method. I'm just not sure about the handling of race since letting multiple threads attempt to create the same directory is probably not a good idea in most cases. Even though the fix of the race is simple, I would not want to encourage such pattern unless it is proven to be useful in some case. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743873821



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       This method seems similar to `org.apache.commons.io.FileUtils.forceMkdir()` besides this handling for the race, and so perhaps we can fix the anti-pattern if we have and use it instead of adding a new method?

##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Do we have this issue today? Or is this a future-proof? The race in creating directory sounds like an anti-pattern because.. why not using a single thread to prepare the directory and then letting other threads write their files without race?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11879: Migrate File.mkdirs to FileUtils.mkdirp.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11879:
URL: https://github.com/apache/druid/pull/11879#discussion_r743873821



##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       This method seems similar to `org.apache.commons.io.FileUtils.forceMkdir()` besides this handling for the race, and so perhaps we can fix the anti-pattern if we have and use it instead of adding a new method?

##########
File path: core/src/main/java/org/apache/druid/java/util/common/FileUtils.java
##########
@@ -379,6 +379,27 @@ public static File createTempDir(@Nullable final String prefix)
     }
   }
 
+  /**
+   * Create "directory" and all intermediate directories as needed. If the directory is successfully created, or already
+   * exists, returns quietly. Otherwise, throws an IOException.
+   *
+   * Simpler to use than {@link File#mkdirs()}, and more reliable since it is safe from races where two threads try

Review comment:
       Do we have this issue today? Or is this a future-proof? The race in creating directory sounds like an anti-pattern because.. why not using a single thread to prepare the directory and then letting other threads write their files without race?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org