You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/24 14:05:37 UTC

[GitHub] [iceberg] rajarshisarkar opened a new pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

rajarshisarkar opened a new pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175


   Fix for #2991 
   
   Seems like for EMR 6.x (Spark 3.x) Iceberg attempts to write the file before the `tmp` directory is created. This results in `java.io.IOException: No such file or directory` exception. 
   
   The `tmp` files look something like this by default (here `System.getProperty("java.io.tmpdir")` is `/mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0042/container_1632368478936_0042_01_000001/tmp`):
   
   ```
   /mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0042/container_1632368478936_0042_01_000001/tmp/s3fileio-2279038237918274355.tmp
   /mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0042/container_1632368478936_0042_01_000001/tmp/s3fileio-2279038237918274355.tmp
   /mnt1/yarn/usercache/hadoop/appcache/application_1632368478936_0042/container_1632368478936_0042_01_000001/tmp/s3fileio-2279038237918274355.tmp
   ```
   
   (or) something like this if `s3fileIoStagingDirectory` is set via `s3.staging-dir` (`/mnt/tmp` for this case):
   ```
   /mnt/tmp/s3fileio-6715538145691725671.tmp
   /mnt/tmp/s3fileio-5266666342669697861.tmp
   /mnt/tmp/s3fileio-4592572815837116260.tmp
   ```
   
   Adding a defensive check to ensure we check the existence of the staging directory before writing the temp files. With this check I could see the executor tasks succeeding in `cluster` deploy mode consistently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r716591880



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();

Review comment:
       Handled `SecurityException` and added loggers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r715761064



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();

Review comment:
       based on the documentation, this can also throw `SecurityException` that needs to be handled. We should also add some logging if staging directory is created, and if creation failed for some reason.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r716592490



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();
+    }

Review comment:
       Addressed. Also, separated out the logic in a method: `createStagingDirectoryIfNotExists()`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r715761241



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();
+    }

Review comment:
       nit: newline needed after a control statement

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();
+    }
     currentStagingFile = File.createTempFile("s3fileio-", ".tmp", stagingDirectory);
     currentStagingFile.deleteOnExit();
+    if (createStagingDirectory) {
+      stagingDirectory.deleteOnExit();
+    }

Review comment:
       nit: newline needed after a control statement




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717589022



##########
File path: api/src/main/java/org/apache/iceberg/io/OutputFile.java
##########
@@ -48,6 +49,7 @@
    *
    * @return an output stream that can report its position
    * @throws RuntimeIOException If the implementation throws an {@link IOException}
+   * @throws AccessControlException If the implementation throws an {@link SecurityException}

Review comment:
       I have made the changes accordingly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717588365



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Removed the special handling for `SecurityException`.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java
##########
@@ -62,6 +63,9 @@ public PositionOutputStream createOrOverwrite() {
       return new S3OutputStream(client(), uri(), awsProperties());
     } catch (IOException e) {
       throw new UncheckedIOException("Failed to create output stream for location: " + uri(), e);
+    } catch (SecurityException e) {

Review comment:
       Removed the special handling for `SecurityException`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717275514



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Sorry for the back and forth, I read the documentation for mkdirs, it seems like SecurityException only catches JVM level permission and it might still just return false for OS level permission failure, so it's hard to have a consistent behavior for error handling. Because of that, plus the fact that SecurityException is a runtime exception, I think we can remove the special handling of it and just let it throw to the top level.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717747863



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",
+              stagingDirectory.getAbsolutePath());
+        } else {
+          throw new IOException(String
+              .format("Staging directory: %s creation failed due to some unknown reason",

Review comment:
       nit: match the log message at L346, "Fail to create staging directory due to some unknown reason: {}"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717752923



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Thanks, can you also remove the `SecurityException` after `throws`? We don't need to throw runtime exception explicitly, just need to document it at top level which you already did.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717811637



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",

Review comment:
       Made the change.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",
+              stagingDirectory.getAbsolutePath());
+        } else {
+          throw new IOException(String
+              .format("Staging directory: %s creation failed due to some unknown reason",

Review comment:
       Made the change.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -86,7 +86,8 @@
   private boolean closed = false;
 
   @SuppressWarnings("StaticAssignmentInConstructor")
-  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties) throws IOException {
+  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties)
+      throws IOException, SecurityException {

Review comment:
       Removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717812156



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Yeah, need not to throw explicitly. Removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r716591880



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();

Review comment:
       Handled `SecurityException` and added loggers for success scenario. Throwing the exception in case of failure scenario.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#issuecomment-929389831


   Thank you for the work, looks good to me overall, just some styling changes then it should be good to go!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717587514



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      }

Review comment:
       Agreed, I have made the changes accordingly.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {

Review comment:
       Added the logger.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Removed the special handling for `SecurityException`.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java
##########
@@ -62,6 +63,9 @@ public PositionOutputStream createOrOverwrite() {
       return new S3OutputStream(client(), uri(), awsProperties());
     } catch (IOException e) {
       throw new UncheckedIOException("Failed to create output stream for location: " + uri(), e);
+    } catch (SecurityException e) {

Review comment:
       Removed the special handling for `SecurityException`.

##########
File path: api/src/main/java/org/apache/iceberg/io/OutputFile.java
##########
@@ -48,6 +49,7 @@
    *
    * @return an output stream that can report its position
    * @throws RuntimeIOException If the implementation throws an {@link IOException}
+   * @throws AccessControlException If the implementation throws an {@link SecurityException}

Review comment:
       I have made the changes accordingly.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",

Review comment:
       Made the change.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",
+              stagingDirectory.getAbsolutePath());
+        } else {
+          throw new IOException(String
+              .format("Staging directory: %s creation failed due to some unknown reason",

Review comment:
       Made the change.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -86,7 +86,8 @@
   private boolean closed = false;
 
   @SuppressWarnings("StaticAssignmentInConstructor")
-  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties) throws IOException {
+  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties)
+      throws IOException, SecurityException {

Review comment:
       Removed it.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Yeah, need not to throw explicitly. Removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 merged pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717753442



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -86,7 +86,8 @@
   private boolean closed = false;
 
   @SuppressWarnings("StaticAssignmentInConstructor")
-  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties) throws IOException {
+  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties)
+      throws IOException, SecurityException {

Review comment:
       same comment as below, remove SecurityException after throws




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#issuecomment-926774205


   Thanks for working on this! Can you also add a test case for it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r715761720



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();
+    }
     currentStagingFile = File.createTempFile("s3fileio-", ".tmp", stagingDirectory);
     currentStagingFile.deleteOnExit();
+    if (createStagingDirectory) {
+      stagingDirectory.deleteOnExit();

Review comment:
       why do we need to delete on exit? I think we can assume it's safe to keep the staging directory there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r716592916



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();
+    }
     currentStagingFile = File.createTempFile("s3fileio-", ".tmp", stagingDirectory);
     currentStagingFile.deleteOnExit();
+    if (createStagingDirectory) {
+      stagingDirectory.deleteOnExit();

Review comment:
       Yeah, it should be safe. Made the changes accordingly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#issuecomment-926664169


   @thaismonti1912 @alex-shchetkov @fcvr1010 @jackye1995 Can you please review the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717275989



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java
##########
@@ -62,6 +63,9 @@ public PositionOutputStream createOrOverwrite() {
       return new S3OutputStream(client(), uri(), awsProperties());
     } catch (IOException e) {
       throw new UncheckedIOException("Failed to create output stream for location: " + uri(), e);
+    } catch (SecurityException e) {

Review comment:
       as discussed in the block below, we can remove this catch. AccessControlException also hides the original exception, so better to not use it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717274097



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      }

Review comment:
       else, add a LOG saying "Staging directory {} creation failed, or it is created by another process"

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {

Review comment:
       add a LOG saying something like "staging directoy {} not exist, trying to create one"

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Sorry for the back and forth, I read the documentation for mkdirs, it seems like SecurityException only catches JVM level permission and it might still just return false for OS level permission failure, so it's hard to have a consistent behavior for error handling. Because of that, plus the fact that SecurityException is a runtime exception, I think we can remove the special handling of it and just let it throw to the top level.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java
##########
@@ -62,6 +63,9 @@ public PositionOutputStream createOrOverwrite() {
       return new S3OutputStream(client(), uri(), awsProperties());
     } catch (IOException e) {
       throw new UncheckedIOException("Failed to create output stream for location: " + uri(), e);
+    } catch (SecurityException e) {

Review comment:
       as discussed in the block below, we can remove this catch. AccessControlException also hides the original exception, so better to not use it.

##########
File path: api/src/main/java/org/apache/iceberg/io/OutputFile.java
##########
@@ -48,6 +49,7 @@
    *
    * @return an output stream that can report its position
    * @throws RuntimeIOException If the implementation throws an {@link IOException}
+   * @throws AccessControlException If the implementation throws an {@link SecurityException}

Review comment:
       as discussed in the block below, we can just document @throws SecurityException if staging directory creation fails due to missing JVM level permission.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      }

Review comment:
       actually, we can do better than just logging. When creation fails, we can check directory existence again, if it exists then it's created by another process, otherwise it's still an issue, and we can throw an IOException with error message indicating staging directory creation fails for some unknown reasons.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",

Review comment:
       nit: match the log message at L339, "Successfully created staging directory by another process: {}"

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",
+              stagingDirectory.getAbsolutePath());
+        } else {
+          throw new IOException(String
+              .format("Staging directory: %s creation failed due to some unknown reason",

Review comment:
       nit: match the log message at L346, "Fail to create staging directory due to some unknown reason: {}"

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -170,11 +171,12 @@ public void write(byte[] b, int off, int len) throws IOException {
     }
   }
 
-  private void newStream() throws IOException {
+  private void newStream() throws IOException, SecurityException {

Review comment:
       Thanks, can you also remove the `SecurityException` after `throws`? We don't need to throw runtime exception explicitly, just need to document it at top level which you already did.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -86,7 +86,8 @@
   private boolean closed = false;
 
   @SuppressWarnings("StaticAssignmentInConstructor")
-  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties) throws IOException {
+  S3OutputStream(S3Client s3, S3URI location, AwsProperties awsProperties)
+      throws IOException, SecurityException {

Review comment:
       same comment as below, remove SecurityException after throws




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717276851



##########
File path: api/src/main/java/org/apache/iceberg/io/OutputFile.java
##########
@@ -48,6 +49,7 @@
    *
    * @return an output stream that can report its position
    * @throws RuntimeIOException If the implementation throws an {@link IOException}
+   * @throws AccessControlException If the implementation throws an {@link SecurityException}

Review comment:
       as discussed in the block below, we can just document @throws SecurityException if staging directory creation fails due to missing JVM level permission.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#issuecomment-927767926


   > Thanks for working on this! Can you also add a test case for it?
   
   Added the test case `testStagingDirectoryCreation()` in `S3OutputStreamTest`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r716592654



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -175,8 +175,15 @@ private void newStream() throws IOException {
       stream.close();
     }
 
+    boolean createStagingDirectory = false;
+    if (!stagingDirectory.exists()) {
+      createStagingDirectory = stagingDirectory.mkdirs();
+    }
     currentStagingFile = File.createTempFile("s3fileio-", ".tmp", stagingDirectory);
     currentStagingFile.deleteOnExit();
+    if (createStagingDirectory) {
+      stagingDirectory.deleteOnExit();
+    }

Review comment:
       Addressed. Also, separated out the logic in a method: `createStagingDirectoryIfNotExists()`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#issuecomment-929389831


   Thank you for the work, looks good to me overall, just some styling changes then it should be good to go!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rajarshisarkar commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
rajarshisarkar commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717587514



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      }

Review comment:
       Agreed, I have made the changes accordingly.

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {

Review comment:
       Added the logger.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717746922



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,26 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws IOException, SecurityException {
+    if (!stagingDirectory.exists()) {
+      LOG.info("Staging directory: {} does not exist, trying to create one",
+          stagingDirectory.getAbsolutePath());
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      } else {
+        if (stagingDirectory.exists()) {
+          LOG.info("Staging directory: {} is created by another process",

Review comment:
       nit: match the log message at L339, "Successfully created staging directory by another process: {}"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717274097



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      }

Review comment:
       else, add a LOG saying "Staging directory {} creation failed, or it is created by another process"

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {

Review comment:
       add a LOG saying something like "staging directoy {} not exist, trying to create one"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #3175: AWS: Add check to create staging directory if not exists for S3OutputStream

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3175:
URL: https://github.com/apache/iceberg/pull/3175#discussion_r717282414



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -328,6 +330,15 @@ private static InputStream uncheckedInputStream(File file) {
     }
   }
 
+  private void createStagingDirectoryIfNotExists() throws SecurityException {
+    if (!stagingDirectory.exists()) {
+      boolean createdStagingDirectory = stagingDirectory.mkdirs();
+      if (createdStagingDirectory) {
+        LOG.info("Successfully created staging directory: {}", stagingDirectory.getAbsolutePath());
+      }

Review comment:
       actually, we can do better than just logging. When creation fails, we can check directory existence again, if it exists then it's created by another process, otherwise it's still an issue, and we can throw an IOException with error message indicating staging directory creation fails for some unknown reasons.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org