You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/02 17:17:36 UTC

[GitHub] [flink] XComp opened a new pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

XComp opened a new pull request #18963:
URL: https://github.com/apache/flink/pull/18963


   ## What is the purpose of the change
   
   `FileStateHandle.discardState` didn't handle cases where the `FileSystem` implementation used the return value to indicate an error during deletion. This made the cleanup logic not recognize the failure.
   
   ## Brief change log
   
   * Updates `FileStateHandle.discardState` implementation
   
   ## Verifying this change
   
   * Introduces `TestingFileSystem`
   * Adds tests to `FileStateHandleTest` to cover both deletion code paths.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: yes
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] dawidwys commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
dawidwys commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057827384


   The solution lgtm, would be good to get @rkhachatryan opinion here.
   
   I struggled a bit with the `TestingFileSystem`. I am not the biggest fan of the `resetCallbacks` method there. I was wondering if the entire class is not a bit an overkill. However, I am good if you think it can be reused in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520",
       "triggerID" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32639",
       "triggerID" : "2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32639) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on a change in pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on a change in pull request #18963:
URL: https://github.com/apache/flink/pull/18963#discussion_r819363512



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FileStateHandle.java
##########
@@ -74,15 +74,21 @@ public FSDataInputStream openInputStream() throws IOException {
     }
 
     /**
-     * Discard the state by deleting the file that stores the state. If the parent directory of the
-     * state is empty after deleting the state file, it is also deleted.
+     * Discard the state by deleting the file that stores the state.
      *
      * @throws Exception Thrown, if the file deletion (not the directory deletion) fails.
      */
     @Override
     public void discardState() throws Exception {
-        FileSystem fs = getFileSystem();
-        fs.delete(filePath, false);
+        final FileSystem fs = getFileSystem();
+        if (!fs.exists(filePath)) {
+            return;
+        }
+
+        if (!fs.delete(filePath, false)) {
+            throw new IOException(
+                    filePath.getPath() + " could not be deleted for unknown reasons.");
+        }

Review comment:
       That's a good point - I should have considered that. The deletion needs to be atomic. I will consider that. Thanks for the hint




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057926590


   Thanks for @rkhachatryan  and @dawidwys for sharing your concerns. I agree that it's a bit of overkill for this specific usecase. See my inline comments below:
   > 1. TestingFileSystem might be brittle because it has a lot of unsupported operations. Can't we delegate everything except delete to some "real" FS?
   
   The initial intention of `TestingFileSystem` was to have a "framework" to test against in the future. Having all the method throw `UnsupportedOperationException` if not specified forces the developer writing a test to define exactly what behavior to expect from the `FileSystem`. For default behavior, having a temporary filesystem would be good enough.
   
   > 2. Callbacks make it less readdable, and mutable callbacks - error-prone, why do we need to reset them?
   The idea of resetting them to make each test define the exact context it's running in. Where do you see that it's error-prone? That just would indicate that the test's isn't defined precisely.
   
   Ideally, we would have a `FileSystem` being specified through a `Builder` pattern and created per test. But the lifecycle of the `FileSystem` should happen outside (i.e. around) the test to make sure that it's get removed again, afterwards. This prevents us from initializing the `FileSystem` within the test. Therefore, I tried this approach with mutable callbacks.
   
   > 3. I'm wondering whether a local FS can be picked up instead of a testing FS for some reason - should we guard against it using some "test://" scheme?
   
   I don't understand this proposal. Do you see a way to simulate an error using the local FS? I couldn't come up with a solution here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea7434c46bf8d6777e5003918c034d50c6574f77 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d5701800b54f5d948576e41dec7396edb018acd Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp closed pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp closed pull request #18963:
URL: https://github.com/apache/flink/pull/18963


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6f86df311845077a3ba77d5a83034349e1e05c3f Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507) 
   * 1d5701800b54f5d948576e41dec7396edb018acd Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520",
       "triggerID" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d5701800b54f5d948576e41dec7396edb018acd Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508) 
   * 5fba698adb6fbe58d8446281b3aa61825fce7e7a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 82b87dc460ebc2093c33609957eabc3970684cbc Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520",
       "triggerID" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32639",
       "triggerID" : "2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5fba698adb6fbe58d8446281b3aa61825fce7e7a Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520) 
   * 2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32639) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea7434c46bf8d6777e5003918c034d50c6574f77 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1059176954


   Closing this PR in favor of PR #18979 (FLINK-26484)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1063985677


   Thanks for the review. I'm gonna merge it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456) 
   * 82b87dc460ebc2093c33609957eabc3970684cbc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea7434c46bf8d6777e5003918c034d50c6574f77 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452) 
   * c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058378672


   I had to revisit the issue because I noticed that the `FileSystem.delete` method is not clear on cases where the underlying file doesn't exist. The `LocalFileSystem` implements the delete method in a way that it would return `false` if it didn't delete the file since it relies on `java.io.File.delete`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp merged pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp merged pull request #18963:
URL: https://github.com/apache/flink/pull/18963


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1060645442


   I reopened this issue again after we decided to not have a common `FileSystem.delete` implementation on the Flink side. `FileSystem` is `@Public` and, therefore, should be touched only after a discussion round on the mailing list. As a consequence, we decided to move the recursion fix closer to the presto FileSystem in PR #18979 . The `FileSystem.delete` implementations can still return `false` which should be handled in `FileStateHandle` properly. Therefore, I opened this PR again considering the comments from @rkhachatryan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rkhachatryan commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057871273


   I have similar concerns regarding the tests as @dawidwys. production code LGTM.
   
   `TestingFileSystem` is needed I guess to register the file system; and this is needed to test "return false" case (where non-existent file can't be used  because it can throw an exception instead of returning false).
   
   1. `TestingFileSystem` might be brittle because it has a lot of unsupported operations. Can't we delegate everything except `delete` to some "real" FS?
   2. Callbacks make it less readdable, and mutable callbacks - error-prone, why do we need to reset them?
   3. I'm wondering whether a local FS can be picked up instead of a testing FS for some reason - should we guard against it using some "test://" scheme?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057926590


   Thanks for your concerns. I agree that it's a bit of overkill for this specific usecase. See my inline comments below:
   > 1. TestingFileSystem might be brittle because it has a lot of unsupported operations. Can't we delegate everything except delete to some "real" FS?
   The initial intention of `TestingFileSystem` was to have a "framework" to test against in the future. Having all the method throw `UnsupportedOperationException` if not specified forces the developer writing a test to define exactly what behavior to expect from the `FileSystem`. For default behavior, having a temporary filesystem would be good enough.
   
   > 2. Callbacks make it less readdable, and mutable callbacks - error-prone, why do we need to reset them?
   The idea of resetting them to make each test define the exact context it's running in. Where do you see that it's error-prone? That just would indicate that the test's isn't defined precisely.
   Ideally, we would have a `FileSystem` being specified through a `Builder` pattern and created per test. But the lifecycle of the `FileSystem` should happen outside (i.e. around) the test to make sure that it's get removed again, afterwards. This prevents us from initializing the `FileSystem` within the test. Therefore, I tried this approach with mutable callbacks.
   
   > 3. I'm wondering whether a local FS can be picked up instead of a testing FS for some reason - should we guard against it using some "test://" scheme?
   I don't understand this proposal. Do you see a way to simulate an error using the local FS? I couldn't come up with a solution here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on a change in pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on a change in pull request #18963:
URL: https://github.com/apache/flink/pull/18963#discussion_r818967894



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/FileStateHandleTest.java
##########
@@ -57,27 +59,97 @@ public void testDisposeDeletesFile() throws Exception {
      */
     @Test
     public void testDisposeDoesNotDeleteParentDirectory() throws Exception {
-        File parentDir = tempFolder.newFolder();
-        assertTrue(parentDir.exists());
+        final Path p = new Path(TEST_SCHEME + "://path/with/parent");
+        final List<Path> pathsToDelete = new ArrayList<>();
 
-        File file = new File(parentDir, "test");
-        writeTestData(file);
-        assertTrue(file.exists());
+        initializeFileSystem(
+                (path, ignoredRecursionMarker) -> {
+                    pathsToDelete.add(path);
+                    return true;
+                });
 
-        FileStateHandle handle = new FileStateHandle(Path.fromLocalFile(file), file.length());
+        final FileStateHandle handle = new FileStateHandle(p, 42);
         handle.discardState();
-        assertFalse(file.exists());
-        assertTrue(parentDir.exists());
+        assertThat(pathsToDelete)
+                .as(
+                        "Only one delete call should have happened on the actual path but not the parent.")
+                .singleElement()
+                .isEqualTo(p);

Review comment:
       I agree. But the older version didn't enable us to simulate errors. So, I sticked to it for consistency reasons.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 82b87dc460ebc2093c33609957eabc3970684cbc Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498) 
   * 6f86df311845077a3ba77d5a83034349e1e05c3f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507) 
   * 1d5701800b54f5d948576e41dec7396edb018acd UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d5701800b54f5d948576e41dec7396edb018acd Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508) 
   * 5fba698adb6fbe58d8446281b3aa61825fce7e7a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058893234


   I noticed the `FileSystem.delete` inconsistency also makes problems for the BlobServer cleanup (FLINK-26484). I guess, we need to fix it in the `FileSystem` rather than the `FileStateHandle` to have it consistent everywhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rkhachatryan commented on a change in pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on a change in pull request #18963:
URL: https://github.com/apache/flink/pull/18963#discussion_r819362144



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FileStateHandle.java
##########
@@ -74,15 +74,21 @@ public FSDataInputStream openInputStream() throws IOException {
     }
 
     /**
-     * Discard the state by deleting the file that stores the state. If the parent directory of the
-     * state is empty after deleting the state file, it is also deleted.
+     * Discard the state by deleting the file that stores the state.
      *
      * @throws Exception Thrown, if the file deletion (not the directory deletion) fails.
      */
     @Override
     public void discardState() throws Exception {
-        FileSystem fs = getFileSystem();
-        fs.delete(filePath, false);
+        final FileSystem fs = getFileSystem();
+        if (!fs.exists(filePath)) {
+            return;
+        }
+
+        if (!fs.delete(filePath, false)) {
+            throw new IOException(
+                    filePath.getPath() + " could not be deleted for unknown reasons.");
+        }

Review comment:
       I think current failures are related to concurrent deletions by different actors (JM/TM).
   If so, this code is prone to the same issue, although with lower probability.
   I'm wondering whether it's better to first try delete the file; if that fails then check whether the file exists.
   WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea7434c46bf8d6777e5003918c034d50c6574f77 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058378672


   I had to revisit the issue because I noticed that the `FileSystem.delete` method is not clear on cases where the underlying file doesn't exist. The `LocalFileSystem` implements the delete method in a way that it would return `false` if it didn't delete the file since it relies on `java.io.File.delete`
   
   This was probably the cause for [this build](https://dev.azure.com/mapohl/flink/_build/results?buildId=808&view=logs&j=d63a5fc4-24ea-51df-9ade-fa4330af161c&t=977479f1-49ea-5c4c-884c-4646ed1443ab) to fail in the e2e tests:
   ```
   2022-03-03 14:30:11,092 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 11 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1646317811091 for job b570100734a17ad72d8d2ccc712f6
   81d.
   2022-03-03 14:30:11,215 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Triggering stop-with-savepoint for job b570100734a17ad72d8d2ccc712f681d.
   2022-03-03 14:30:11,232 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 12 (type=SavepointType{name='Suspend Savepoint', postCheckpointAction=SUSPEND, formatType=CANONICAL}) @ 1646317811228 for job b570100734
   a17ad72d8d2ccc712f681d.
   2022-03-03 14:30:11,259 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late message for now expired checkpoint attempt 11 from task 275909f41c4e9d1635d1c3d3c1f55b4c of job b570100734a17ad72d8d2ccc712f681d at 127.0.0.1:34
   655-d7bf22 @ localhost (dataPort=37055).
   [...]
   2022-03-03 14:30:11,282 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late message for now expired checkpoint attempt 11 from task f827493a1120315cebf2c38987fb2709 of job b570100734a17ad72d8d2ccc712f681d at 127.0.0.1:34
   655-d7bf22 @ localhost (dataPort=37055).
   2022-03-03 14:30:11,288 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late message for now expired checkpoint attempt 11 from task bb54c8be2cceb115193c02f53ce3cf3e of job b570100734a17ad72d8d2ccc712f681d at 127.0.0.1:34
   655-d7bf22 @ localhost (dataPort=37055).
   2022-03-03 14:30:11,282 WARN  org.apache.flink.runtime.checkpoint.OperatorSubtaskState     [] - Error while discarding operator states.
   java.io.IOException: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-47072687872/savepoint-e2e-test-chckpt-dir/b570100734a17ad72d8d2ccc712f681d/chk-11/73833c1e-bc28-4d68-8752-496d0ea65e8b could not be deleted for unknown reasons.
           at org.apache.flink.runtime.state.filesystem.FileStateHandle.discardState(FileStateHandle.java:86) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.state.KeyGroupsStateHandle.discardState(KeyGroupsStateHandle.java:125) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.util.LambdaUtil.applyToAllWhileSuppressingExceptions(LambdaUtil.java:55) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.state.StateUtil.bestEffortDiscardAllStateObjects(StateUtil.java:62) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.discardState(OperatorSubtaskState.java:211) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.util.LambdaUtil.applyToAllWhileSuppressingExceptions(LambdaUtil.java:55) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.state.StateUtil.bestEffortDiscardAllStateObjects(StateUtil.java:62) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.checkpoint.TaskStateSnapshot.discardState(TaskStateSnapshot.java:156) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$1.run(CheckpointCoordinator.java:2007) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
           at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
   ```
   
   @rkhachatryan Is it possible that states are missed to be persisted to disk when there's a concurrent savepoint operation happening?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 82b87dc460ebc2093c33609957eabc3970684cbc Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498) 
   * 6f86df311845077a3ba77d5a83034349e1e05c3f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058067387


   I provided a more-use-case specific version for now to follow the idea of only providing code that is actually needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058274518


   Failure is unrelated to the change but also fixed on `master` already: FLINK-25659


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520",
       "triggerID" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5fba698adb6fbe58d8446281b3aa61825fce7e7a Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rkhachatryan commented on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058451265


   I'm not sure I fully understand the question. 
   I think the state is missing for checkpoint 11, not for savepoint 12.
   From the logs, I see that the task in question has already received an abortion notification for checkpoint 11 (and therefore discarded it's state):
   ```
   2022-03-03 14:30:11,241 INFO  org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (3/4)#0 - asynchronous part of checkpoint 11 could not be completed.
   java.util.concurrent.CancellationException: null
       at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:1.8.0_322]
       at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_322]
       at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:645) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
       at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:57) ~[flink-dist-1.15-SNAPSHOT .jar:1.15-SNAPSHOT]
       at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink -dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
       at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.15-SNAPSHOT.jar:1. 15-SNAPSHOT]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
       at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
   ```
   
   After that, it shouldn't report it to the JM however.
   It also shouldn't re-use the same file.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] XComp edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
XComp edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1058378672


   I had to revisit the issue because I noticed that the `FileSystem.delete` method is not clear on cases where the underlying file doesn't exist. The `LocalFileSystem` implements the delete method in a way that it would return `false` if it didn't delete the file since it relies on `java.io.File.delete`
   
   This was probably the cause for [this build](https://dev.azure.com/mapohl/flink/_build/results?buildId=808&view=logs&j=d63a5fc4-24ea-51df-9ade-fa4330af161c&t=977479f1-49ea-5c4c-884c-4646ed1443ab) to fail in the e2e tests:
   ```
   2022-03-03 14:30:11,092 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 11 (type=CheckpointType{name='Checkpoint', sharingFilesStrategy=FORWARD_BACKWARD}) @ 1646317811091 for job b570100734a17ad72d8d2ccc712f6
   81d.
   2022-03-03 14:30:11,215 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Triggering stop-with-savepoint for job b570100734a17ad72d8d2ccc712f681d.
   2022-03-03 14:30:11,232 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 12 (type=SavepointType{name='Suspend Savepoint', postCheckpointAction=SUSPEND, formatType=CANONICAL}) @ 1646317811228 for job b570100734
   a17ad72d8d2ccc712f681d.
   2022-03-03 14:30:11,259 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late message for now expired checkpoint attempt 11 from task 275909f41c4e9d1635d1c3d3c1f55b4c of job b570100734a17ad72d8d2ccc712f681d at 127.0.0.1:34
   655-d7bf22 @ localhost (dataPort=37055).
   [...]
   2022-03-03 14:30:11,282 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late message for now expired checkpoint attempt 11 from task f827493a1120315cebf2c38987fb2709 of job b570100734a17ad72d8d2ccc712f681d at 127.0.0.1:34
   655-d7bf22 @ localhost (dataPort=37055).
   2022-03-03 14:30:11,288 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Received late message for now expired checkpoint attempt 11 from task bb54c8be2cceb115193c02f53ce3cf3e of job b570100734a17ad72d8d2ccc712f681d at 127.0.0.1:34
   655-d7bf22 @ localhost (dataPort=37055).
   2022-03-03 14:30:11,282 WARN  org.apache.flink.runtime.checkpoint.OperatorSubtaskState     [] - Error while discarding operator states.
   java.io.IOException: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-47072687872/savepoint-e2e-test-chckpt-dir/b570100734a17ad72d8d2ccc712f681d/chk-11/73833c1e-bc28-4d68-8752-496d0ea65e8b could not be deleted for unknown reasons.
           at org.apache.flink.runtime.state.filesystem.FileStateHandle.discardState(FileStateHandle.java:86) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.state.KeyGroupsStateHandle.discardState(KeyGroupsStateHandle.java:125) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.util.LambdaUtil.applyToAllWhileSuppressingExceptions(LambdaUtil.java:55) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.state.StateUtil.bestEffortDiscardAllStateObjects(StateUtil.java:62) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.checkpoint.OperatorSubtaskState.discardState(OperatorSubtaskState.java:211) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.util.LambdaUtil.applyToAllWhileSuppressingExceptions(LambdaUtil.java:55) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.state.StateUtil.bestEffortDiscardAllStateObjects(StateUtil.java:62) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.checkpoint.TaskStateSnapshot.discardState(TaskStateSnapshot.java:156) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$1.run(CheckpointCoordinator.java:2007) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
           at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 82b87dc460ebc2093c33609957eabc3970684cbc Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498) 
   * 6f86df311845077a3ba77d5a83034349e1e05c3f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea7434c46bf8d6777e5003918c034d50c6574f77 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452) 
   * c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456) 
   * 82b87dc460ebc2093c33609957eabc3970684cbc Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rkhachatryan commented on a change in pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on a change in pull request #18963:
URL: https://github.com/apache/flink/pull/18963#discussion_r818939316



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/FileStateHandleTest.java
##########
@@ -57,27 +59,97 @@ public void testDisposeDeletesFile() throws Exception {
      */
     @Test
     public void testDisposeDoesNotDeleteParentDirectory() throws Exception {
-        File parentDir = tempFolder.newFolder();
-        assertTrue(parentDir.exists());
+        final Path p = new Path(TEST_SCHEME + "://path/with/parent");
+        final List<Path> pathsToDelete = new ArrayList<>();
 
-        File file = new File(parentDir, "test");
-        writeTestData(file);
-        assertTrue(file.exists());
+        initializeFileSystem(
+                (path, ignoredRecursionMarker) -> {
+                    pathsToDelete.add(path);
+                    return true;
+                });
 
-        FileStateHandle handle = new FileStateHandle(Path.fromLocalFile(file), file.length());
+        final FileStateHandle handle = new FileStateHandle(p, 42);
         handle.discardState();
-        assertFalse(file.exists());
-        assertTrue(parentDir.exists());
+        assertThat(pathsToDelete)
+                .as(
+                        "Only one delete call should have happened on the actual path but not the parent.")
+                .singleElement()
+                .isEqualTo(p);

Review comment:
       TBH, the older version seems more readable to me. It also doesn't need to be updated if `FileStateHandle` internals are changed.
   But I'm fine with this version as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rkhachatryan commented on a change in pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on a change in pull request #18963:
URL: https://github.com/apache/flink/pull/18963#discussion_r818969646



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/FileStateHandleTest.java
##########
@@ -57,27 +59,97 @@ public void testDisposeDeletesFile() throws Exception {
      */
     @Test
     public void testDisposeDoesNotDeleteParentDirectory() throws Exception {
-        File parentDir = tempFolder.newFolder();
-        assertTrue(parentDir.exists());
+        final Path p = new Path(TEST_SCHEME + "://path/with/parent");
+        final List<Path> pathsToDelete = new ArrayList<>();
 
-        File file = new File(parentDir, "test");
-        writeTestData(file);
-        assertTrue(file.exists());
+        initializeFileSystem(
+                (path, ignoredRecursionMarker) -> {
+                    pathsToDelete.add(path);
+                    return true;
+                });
 
-        FileStateHandle handle = new FileStateHandle(Path.fromLocalFile(file), file.length());
+        final FileStateHandle handle = new FileStateHandle(p, 42);
         handle.discardState();
-        assertFalse(file.exists());
-        assertTrue(parentDir.exists());
+        assertThat(pathsToDelete)
+                .as(
+                        "Only one delete call should have happened on the actual path but not the parent.")
+                .singleElement()
+                .isEqualTo(p);

Review comment:
       I see. Thanks for clarifying.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18963: [FLINK-26450][fs] Adds error handling to FileStateHandle.discardState

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18963:
URL: https://github.com/apache/flink/pull/18963#issuecomment-1057183349


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32452",
       "triggerID" : "ea7434c46bf8d6777e5003918c034d50c6574f77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32456",
       "triggerID" : "c4c53e6b1cf1aa97d6069bd01a3f9c8a7bff0ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32498",
       "triggerID" : "82b87dc460ebc2093c33609957eabc3970684cbc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32507",
       "triggerID" : "6f86df311845077a3ba77d5a83034349e1e05c3f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32508",
       "triggerID" : "1d5701800b54f5d948576e41dec7396edb018acd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520",
       "triggerID" : "5fba698adb6fbe58d8446281b3aa61825fce7e7a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5fba698adb6fbe58d8446281b3aa61825fce7e7a Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32520) 
   * 2c4b296a1bb88c2fd33877a8c1ad3362af9d93ed UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org