You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by StephanEwen <gi...@git.apache.org> on 2016/09/23 13:24:16 UTC

[GitHub] flink pull request #2544: [FLINK-4218] [checkpoints] Do not rely on FileSyst...

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/2544

    [FLINK-4218] [checkpoints] Do not rely on FileSystem to determine state sizes

    This prevents failures on eventually consistent S3, where the operations for keys (=entries in the parent directory/bucket) are not guaranteed to be immediately consistent (visible) after a blob was written.
    
    Not relying on any operation on keys (= requesting `FileStatus`) should mitigate the problem.
    
    This also changes the exception signature from `getStateSize()` from `Exception` to `IOException`, which fits more natural with the exception signatures of some other I/O methods.
    
    Related issue: We may still want to have retries on `FileStatus` operations on S3, for other parts of the system (like FileOutputFormats)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink state_size_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2544.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2544
    
----
commit 7ce2de7e14b9a1fb24c27dc674f689f08abdf7cc
Author: Stephan Ewen <se...@apache.org>
Date:   2016-09-23T13:16:27Z

    [FLINK-4218] [checkpoints] Do not rely on FileSystem to determing state sizes
    
    This prevents failures on eventually consistent S3, where the operations for
    keys (=entries in the parent directory/bucket) are not guaranteed to be immediately
     consistent (visible) after a blob was written.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2544: [FLINK-4218] [checkpoints] Do not rely on FileSyst...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen closed the pull request at:

    https://github.com/apache/flink/pull/2544


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2544: [FLINK-4218] [checkpoints] Do not rely on FileSystem to d...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2544
  
    Manually merged in 95e9004e36fffae755eab7aa3d5d0d5e8bfb7113


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2544: [FLINK-4218] [checkpoints] Do not rely on FileSyst...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2544#discussion_r80318790
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsCheckpointStreamFactory.java ---
    @@ -301,9 +301,16 @@ public StreamStateHandle closeAndGetHandle() throws IOException {
     					}
     					else {
     						flush();
    +
    +						long size = -1;
    --- End diff --
    
    I am not sure if returning -1 as size on exception is ideal. Currently, this value should one be used in the calculation of meta data, but one might be tempted to use it e.g. to preallocate a byte[] to read the file into, so this should at least be documented in `StateObject`. Furthermore, we make the assumption that the stream position is also equal to the final file size. Not entirely sure if this holds for all streams and file systems, but I guess this is the best we can do without asking the file system for meta data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2544: [FLINK-4218] [checkpoints] Do not rely on FileSyst...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2544#discussion_r80319776
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/StateObject.java ---
    @@ -47,7 +49,7 @@
     	 * <p>If the the size is not known, return {@code 0}.
     	 *
     	 * @return Size of the state in bytes.
    -	 * @throws Exception If the operation fails during size retrieval.
    +	 * @throws IOException If the operation fails during size retrieval.
     	 */
    -	long getStateSize() throws Exception;
    +	long getStateSize() throws IOException;
    --- End diff --
    
    I think with the change in `StreamStateHandle`, even throwing IOException becomes obsolete now for all existing implementations. We might remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2544: [FLINK-4218] [checkpoints] Do not rely on FileSyst...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2544#discussion_r80463412
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/StateObject.java ---
    @@ -47,7 +49,7 @@
     	 * <p>If the the size is not known, return {@code 0}.
     	 *
     	 * @return Size of the state in bytes.
    -	 * @throws Exception If the operation fails during size retrieval.
    +	 * @throws IOException If the operation fails during size retrieval.
     	 */
    -	long getStateSize() throws Exception;
    +	long getStateSize() throws IOException;
    --- End diff --
    
    I think we should do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2544: [FLINK-4218] [checkpoints] Do not rely on FileSyst...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2544#discussion_r80463624
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsCheckpointStreamFactory.java ---
    @@ -301,9 +301,16 @@ public StreamStateHandle closeAndGetHandle() throws IOException {
     					}
     					else {
     						flush();
    +
    +						long size = -1;
    --- End diff --
    
    Stream position should be okay to determine the state size. All instances I checked were accurate there.
    Also, given that the size is more informational and should not be relied upon, it should be all the less critical.
    As 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2544: [FLINK-4218] [checkpoints] Do not rely on FileSystem to d...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2544
  
    @StefanRRichter Maybe interesting for you to review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---