You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by pnowojski <gi...@git.apache.org> on 2017/10/20 15:50:10 UTC

[GitHub] flink pull request #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to S...

GitHub user pnowojski opened a pull request:

    https://github.com/apache/flink/pull/4876

    [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase

    ## What is the purpose of the change
    
    It depends whether to call hsync or hflush on the underlying file system and user preferences. Normally hflush is enough to protect against single machine HDFS failures and against TaskManagers failures. However if user is using S3 like file system, or wants to protect against whole HDFS rack power loss hsync must be used instead.
    
    This is a stop gap solution until proper fix waiting for https://issues.apache.org/jira/browse/FLINK-5789
    
    ## Verifying this change
    
    This change is hard to test :( One could think about writing a unit test using mocks, but that would only copy the implementation.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (**yes** / no)
      - If yes, how is the feature documented? (not applicable / docs / **JavaDocs** / not documented)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pnowojski/flink f7737

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4876
    
----
commit 8d1c34197d1098cbd5a56ec882da17f42410c1f4
Author: Piotr Nowojski <pi...@gmail.com>
Date:   2017-10-20T14:47:52Z

    [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWriterBase
    
    It depends whether to call hsync or hflush on the underlying file system
    and user preferences. Normally hflush is enough to protect against single
    machine HDFS failures and against TaskManagers failures. However if user is
    using S3 like file system, or wants to protect againt whole HDFS rack power
    loss hsync must be used instead.

----


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    But do you understand what I mean? Semantics of code in the main scope should not be quirked to make assertions in tests shorter to write.
    
    Equals/hashCode is usually not implemented on I/O classes, like the output stream, because it is not well defined.


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    I merged this, could you please close the PR?


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    Excellent changes, I'll merge along with a bunch of other stuff. 👍 


---

[GitHub] flink pull request #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to S...

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski closed the pull request at:

    https://github.com/apache/flink/pull/4876


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    Thanks! 


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    A quick post-mortem comment here:
    
    This adds a lot of `equals()` and `hashCode()` on classes where these are ill-defined.
    
    For example: `StreamWriterBase` defines `equals()` and `hashCode()` just on a subset of configuration fields (here the sync field) and ignore the associated stream, because equals and hash is ill-defined on the stream. To me, the correct conclusion is that `equals()` and `hashCode()` are ill-defined on `StreamWriterBase` and should not be there!
    
    Adding such methods just to make assertion statement in tests more compact wrongly pushes some specific test logic in to the main classes. The correct way is to adjust the assertions in the test, or, if there is a lot of repetitive checking, create a `Matcher` that matches "equality based on some fields" and replace `assertEquals(X, Y)` with `assertThat(matcher, X, Y)`.
    
    I think in this case, it is actually a reason for a follow-up patch that changes this.


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    I assumed that it is up to concrete implementations of `StreamWriterBase` to implement `equals` completely, for example based on `configuration`. Alternatively including `outStream` in equality/hash implementation is also a solution, with slightly different semantic.


---

[GitHub] flink issue #4876: [FLINK-7737][filesystem] Add syncOnFlush flag to StreamWr...

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:

    https://github.com/apache/flink/pull/4876
  
    Yes I know what you mean. However including `outStream` to `hashCode` and `equals` wouldn't add any quirks.


---