You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by pnowojski <gi...@git.apache.org> on 2017/10/04 16:13:23 UTC

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

GitHub user pnowojski opened a pull request:

    https://github.com/apache/flink/pull/4774

    [FLINK-6495] Fix Akka's default value for heartbeat pause

    ## Brief change log
    
    This PR consists of two hotfixes regarding akka's heartbeat pause. The critical one is reverting it's default value from 10s back to 60s (bug introduced by #3935)
    
    ## Verifying this change
    
    This change is already covered by existing tests
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**yes (default config values)** / no)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pnowojski/flink akka

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4774
    
----
commit 0f17a2998d343bc78bb01533a6fcf847d657c38c
Author: Piotr Nowojski <pi...@gmail.com>
Date:   2017-10-02T17:33:46Z

    [hotfix][config] Revert heartbeat pause back to 60s
    
    This fixes an important bug introduced by FLINK-6495. Heartbeat pause MUST be
    significantly larger then heartbeat interval.

commit 07fd359be00c8de6c1473c6c5631b4bdeee0c586
Author: Piotr Nowojski <pi...@gmail.com>
Date:   2017-10-04T11:26:16Z

    [hotfix][runtime] Fix default value for restart delay
    
    1. Previously default value didn't match with an exception message being thrown.
    2. HEARTBEAT_PAUSE is more sane default value for the delay, becauce we want to
    wait long enough before restart, for actors to realize about previous crash.

----


---

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4774


---

[GitHub] flink issue #4774: [FLINK-6495] Fix Akka's default value for heartbeat pause

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:

    https://github.com/apache/flink/pull/4774
  
    I have added runtime check for that. To be clear, this was not the reason for Kafka tests instabilities and I'm not aware if this was causing any issues. But it definitely could and should be fixed anyway (IMO that should be a release blocker)


---

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

Posted by NicoK <gi...@git.apache.org>.
Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4774#discussion_r147743853
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/akka/AkkaUtils.scala ---
    @@ -257,6 +257,20 @@ object AkkaUtils {
         ConfigFactory.parseString(config)
       }
     
    +  private def validateHeartbeat(pauseParamName: String,
    +                                pauseValue: String,
    +                                intervalParamName: String,
    +                                intervalValue: String) = {
    +    if (Duration.apply(pauseValue).lteq(Duration.apply(intervalValue))) {
    +      throw new IllegalConfigurationException(
    +        "%s [%s] must greater then %s [%s]",
    --- End diff --
    
    this should actually be "than", not "then"


---

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4774#discussion_r143246933
  
    --- Diff: flink-core/src/main/java/org/apache/flink/configuration/IllegalConfigurationException.java ---
    @@ -42,6 +42,17 @@ public IllegalConfigurationException(String message) {
     
     	/**
     	 * Constructs an new IllegalConfigurationException with the given error message
    +	 * format and arguments.
    +	 *
    +	 * @param format The error message format for the exception.
    +	 * @param arguments The arguments for the format.
    +	 */
    +	public IllegalConfigurationException(String format, Object... arguments) {
    --- End diff --
    
    Curious: Why introduce the extra constructor and not call `String.format(...)` where the exception is created?


---

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4774#discussion_r143248194
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/restart/FixedDelayRestartStrategy.java ---
    @@ -82,7 +82,7 @@ public static FixedDelayRestartStrategyFactory createFactory(Configuration confi
     		int maxAttempts = configuration.getInteger(ConfigConstants.RESTART_STRATEGY_FIXED_DELAY_ATTEMPTS, 1);
     
     		String timeoutString = configuration.getString(
    -			AkkaOptions.WATCH_HEARTBEAT_INTERVAL);
    +			AkkaOptions.WATCH_HEARTBEAT_PAUSE);
    --- End diff --
    
    We cannot make this change, this introduces crazy delay on each recovery.


---

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4774#discussion_r143416770
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/restart/FixedDelayRestartStrategy.java ---
    @@ -82,7 +82,7 @@ public static FixedDelayRestartStrategyFactory createFactory(Configuration confi
     		int maxAttempts = configuration.getInteger(ConfigConstants.RESTART_STRATEGY_FIXED_DELAY_ATTEMPTS, 1);
     
     		String timeoutString = configuration.getString(
    -			AkkaOptions.WATCH_HEARTBEAT_INTERVAL);
    +			AkkaOptions.WATCH_HEARTBEAT_PAUSE);
    --- End diff --
    
    I will fix the inconsistency in other way: fixing exception message instead of this getter.


---

[GitHub] flink pull request #4774: [FLINK-6495] Fix Akka's default value for heartbea...

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4774#discussion_r143416489
  
    --- Diff: flink-core/src/main/java/org/apache/flink/configuration/IllegalConfigurationException.java ---
    @@ -42,6 +42,17 @@ public IllegalConfigurationException(String message) {
     
     	/**
     	 * Constructs an new IllegalConfigurationException with the given error message
    +	 * format and arguments.
    +	 *
    +	 * @param format The error message format for the exception.
    +	 * @param arguments The arguments for the format.
    +	 */
    +	public IllegalConfigurationException(String format, Object... arguments) {
    --- End diff --
    
    Convenience - reduces boiler plate. I prefer this:
    ```
    throw new IllegalConfigurationException("%s %s", foo, bar);
    ```
    to that:
    ```
    throw new IllegalConfigurationException(String.format("%s %s", foo, bar));
    ```
    same as it is done in loggers.


---

[GitHub] flink issue #4774: [FLINK-6495] Fix Akka's default value for heartbeat pause

Posted by pnowojski <gi...@git.apache.org>.
Github user pnowojski commented on the issue:

    https://github.com/apache/flink/pull/4774
  
    Made requested changes to fixed delay strategy and added one more hot fix regarding akka's documentation (last commit).


---

[GitHub] flink issue #4774: [FLINK-6495] Fix Akka's default value for heartbeat pause

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/4774
  
    Given that this caused instabilities shouldn't we introduce a runtime check to make sure these options are configure correctly in relation to each other? We should also properly document it in the javadocs that these values have a strong relationship.


---