You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2015/02/05 21:05:57 UTC

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/368

    [FLINK-1484] Adds explicit disconnect message for TaskManagers

    In case of a JobManager restart, which can be caused by an uncaught exception, all connected TaskManager are notified by a ```Disconnected``` message. This message triggers the cleanup of the TaskManagers and makes them try to reconnect to the JobManager.
    
    This PR also includes a test case to verify the afore-mentioned behaviour.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink akkaSupervision

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/368.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #368
    
----
commit 081721a01b5070dccdc807bf3ecd14d5b9de1b24
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-02-05T19:58:59Z

    Adds explicit disconnect message to tell TaskManagers that the JobManager has failed

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/368#issuecomment-73478622
  
    My bad, there is a ticket already. Can you squash the commits then and add the ticket tag?
    
    Otherwise, good to merge!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/368#issuecomment-73479833
  
    Yes, I'll do it.
    
    On Mon, Feb 9, 2015 at 10:23 AM, Stephan Ewen <no...@github.com>
    wrote:
    
    > My bad, there is a ticket already. Can you squash the commits then and add
    > the ticket tag?
    >
    > Otherwise, good to merge!
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/flink/pull/368#issuecomment-73478622>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by uce <gi...@git.apache.org>.
Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/368#issuecomment-75254417
  
    @tillrohrmann and @StephanEwen worked on some other reliablity issues. Will the changes in this PR be subsumed by the upcoming changes? If not, we should merge this. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/368#discussion_r24227786
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -125,6 +126,10 @@ Actor with ActorLogMessages with ActorLogging {
       override def postStop(): Unit = {
         log.info(s"Stopping job manager ${self.path}.")
     
    +    // disconnect the registered task managers
    +    instanceManager.getAllRegisteredInstances.asScala.foreach{
    +      _.getTaskManager ! Disconnected("JobManager is stopping")}
    +
         for((e,_) <- currentJobs.values){
           e.fail(new Exception("The JobManager is shutting down."))
    --- End diff --
    
    Thanks Henry for spotting it. I corrected it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/flink/pull/368#discussion_r24220454
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -125,6 +126,10 @@ Actor with ActorLogMessages with ActorLogging {
       override def postStop(): Unit = {
         log.info(s"Stopping job manager ${self.path}.")
     
    +    // disconnect the registered task managers
    +    instanceManager.getAllRegisteredInstances.asScala.foreach{
    +      _.getTaskManager ! Disconnected("JobManager is stopping")}
    +
         for((e,_) <- currentJobs.values){
           e.fail(new Exception("The JobManager is shutting down."))
    --- End diff --
    
    Since we are cleaning up messages, maybe remove "The" so it is consistent with other messages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/368#discussion_r24236048
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/messages/TaskManagerMessages.scala ---
    @@ -129,6 +129,13 @@ object TaskManagerMessages {
        * @param cause reason for the external failure
        */
       case class FailTask(executionID: ExecutionAttemptID, cause: Throwable)
    +
    +  /**
    +   * Makes the TaskManager to disconnect from the registered JobManager
    --- End diff --
    
    You're right. Thanks, I changed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/368#discussion_r24232671
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/messages/TaskManagerMessages.scala ---
    @@ -129,6 +129,13 @@ object TaskManagerMessages {
        * @param cause reason for the external failure
        */
       case class FailTask(executionID: ExecutionAttemptID, cause: Throwable)
    +
    +  /**
    +   * Makes the TaskManager to disconnect from the registered JobManager
    --- End diff --
    
    Minor typo. Comment could be changed to "Disconnects the TaskManager from the JobManager"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on the pull request:

    https://github.com/apache/flink/pull/368#issuecomment-73263118
  
    +1
    
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/368#issuecomment-75256160
  
    This PR has been merged as part of PR #423 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/368#issuecomment-73478074
  
    Looks good. Since this is a behavior change, can you file a ticket for this, Till?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann closed the pull request at:

    https://github.com/apache/flink/pull/368


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---