You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by uce <gi...@git.apache.org> on 2016/06/22 16:06:26 UTC

[GitHub] flink issue #2096: [FLINK-3800] [runtime] Introduce SUSPENDED job status

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2096
  
    Really good changes and great that you added the `JobStatus` graph transitions figure to the docs.
    
    I looked at all changes and tests. Furthermore, I've tested this with lost ZooKeeper connections and it works as expected. The job is suspended and recovered smoothly. \U0001f44d 
    
    Some general notes:
    - The only thing that might be slightly confusing for users is that until the web frontend is notified about the leader changes, it shows the job status as FAILED until it disappears. This is probably some web interface quirk, because it does not recognize the new job status. I understand that you remove suspended job graphs and don't archive them (which makes sense as they are recovered later), but giving that it takes some time until all information propagates to the web frontend, we might want to temporarily show the state as suspended. We can do this as a follow up.
    
    - Regarding the removal of restart strategy disabling, I thought about whether we should leave it in and get rid of the `SuppressRestartsException` instead. That seems to be cleaner than the custom exception, which is checked in the `ExecutionGraph`. If we want to get rid of it (as a follow up), we should leave the disable call in.
    
    - If you think that https://github.com/apache/flink/pull/2095#issuecomment-227731879 is a valid point, then we should apply it for the suspended state as well.
    
    These comments are not blockers and I think this is good to merge. Great job!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---