Posted to issues@flink.apache.org by "Ufuk Celebi (JIRA)" <ji...@apache.org> on 2018/09/25 18:25:00 UTC
[jira] [Comment Edited] (FLINK-10292) Generate JobGraph in
StandaloneJobClusterEntrypoint only once
[ https://issues.apache.org/jira/browse/FLINK-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626190#comment-16626190 ]
Ufuk Celebi edited comment on FLINK-10292 at 9/25/18 6:24 PM:
--------------------------------------------------------------
I understand that non-determinism may be an issue when generating the {{JobGraph}}, but do we have some data about how common that is for applications? Would it be possible to keep a fixed JobGraph in the image instead of persisting one in the {{SubmittedJobGraphStore}}?
I like our current approach, because it keeps the source of truth with image-based deployments such as Kubernetes *in* the image instead of the {{SubmittedJobGraphStore}}. I'm wondering about the following scenario in particular (this is independent of the question of whether it runs on Kubernetes or not and can be reproduced in another way as well):
* A user creates a job cluster with high availability enabled (cluster ID for the logical application, e.g. myapp)
** This will persist the job with a fixed ID (after FLINK-10291) on first submission
* The user kills the application *without* cancelling
** This will leave all data in the high availability store(s) such as job graphs or checkpoints
* The user updates the image with a modified application and keeps the high availability configuration (e.g. cluster ID stays myapp)
** This will result in the job in the image being ignored, since we already have a job graph with the same (fixed) ID
I think in such a scenario it can be desirable to still have the checkpoints available, but it might be problematic if the job graph is recovered from the {{SubmittedJobGraphStore}} instead of using the job that is part of the image. What do you think about this scenario? Is it the responsibility of the user to handle this? If so, I think that the approach outlined in this ticket makes sense. If not, we may want to consider alternatives or ignore potential non-determinism.
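To make the scenario above concrete, here is a minimal Java sketch of a "recover first, generate only if nothing is stored" start-up path. It does not use Flink's classes: {{StoredGraph}}, {{InMemoryGraphStore}}, {{start}} and the job ID value are hypothetical stand-ins chosen for illustration, and the real entrypoint logic may well differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class JobGraphRecoverySketch {

    /** Hypothetical stand-in for a persisted job graph (not Flink's JobGraph). */
    static final class StoredGraph {
        final String jobId;
        final String applicationVersion;

        StoredGraph(String jobId, String applicationVersion) {
            this.jobId = jobId;
            this.applicationVersion = applicationVersion;
        }
    }

    /** Hypothetical in-memory stand-in for the high availability job graph store. */
    static final class InMemoryGraphStore {
        private final Map<String, StoredGraph> graphs = new HashMap<>();

        Optional<StoredGraph> recover(String jobId) {
            return Optional.ofNullable(graphs.get(jobId));
        }

        void put(StoredGraph graph) {
            graphs.put(graph.jobId, graph);
        }
    }

    /**
     * Assumed start-up path: a graph stored under the fixed job ID wins,
     * and the job shipped in the image is only used when nothing is stored.
     */
    static StoredGraph start(InMemoryGraphStore store, String fixedJobId, String imageVersion) {
        Optional<StoredGraph> recovered = store.recover(fixedJobId);
        if (recovered.isPresent()) {
            // The job in the image is ignored here.
            return recovered.get();
        }
        StoredGraph fresh = new StoredGraph(fixedJobId, imageVersion);
        store.put(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        InMemoryGraphStore store = new InMemoryGraphStore();
        String fixedJobId = "myapp-job"; // hypothetical fixed ID

        // First submission from the original image persists the graph.
        StoredGraph first = start(store, fixedJobId, "v1");

        // The application is killed without cancelling, so the store keeps its data.
        // The image is then updated, but the cluster ID and job ID stay the same.
        StoredGraph second = start(store, fixedJobId, "v2");

        System.out.println(first.applicationVersion);
        System.out.println(second.applicationVersion);
    }
}
```

Both lines print v1: the second start recovers the stored graph, so the modified application in the updated image never runs. Under these assumptions, the user would have to remove the stored graph (or change the ID) to pick up the new job.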
was (Author: uce):
I understand that non-determinism may be an issue when generating the {{JobGraph}}, but do we have some data about how common that is for applications? Would it be possible to keep a fixed JobGraph in the image instead of persisting one in the {{SubmittedJobGraphStore}}?
I like our current approach, because it keeps the source of truth for the job in the image instead of the {{SubmittedJobGraphStore}}. I'm wondering about the following scenario:
* A user creates a job cluster with high availability enabled (cluster ID for the logical application, e.g. myapp)
** This will persist the job with a fixed ID (after FLINK-10291) on first submission
* The user kills the application *without* cancelling
** This will leave all data in the high availability store(s) such as job graphs or checkpoints
* The user updates the image with a modified application and keeps the high availability configuration (e.g. cluster ID stays myapp)
** This will result in the job in the image being ignored, since we already have a job graph with the same (fixed) ID
I think in such a scenario it can be desirable to still have the checkpoints available, but it might be problematic if the job graph is recovered from the {{SubmittedJobGraphStore}} instead of using the job that is part of the image. What do you think about this scenario? Is it the responsibility of the user to handle this? If so, I think that the approach outlined in this ticket makes sense. If not, we may want to consider alternatives or ignore potential non-determinism.
> Generate JobGraph in StandaloneJobClusterEntrypoint only once
> -------------------------------------------------------------
>
> Key: FLINK-10292
> URL: https://issues.apache.org/jira/browse/FLINK-10292
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.6.0, 1.7.0
> Reporter: Till Rohrmann
> Assignee: vinoyang
> Priority: Major
> Fix For: 1.7.0, 1.6.2
>
>
> Currently the {{StandaloneJobClusterEntrypoint}} generates the {{JobGraph}} from the given user code every time it starts or is restarted. This can be problematic if the {{JobGraph}} generation has side effects. Therefore, it would be better to generate the {{JobGraph}} only once, store it in HA storage, and retrieve it from there on subsequent starts.