You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2022/03/26 17:26:00 UTC
[jira] [Commented] (BEAM-13675) Python SDK not creating Flink checkpoints

    [ https://issues.apache.org/jira/browse/BEAM-13675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512756#comment-17512756 ] 

Beam JIRA Bot commented on BEAM-13675:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> Python SDK not creating Flink checkpoints
> -----------------------------------------
>
>                 Key: BEAM-13675
>                 URL: https://issues.apache.org/jira/browse/BEAM-13675
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.35.0
>            Reporter: Janek Bevendorff
>            Priority: P2
>              Labels: stale-P2
>
> I have been trying to get checkpointing to work with Apache Flink and Beam for the Python SDK, but without any success. I read a ton of documentation on how to get this working, but couldn't make any progress, so I have to assume that this is a bug. If not, then we need to at least fix the documentation (fundamentally).
> The "bug" is reproducible with both the PortableRunner and the FlinkRunner + uber JAR. I cannot really test the FlinkRunner without uber JARs, because I am submitting to a remote cluster.
> The flink cluster is configured with:
> {code:java}
>     state.checkpoint-storage: "filesystem"
>     state.checkpoints.dir: "file:///foo/bar/cp"
>     state.savepoints.dir: "file:///foo/bar/sp"
>     execution.checkpointing.interval: "60s"
>     execution.checkpointing.externalized-checkpoint-retention: "DELETE_ON_CANCELLATION" {code}
> ({{{}/foo/bar{}}} is a shared network mount)
> When I submit a job, all I'm seeing in the Flink Web UI under job configuration is
> {noformat}
> Execution mode: PIPELINED
> Max. number of execution retries: Cluster level default restart strategy
> Job parallelism: 120
> Object reuse mode: false{noformat}
> "User Configuration" is empty and no checkpoints are created (both the Checkpoints tab and the checkpoints folder remain empty).
> I tried setting
> {code:java}
> checkpointing_interval=30000,
> externalized_checkpoints_enabled=True, {code}
> in my Beam submission config, but the result is the same.
> When I try the FlinkRunner with {{{}flink_submit_uber_jar=True{}}}, it's the same again, but this time I also get the following warning and the job starts with a parallelism of 1 (I guess that's another bug):
> {code:java}
> WARNING:apache_beam.options.pipeline_options:Discarding invalid overrides: {'checkpointing_interval': 30000, 'externalized_
> checkpoints_enabled': True, 'parallelism': 120}{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)