Posted to user@flink.apache.org by Folani <ha...@irisa.fr> on 2020/12/11 13:04:18 UTC

Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

I'm deploying a standalone Flink cluster on top of Kubernetes and using MinIO
as an S3 backend, mainly following the instructions on Flink's website.
I use the following command to run my job in Flink:  $ flink run -d -m
<IP>:<port> -j job.jar

I have also added the following to flink-configmap.yaml:


    state.backend: filesystem
    state.checkpoints.dir: s3://state/checkpoints
    state.savepoints.dir: s3://state/savepoints
    s3.path-style-access: true
    s3.endpoint: http://minio-service:9000
    s3.access-key: *******
    s3.secret-key: *******

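For the s3:// scheme in those paths to work, Flink also needs one of its S3 filesystem plugins (flink-s3-fs-presto or flink-s3-fs-hadoop) in the plugins/ directory of every image. A minimal sandboxed sketch of that step; the temp dir and the jar created with touch are stand-ins for a real Flink distribution and the real jar shipped in Flink's opt/ folder:

```shell
# Sandbox sketch of enabling Flink's S3 filesystem plugin. In a real setup,
# FLINK_HOME is your Flink distribution directory and the jar already exists
# under opt/; here both are faked so the sketch runs anywhere.
FLINK_HOME=$(mktemp -d)                                 # stand-in for /opt/flink
mkdir -p "$FLINK_HOME/opt" "$FLINK_HOME/plugins/s3-fs-presto"
touch "$FLINK_HOME/opt/flink-s3-fs-presto-1.11.2.jar"   # stand-in for the real jar
cp "$FLINK_HOME"/opt/flink-s3-fs-presto-*.jar "$FLINK_HOME/plugins/s3-fs-presto/"
ls "$FLINK_HOME/plugins/s3-fs-presto"
```

Since checkpoints are being written successfully here, this step has evidently been done on the cluster side; it matters equally for any host-side tooling that touches s3:// paths.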
It seems that everything is working well. The job is submitted correctly
and the checkpoints are written to MinIO, but when I try to cancel the job or
stop it with a savepoint I get the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job "5ae191ca2b239ec7771e4c7a9a336537".
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
	at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
	... 6 more

This is my command to stop with a savepoint:  $ flink stop -p <JobID>
My Flink version is flink-1.11.2-bin-scala_2.11.

What could be the reason for this exception? Any suggestions?
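The same stop-with-savepoint can also be triggered through the JobManager's REST API, which sometimes surfaces a clearer error than the CLI's generic timeout. A sketch assuming the endpoints documented in Flink's REST API reference and a hypothetical REST address; the curl calls are shown as comments because they need a live cluster:

```shell
# Hypothetical REST address, plus the job id from the exception above.
FLINK_REST=http://localhost:8081
JOB_ID=5ae191ca2b239ec7771e4c7a9a336537

# Trigger stop-with-savepoint (the response JSON carries a request-id):
#   curl -X POST "$FLINK_REST/jobs/$JOB_ID/stop" \
#        -H 'Content-Type: application/json' \
#        -d '{"drain": false, "targetDirectory": "s3://state/savepoints"}'
# Then poll the savepoint operation with the returned request-id:
#   curl "$FLINK_REST/jobs/$JOB_ID/savepoints/<request-id>"
echo "$FLINK_REST/jobs/$JOB_ID/stop"
```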







--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Posted by Robert Metzger <rm...@apache.org>.
I guess you are seeing a different error now because you are submitting
the job and stopping it right away.

Can you produce new logs where you wait until at least one checkpoint
has successfully completed before you stop?
From the exception it seems that the job has not been successfully
initialized. I would be surprised if regular checkpoints are possible at
that point already.
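That precondition can be checked against the REST API before stopping. The response shape below follows the documented GET /jobs/<jobid>/checkpoints endpoint, trimmed to its "counts" section; the numbers are made up for illustration:

```shell
# In a live cluster this JSON would come from:
#   curl "http://<JM_rest_IP>:8081/jobs/<JobID>/checkpoints"
# Sample response, trimmed to the "counts" section (illustrative values):
response='{"counts":{"restored":0,"total":3,"in_progress":0,"completed":3,"failed":0}}'

# Stop-with-savepoint is only sensible once at least one checkpoint completed:
completed=$(echo "$response" | sed 's/.*"completed":\([0-9]*\).*/\1/')
echo "completed checkpoints: $completed"
```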

On Wed, Dec 16, 2020 at 10:59 AM Folani <ha...@irisa.fr> wrote:


Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Posted by Folani <ha...@irisa.fr>.
Hi,

I have attached the log files for the JobManager and TaskManager:

   jobmanager_log.asc
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1744/jobmanager_log.asc>
   6122-f8b99d_log.6122-f8b99d_log
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1744/6122-f8b99d_log.6122-f8b99d_log>


I follow these steps:

1- $ flink run -d -m <JM_rest_IP>:8081 -j SimpleJob.jar

2- $ flink stop -m <JM_rest_IP>:8081 -p <JobID>

Then I get the following exception:


org.apache.flink.util.FlinkException: Could not stop with a savepoint job "69daa5e180ea68bf1cceec70d865a08b".
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:539)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:919)
	at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:531)
	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:986)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047)
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:537)
	... 6 more
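One detail worth noting about this trace: the TimeoutException is thrown client-side, in CliFrontend, after the CLI's default wait of about 60 seconds. If the savepoint legitimately needs longer (a large state written to S3, for instance), the client-side timeout can be raised in flink-conf.yaml. A sketch; the option is named client.timeout in Flink 1.12, while older releases used akka.client.timeout, so verify the name against the docs for your exact version:

```
# flink-conf.yaml (assumed option name; check your Flink version's docs)
client.timeout: 5 min
```

Raising the timeout only helps when the savepoint eventually succeeds; if it never completes, the JobManager and TaskManager logs are the place to look.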







Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Posted by Robert Metzger <rm...@apache.org>.
Hi,
the logs from the client are not helpful for debugging this particular
issue.

With kubectl get pods you can get the TaskManager pod names, and with
kubectl logs <podID> you can get their logs.
The JobManager log would also be nice to have.
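For completeness, a sketch of pulling those logs; the pod names below are hypothetical, and the grep pattern assumes the TaskManager deployment name contains "taskmanager":

```shell
# Hypothetical 'kubectl get pods' output; real names depend on your manifests.
pods='flink-jobmanager-7f6c8-abcde
flink-taskmanager-5d9b7-fghij
flink-taskmanager-5d9b7-klmno'

# Pick out the TaskManager pods, then fetch logs per pod with:
#   kubectl logs <pod>              # current container logs
#   kubectl logs <pod> --previous   # logs of a restarted container
echo "$pods" | grep taskmanager
```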

On Mon, Dec 14, 2020 at 3:29 PM Folani <ha...@irisa.fr> wrote:

> Hi Piotrek,
>
> Sorry for the late response.
> I have another problem with setting up logging. I think the logging issue
> comes from using Flink on my host machine while running the job on a
> JobManager in K8s. I'm still working on that. But this is what I got in the
> /log folder of my host machine:
>
>
>
> 2020-12-14 15:04:51,329 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -
>
> --------------------------------------------------------------------------------
> 2020-12-14 15:04:51,331 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  Starting Command Line Client (Version: 1.12.0, Scala: 2.11,
> Rev:fc00492, Date:2020-12-02T08:49:16+01:00)
> 2020-12-14 15:04:51,331 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  OS current user: folani
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  Current Hadoop/Kerberos user: <no hadoop dependency found>
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.212-b04
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  Maximum heap size: 3538 MiBytes
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  JAVA_HOME: (not set)
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  No Hadoop Dependency available
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  JVM Options:
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -
>
> -Dlog.file=/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/log/flink-folani-client-hralaptop.log
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -
>
> -Dlog4j.configuration=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/log4j-cli.properties
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -
>
> -Dlog4j.configurationFile=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/log4j-cli.properties
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -
>
> -Dlogback.configurationFile=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/logback.xml
> 2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  Program Arguments:
> 2020-12-14 15:04:51,333 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -     stop
> 2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -     -p
> 2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -     cc4a9a04e164fff8628b3ac59e5fbe80
> 2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -  Classpath:
>
> /home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-csv-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-json-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-shaded-zookeeper-3.4.14.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-table_2.11-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-table-blink_2.11-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-1.2-api-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-api-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-core-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-slf4j-impl-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-dist_2.11-1.12.0.jar:::
> 2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] -
>
> --------------------------------------------------------------------------------
> 2020-12-14 15:04:51,336 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: jobmanager.rpc.address, localhost
> 2020-12-14 15:04:51,336 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: jobmanager.rpc.port, 6123
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: jobmanager.memory.process.size, 1024m
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: taskmanager.memory.process.size, 1024m
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: taskmanager.numberOfTaskSlots, 1
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: parallelism.default, 1
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: state.backend, filesystem
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: state.checkpoints.dir, s3://state/checkpoints
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: state.savepoints.dir, s3://state/savepoints
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: s3.path-style-access, true
> 2020-12-14 15:04:51,337 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: s3.endpoint, http://172.17.0.3:30090
> 2020-12-14 15:04:51,338 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: s3.access-key, minio
> 2020-12-14 15:04:51,338 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: s3.secret-key, ******
> 2020-12-14 15:04:51,338 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: jobmanager.execution.failover-strategy, region
> 2020-12-14 15:04:51,338 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.reporter.prom.class,
> org.apache.flink.metrics.prometheus.PrometheusReporter
> 2020-12-14 15:04:51,338 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.reporter.prom.host, localhost
> 2020-12-14 15:04:51,338 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.reporter.prom.port, 9250-9260
> 2020-12-14 15:04:51,339 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: taskmanager.network.detailed-metrics, true
> 2020-12-14 15:04:51,339 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.latency.granularity, "subtask"
> 2020-12-14 15:04:51,339 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.latency.interval, 1000
> 2020-12-14 15:04:51,339 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: web.backpressure.refresh-interval, 1000
> 2020-12-14 15:04:51,339 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.system-resource, true
> 2020-12-14 15:04:51,339 INFO
> org.apache.flink.configuration.GlobalConfiguration           [] - Loading
> configuration property: metrics.system-resource-probing-interval, 1000
> 2020-12-14 15:04:51,358 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] - Loading FallbackYarnSessionCli
> 2020-12-14 15:04:51,392 INFO  org.apache.flink.core.fs.FileSystem
>
> [] - Hadoop is not in the classpath/dependencies. The extended set of
> supported File Systems via Hadoop is not available.
> 2020-12-14 15:04:51,449 INFO
> org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot
> create Hadoop Security Module because Hadoop cannot be found in the
> Classpath.
> 2020-12-14 15:04:51,454 INFO
> org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file
> will be created as /tmp/jaas-1159138828547882777.conf.
> 2020-12-14 15:04:51,458 INFO
> org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory []
> -
> Cannot install HadoopSecurityContext because Hadoop cannot be found in the
> Classpath.
> 2020-12-14 15:04:51,459 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] - Running 'stop-with-savepoint' command.
> 2020-12-14 15:04:51,466 INFO  org.apache.flink.client.cli.CliFrontend
>
> [] - Suspending job "cc4a9a04e164fff8628b3ac59e5fbe80" with a savepoint.
> 2020-12-14 15:04:51,470 INFO
> org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] -
> Could not load factory due to missing dependencies.
> 2020-12-14 15:04:51,485 INFO
> org.apache.flink.configuration.Configuration
> [] - Config uses fallback configuration key 'jobmanager.rpc.address'
> instead
> of key 'rest.address'
> 2020-12-14 15:05:51,973 WARN
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel [] -
> Force-closing a channel whose registration task was not accepted by an
> event
> loop: [id: 0xe1d20630]
> java.util.concurrent.RejectedExecutionException: event executor terminated
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:471)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:139)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:123)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:333)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:272)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:214)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.client.program.rest.RestClusterClient.lambda$null$23(RestClusterClient.java:666)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> [?:1.8.0_212]
>         at
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_212]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
> 2020-12-14 15:05:51,981 ERROR
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.rejectedExecution
> [] - Failed to submit a listener notification task. Event loop shut down?
> java.util.concurrent.RejectedExecutionException: event executor terminated
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:337)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:272)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:214)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.client.program.rest.RestClusterClient.lambda$null$23(RestClusterClient.java:666)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> [?:1.8.0_212]
>         at
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_212]
>         at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_212]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
> 2020-12-14 15:05:51,981 ERROR org.apache.flink.client.cli.CliFrontend
>
> [] - Error while running the command.
> org.apache.flink.util.FlinkException: Could not stop with a savepoint job
> "cc4a9a04e164fff8628b3ac59e5fbe80".
>         at
> org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:539)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:919)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:531)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:986)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
>
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> [flink-dist_2.11-1.12.0.jar:1.12.0]
>         at
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047)
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> Caused by: java.util.concurrent.TimeoutException
>         at
>
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
> ~[?:1.8.0_212]
>         at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> ~[?:1.8.0_212]
>         at
> org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:537)
> ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         ... 6 more
>
>
> Thank you in advance.
> Folani
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Posted by Folani <ha...@irisa.fr>.
Hi Piotrek,

Sorry for late response.
I have another problem with setting logs. I think the logging issue comes
from using Flink on my host machine and running a job on a jobmanager in
K8s. I'm managing the issue. But, this is what I got in /log folder of my
host machine:



2020-12-14 15:04:51,329 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -
--------------------------------------------------------------------------------
2020-12-14 15:04:51,331 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  Starting Command Line Client (Version: 1.12.0, Scala: 2.11,
Rev:fc00492, Date:2020-12-02T08:49:16+01:00)
2020-12-14 15:04:51,331 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  OS current user: folani
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.212-b04
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  Maximum heap size: 3538 MiBytes
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  JAVA_HOME: (not set)
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  No Hadoop Dependency available
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  JVM Options:
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -    
-Dlog.file=/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/log/flink-folani-client-hralaptop.log
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -    
-Dlog4j.configuration=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/log4j-cli.properties
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -    
-Dlog4j.configurationFile=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/log4j-cli.properties
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -    
-Dlogback.configurationFile=file:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/conf/logback.xml
2020-12-14 15:04:51,332 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  Program Arguments:
2020-12-14 15:04:51,333 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -     stop
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -     -p
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -     cc4a9a04e164fff8628b3ac59e5fbe80
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -  Classpath:
/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-csv-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-json-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-shaded-zookeeper-3.4.14.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-table_2.11-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-table-blink_2.11-1.12.0.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-1.2-api-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-api-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-core-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/log4j-slf4j-impl-2.12.1.jar:/home/folani/Softwares/flink-1.12.0-bin-scala_2.11/flink-1.12.0/lib/flink-dist_2.11-1.12.0.jar:::
2020-12-14 15:04:51,334 INFO  org.apache.flink.client.cli.CliFrontend                     
[] -
--------------------------------------------------------------------------------
2020-12-14 15:04:51,336 INFO 
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.rpc.address, localhost
2020-12-14 15:04:51,336 INFO 
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.rpc.port, 6123
2020-12-14 15:04:51,337 INFO 
org.apache.flink.configuration.GlobalConfiguration           [] - Loading
configuration property: jobmanager.memory.process.size, 1024m
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1024m
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.backend, filesystem
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.checkpoints.dir, s3://state/checkpoints
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.savepoints.dir, s3://state/savepoints
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.path-style-access, true
2020-12-14 15:04:51,337 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.endpoint, http://172.17.0.3:30090
2020-12-14 15:04:51,338 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.access-key, minio
2020-12-14 15:04:51,338 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.secret-key, ******
2020-12-14 15:04:51,338 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-12-14 15:04:51,338 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
2020-12-14 15:04:51,338 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.host, localhost
2020-12-14 15:04:51,338 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.reporter.prom.port, 9250-9260
2020-12-14 15:04:51,339 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.network.detailed-metrics, true
2020-12-14 15:04:51,339 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.latency.granularity, "subtask"
2020-12-14 15:04:51,339 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.latency.interval, 1000
2020-12-14 15:04:51,339 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: web.backpressure.refresh-interval, 1000
2020-12-14 15:04:51,339 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.system-resource, true
2020-12-14 15:04:51,339 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: metrics.system-resource-probing-interval, 1000
2020-12-14 15:04:51,358 INFO  org.apache.flink.client.cli.CliFrontend                      [] - Loading FallbackYarnSessionCli
2020-12-14 15:04:51,392 INFO  org.apache.flink.core.fs.FileSystem                          [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2020-12-14 15:04:51,449 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2020-12-14 15:04:51,454 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-1159138828547882777.conf.
2020-12-14 15:04:51,458 INFO  org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2020-12-14 15:04:51,459 INFO  org.apache.flink.client.cli.CliFrontend                      [] - Running 'stop-with-savepoint' command.
2020-12-14 15:04:51,466 INFO  org.apache.flink.client.cli.CliFrontend                      [] - Suspending job "cc4a9a04e164fff8628b3ac59e5fbe80" with a savepoint.
2020-12-14 15:04:51,470 INFO  org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
2020-12-14 15:04:51,485 INFO  org.apache.flink.configuration.Configuration                 [] - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2020-12-14 15:05:51,973 WARN  org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel [] - Force-closing a channel whose registration task was not accepted by an event loop: [id: 0xe1d20630]
java.util.concurrent.RejectedExecutionException: event executor terminated
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:471) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:139) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:123) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:333) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:272) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:214) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.program.rest.RestClusterClient.lambda$null$23(RestClusterClient.java:666) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
2020-12-14 15:05:51,981 ERROR org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.rejectedExecution [] - Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:926) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:353) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:346) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:828) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:498) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.rest.RestClient.submitRequest(RestClient.java:337) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:272) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.rest.RestClient.sendRequest(RestClient.java:214) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.program.rest.RestClusterClient.lambda$null$23(RestClusterClient.java:666) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580) [?:1.8.0_212]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
2020-12-14 15:05:51,981 ERROR org.apache.flink.client.cli.CliFrontend                      [] - Error while running the command.
org.apache.flink.util.FlinkException: Could not stop with a savepoint job "cc4a9a04e164fff8628b3ac59e5fbe80".
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:539) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:919) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:531) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:986) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) [flink-dist_2.11-1.12.0.jar:1.12.0]
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047) [flink-dist_2.11-1.12.0.jar:1.12.0]
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_212]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_212]
	at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:537) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
	... 6 more
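
One detail that may be relevant: the exception is thrown exactly one minute after the stop command is issued (15:04:51 to 15:05:51). If I understand the configuration docs correctly, that matches the default of the client-side option client.timeout (60 s), so the CLI may simply be giving up before the savepoint finishes rather than the savepoint itself failing. As an experiment one could raise it in flink-conf.yaml (please double-check the key against your Flink version, this is just a sketch):

    client.timeout: 300 s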


Thank you in advance.
Folani



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Getting an exception while stopping Flink with savepoints on Kubernetes+Minio

Posted by Piotr Nowojski <pn...@apache.org>.
Hi,

It's hard for me to guess what could be the problem. There was the same
error reported a couple of months ago [1], but there is frankly no extra
information there.

Can we start from looking at the full TaskManager and JobManager logs?
Could you share them with us?
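
On a standalone Kubernetes deployment you should be able to pull them with kubectl, along these lines (the deployment and pod names below are just placeholders, adjust them to your setup):

    kubectl logs deployment/flink-jobmanager > jobmanager.log
    kubectl logs <taskmanager-pod-name> > taskmanager.log

It would also be interesting to know whether triggering the stop-with-savepoint directly via the JobManager's REST API behaves the same way, which would take the CLI and its timeout out of the picture (adjust host, port and job id):

    curl -X POST http://<jobmanager>:8081/jobs/<JobID>/stop \
         -H 'Content-Type: application/json' \
         -d '{"targetDirectory": "s3://state/savepoints", "drain": false}'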

Best,
Piotrek


[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-11-job-stop-with-save-point-timeout-error-td36895.html

Fri, 11 Dec 2020 at 14:04, Folani <ha...@irisa.fr> wrote:

> I'm deploying a standalone Flink cluster on top of Kubernetes and using
> MinIO as an S3 backend. I mainly followed the instructions on Flink's
> website.
> I use the following command to run my job in Flink:  $flink run -d -m
> <IP>:<port>  -j  job.jar
>
> I have also added the following to flink-configmap.yaml:
>
>     state.backend: filesystem
>     state.checkpoints.dir: s3://state/checkpoints
>     state.savepoints.dir: s3://state/savepoints
>     s3.path-style-access: true
>     s3.endpoint: http://minio-service:9000
>     s3.access-key: *******
>     s3.secret-key: *******
>
> Everything seems to be working well. The job is submitted correctly and
> the checkpoints are written to MinIO, but when I try to cancel the job or
> stop it with a savepoint I get the following exception:
>
> org.apache.flink.util.FlinkException: Could not stop with a savepoint job
> "5ae191ca2b239ec7771e4c7a9a336537".
>         at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
>         at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
>         at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
>         at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
>         at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
>         at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>         at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
>         ... 6 more
>
> This is my command to stop with savepoints:  $flink stop -p  <JobID>
> And my Flink version is flink-1.11.2-bin-scala_2.11.
>
> What could be the reason for the exception? Any suggestions?
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/