You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Robert Cullen <ci...@gmail.com> on 2021/03/04 15:28:40 UTC

Python Flink cluster: how to shut-down

I ran this command using the example from the CLI page on the Flink website:

$ ./bin/flink run-application \ —target kubernetes-application \
—parallelism 8 \ -Dkubernetes.cluster-id= \
-Dtaskmanager.memory.process.size=4096m \ -Dkubernetes.taskmanager.cpu=2 \
-Dtaskmanager.numberOfTaskSlots=4 \ -Dkubernetes.container.image= \
—pyModule word_count \ —pyFiles
/opt/flink/examples/python/table/batch/word_count.py

How do I shut the cluster down?

Also I’m getting this error when running ./bin/flink list

org.apache.flink.util.FlinkException: Failed to retrieve job list.
    at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449)
    at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
    at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException:
Could not complete the operation. Number of retries has been
exhausted.
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
    at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException:
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
Connection refused: localhost/127.0.0.1:8081
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957)
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
    ... 19 more
Caused by: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
Connection refused: localhost/127.0.0.1:8081
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)

-- 
Robert Cullen
240-475-4490

Re: Python Flink cluster: how to shut-down

Posted by Yang Wang <da...@gmail.com>.
I think chesnay's answer is on the point.
You could find how to list and cancel a Flink application on Kubernetes
here[1].

Another thing is that please make sure that you are using the correct
service exposed type[2](
e.g. LoadBalancer on the cloud, NodePort for the self-managed cluster, or
ClusterIP for K8s internal submission).

# List running job on the cluster
$ ./bin/flink list --target kubernetes-application
-Dkubernetes.cluster-id=my-first-application-cluster# Cancel running
job
$ ./bin/flink cancel --target kubernetes-application
-Dkubernetes.cluster-id=my-first-application-cluster <jobId>


[1].
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#application-mode
[2].
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui

Best,
Yang

Chesnay Schepler <ch...@apache.org> 于2021年3月4日周四 下午11:42写道:

> run-application creates an application cluster that shuts down once the
> job is complete. As such, canceling the job in this case is equivalent to
> shutting down the cluster.
>
> AFAIK you also need to specify kubernetes arguments when using the list
> command. (without any argument it just assumes you're running a local
> standalone cluster.
>
>
> On 3/4/2021 4:28 PM, Robert Cullen wrote:
>
> I ran this command using the example from the CLI page on the Flink
> website:
>
> $ ./bin/flink run-application \ —target kubernetes-application \
> —parallelism 8 \ -Dkubernetes.cluster-id= \
> -Dtaskmanager.memory.process.size=4096m \ -Dkubernetes.taskmanager.cpu=2 \
> -Dtaskmanager.numberOfTaskSlots=4 \ -Dkubernetes.container.image= \
> —pyModule word_count \ —pyFiles
> /opt/flink/examples/python/table/batch/word_count.py
>
> How do I shut the cluster down?
>
> Also I’m getting this error when running ./bin/flink list
>
> org.apache.flink.util.FlinkException: Failed to retrieve job list.
>     at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449)
>     at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
>     at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427)
>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386)
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>     at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957)
>     at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
>     ... 19 more
> Caused by: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
>     at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at java.lang.Thread.run(Thread.java:748)
>
> --
> Robert Cullen
> 240-475-4490
>
>
>

Re: Python Flink cluster: how to shut-down

Posted by Chesnay Schepler <ch...@apache.org>.
run-application creates an application cluster that shuts down once the 
job is complete. As such, canceling the job in this case is equivalent 
to shutting down the cluster.

AFAIK you also need to specify kubernetes arguments when using the list 
command. (without any argument it just assumes you're running a local 
standalone cluster.


On 3/4/2021 4:28 PM, Robert Cullen wrote:
>
> I ran this command using the example from the CLI page on the Flink 
> website:
>
> $ ./bin/flink run-application \ —target kubernetes-application \ 
> —parallelism 8 \ -Dkubernetes.cluster-id= \ 
> -Dtaskmanager.memory.process.size=4096m \ 
> -Dkubernetes.taskmanager.cpu=2 \ -Dtaskmanager.numberOfTaskSlots=4 \ 
> -Dkubernetes.container.image= \ —pyModule word_count \ —pyFiles 
> /opt/flink/examples/python/table/batch/word_count.py
>
> How do I shut the cluster down?
>
> Also I’m getting this error when running ./bin/flink list
>
> |org.apache.flink.util.FlinkException: Failed to retrieve job list. at 
> org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449) 
> at 
> org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430) 
> at 
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002) 
> at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) 
> at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) 
> at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) 
> at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) 
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) 
> Caused by: 
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could 
> not complete the operation. Number of retries has been exhausted. at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386) 
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) 
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) 
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) 
> at 
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 
> at java.lang.Thread.run(Thread.java:748) Caused by: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: 
> Connection refused: localhost/127.0.0.1:8081 <http://127.0.0.1:8081> 
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) 
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) 
> at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) 
> at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) 
> ... 19 more Caused by: 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: 
> Connection refused: localhost/127.0.0.1:8081 <http://127.0.0.1:8081> 
> Caused by: java.net.ConnectException: Connection refused at 
> sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) 
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) 
> at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 
> at java.lang.Thread.run(Thread.java:748) |
> -- 
> Robert Cullen
> 240-475-4490