You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2020/05/18 14:01:00 UTC

[jira] [Updated] (FLINK-10443) Flink Yarn CLI hanging after unsuccessful deployment

     [ https://issues.apache.org/jira/browse/FLINK-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-10443:
----------------------------------
    Affects Version/s: 1.9.1

> Flink Yarn CLI hanging after unsuccessful deployment
> ----------------------------------------------------
>
>                 Key: FLINK-10443
>                 URL: https://issues.apache.org/jira/browse/FLINK-10443
>             Project: Flink
>          Issue Type: Bug
>          Components: Command Line Client, Deployment / YARN
>    Affects Versions: 1.5.4, 1.9.1
>            Reporter: Nico Kruber
>            Priority: Major
>
> It looks like the Flink CLI is retrying forever to cancel a YARN deployment which has failed already (during deployment). Additionally, you cannot cancel the CLI via CTRL+C
> Reproduce via a failing deployment due to an inaccessible SSL key file:
> {code}
> // flink-conf.yaml
> security.ssl.enabled: true
> web.ssl.enabled: false
> security.ssl.keystore: /path/to/non-existing/internal.keystore
> security.ssl.keystore-password: internal_store_password
> security.ssl.key-password: internal_key_password
> security.ssl.truststore: /path/to/non-existing/internal.keystore
> security.ssl.truststore-password: internal_store_password
> {code}
> {code}
> ./bin/flink run -m yarn-cluster -p 2 ./examples/streaming/WordCount.jar --input /usr/share/doc/java-1.8.0-openjdk-devel-1.8.0.181/LICENSE
> {code}
> After the failure, the CLI will hang with repeatedly printing this:
> {code}
> 2018-09-26 15:17:50,348 INFO  org.apache.hadoop.io.retry.RetryInvocationHandler             - Exception while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Retrying after sleeping for 30000ms.
> java.io.IOException: The client is stopped
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1345)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source)
>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:213)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>         at com.sun.proxy.$Proxy9.forceKillApplication(Unknown Source)
>         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:439)
>         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:419)
>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1236)
>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:111)
>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1493)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)