Posted to user@flink.apache.org by Yury Ruchin <yu...@gmail.com> on 2017/02/01 07:40:17 UTC

Re: Cannot cancel job with savepoint due to timeout

Hi Bruno,

From the code I conclude that the "akka.client.timeout" setting is what
affects this. It defaults to 60 seconds.
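
That matches the "Futures timed out after [60000 milliseconds]" in the stack
trace below. If I read the code correctly, the default corresponds to roughly
the following entry in flink-conf.yaml:

    akka.client.timeout: 60 s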

I'm not sure why this setting, like many other "akka.*" settings, is not
documented - maybe there are good reasons for that.

Regards,
Yury

2017-01-31 17:47 GMT+03:00 Bruno Aranda <ba...@apache.org>:

> Hi there,
>
> I am trying to cancel a job and create a savepoint (i.e. flink cancel -s),
> but it takes more than a minute to do that and then it fails due to the
> timeout. However, it seems that the job is cancelled successfully and the
> savepoint is made, but I can only see that through the dashboard.
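>
> The exact invocation is along these lines (no target directory given, so
> the default savepoint directory is used):
>
>     ./bin/flink cancel -s 790b60a2b44bc98854782d4e0cac05d5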
>
> Cancelling job 790b60a2b44bc98854782d4e0cac05d5 with savepoint to default
> savepoint directory.
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:190)
> at scala.concurrent.Await.result(package.scala)
> at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:618)
> at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1079)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
> at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1117)
>
> Is there any way to configure this timeout, so that we can rely on the
> outcome of this command in scripts, etc.?
>
> Thanks!
>
> Bruno
>

Re: Cannot cancel job with savepoint due to timeout

Posted by Till Rohrmann <tr...@apache.org>.
Hi Bruno,

the lack of documentation for akka.client.timeout is an oversight on our
part [1]. I'll update it asap.

Unfortunately, at the moment there is no other way than to specify the
akka.client.timeout in the flink-conf.yaml file.
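
For example, setting the client timeout to ten minutes would look roughly
like this in flink-conf.yaml (the value is only an illustration):

    akka.client.timeout: 600 s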

[1] https://issues.apache.org/jira/browse/FLINK-5700

Cheers,
Till

On Wed, Feb 1, 2017 at 9:47 AM, Bruno Aranda <br...@gmail.com> wrote:

> Maybe, though it would be good to be able to override it on the command line
> somehow. But I guess I can just change the Flink config.
>
> Many thanks Yury,
>
> Bruno
>

Re: Cannot cancel job with savepoint due to timeout

Posted by Bruno Aranda <br...@gmail.com>.
Maybe, though it would be good to be able to override it on the command line
somehow. But I guess I can just change the Flink config.

Many thanks Yury,

Bruno

On Wed, 1 Feb 2017 at 07:40 Yury Ruchin <yu...@gmail.com> wrote:

> Hi Bruno,
>
> From the code I conclude that the "akka.client.timeout" setting is what
> affects this. It defaults to 60 seconds.
>
> I'm not sure why this setting, like many other "akka.*" settings, is not
> documented - maybe there are good reasons for that.
>
> Regards,
> Yury
>