Posted to user@flink.apache.org by Al-Isawi Rami <Ra...@comptel.com> on 2017/01/03 22:44:32 UTC

Cannot run using a savepoint with the same jar

Hi,

I have a Flink job for which I can trigger a savepoint with no problem. However, if I cancel the job and then try to run it from the savepoint, I get the following exception. Any ideas on how I can debug or fix it? I am using the exact same jar, so I did not modify the program in any manner. I am on Flink version 1.1.4.


Caused by: java.lang.IllegalStateException: Failed to rollback to savepoint jobmanager://savepoints/1. Cannot map savepoint state for operator 1692abfa98b4a67c1b7dfc17f79d35d7 to the new program, because the operator is not available in the new program. If you want to allow this, you can set the --allowNonRestoredState option on the CLI.
at org.apache.flink.runtime.checkpoint.savepoint.SavepointCoordinator.restoreSavepoint(SavepointCoordinator.java:257)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.restoreSavepoint(ExecutionGraph.java:1020)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1336)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1326)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1326)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
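For reference, the option named in the exception would be passed when resubmitting the job, roughly like this (the jar path here is a placeholder, and the exact flag spelling should be checked against the CLI help of the Flink version in use):

    bin/flink run -s jobmanager://savepoints/1 --allowNonRestoredState path/to/your-job.jar

This only tells Flink to skip savepoint state it cannot map to an operator in the new program; it does not fix the mapping itself, so for an unmodified jar it is more of a diagnostic or last resort than a solution.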

Re: Cannot run using a savepoint with the same jar

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Rami,
could you maybe provide your code? You could also send it to me directly if
you don't want to share with the community.

It might be that there is something in the way the pipeline is set up that
causes the (generated) operator UIDs to not be deterministic.
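
For example (a purely hypothetical sketch, not your actual job; the source,
flag, and operator names are made up), a topology that depends on anything
that can differ between submissions can end up with different generated IDs
each time:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NonDeterministicTopologyExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical switch: anything that can differ between two submissions
        // of the "same" jar (system property, config file, environment, ...).
        boolean enableFilter = Boolean.getBoolean("enableFilter");

        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        if (enableFilter) {
            // Adding or removing an operator changes the structure that the
            // generated operator IDs are derived from, so state recorded in a
            // savepoint under the old IDs can no longer be matched, which
            // surfaces as "the operator is not available in the new program".
            lines = lines.filter(new FilterFunction<String>() {
                @Override
                public boolean filter(String value) {
                    return !value.isEmpty();
                }
            });
        }

        lines.print();
        env.execute("non-deterministic-topology-example");
    }
}

Pinning the IDs explicitly with .uid(...) on the stateful operators avoids
relying on the generated hashes.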

Best,
Aljoscha

Re: Cannot run using a savepoint with the same jar

Posted by Rami Al-Isawi <Ra...@comptel.com>.
Hi Stephan,

I have not changed the parallelism, the names, or anything else in my program. It is the exact same jar file, unmodified.

I have tried uid(), but I faced this: "UnsupportedOperationException: Cannot assign user-specified hash to intermediate node in chain. This will be supported in future versions of Flink. As a work around start new chain at task Map."
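
For reference, the workaround that message suggests would look roughly like this (a minimal sketch only; the source and the operator names are placeholders, not the actual job):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainWorkaroundExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)   // placeholder source, stands in for the real one
            .uid("source")                        // the source heads its chain, so a uid is accepted here
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.trim();
                }
            })
            .startNewChain()                      // break the chain so the map is no longer an
                                                  // intermediate node and can carry its own uid
            .uid("trim-map")
            .print();

        env.execute("chain-workaround-example");
    }
}

Calling .disableChaining() on the operator would have a similar effect (it isolates the operator completely instead of only starting a new chain), at the cost of losing the chaining optimization for that operator.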

Any clues on how to carry on? I am just trying to avoid the painful process of dismantling the code and the tests to get closer to the cause.

I just think that if I am providing the same exact jar, nothing should break.

Regards,
-Rami

Re: Cannot run using a savepoint with the same jar

Posted by Stephan Ewen <se...@apache.org>.
Hi!

Did you change the parallelism in your program, or do the names of some
functions change each time you call the program?

Can you try what happens when you give explicit IDs to operators via the
'.uid(...)' method?

Stephan

