You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Javier Vegas <jv...@strava.com> on 2022/10/13 19:41:13 UTC

HA not working in standalone mode for operator 1.2

Hi, I have a S3 HA Flink app that works as expected deployed via
operator 1.2 in native mode, but I am seeing errors when switching to
standalone mode (which I want to do mostly to save me having to set jarURI
explicitly).
I can see the job manager writes the JobGraph in S3, and in the web UI I
can see it creates the jobs, but the taskmanager sits there doing nothing
as if could not communicate with the jobmanager. I can see also that the
operator has created two services, while native mode creates only the rest
service. After a while, the taskmanager closes with the following exception:

org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException:
Could not register at the ResourceManager within the specified maximum
registration duration 300000 ms. This indicates a problem with this
instance. Terminating now.

        at
org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1474)

        at
org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$16(TaskExecutor.java:1459)

        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:443)

        at
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)

        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:443)

        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213)

        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)

        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)

        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)

        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)

        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)

        at
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)

        at
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)

        at
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)

        at
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)

        at akka.actor.Actor.aroundReceive(Actor.scala:537)

        at akka.actor.Actor.aroundReceive$(Actor.scala:535)

        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)

        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)

        at akka.actor.ActorCell.invoke(ActorCell.scala:548)

        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)

        at akka.dispatch.Mailbox.run(Mailbox.scala:231)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:243)

        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown
Source)

        at
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown
Source)

        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)

        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown
Source)

Re: HA not working in standalone mode for operator 1.2

Posted by Javier Vegas <jv...@strava.com>.
The jars that my build version creates have a version number, something
like myapp-2.2.11.jar. I am lazy and want to avoid having to update the
jarURI param (required in native mode) every time I deploy a new version of
my app, and just update the Docker image I am using. Another solution would
be to keep using native, and modify my build system to strip the version
number in the packaged jar.

Thanks!

Javier

El jue, 13 oct 2022 a las 13:22, Gyula Fóra (<gy...@gmail.com>)
escribió:

> Before we dive further into this can you please explain the jarURI problem
> your are trying to solve by switching to standalone?
>
> The native mode should work well in almost any setup.
>
> Gyula
>
> On Thu, 13 Oct 2022 at 21:41, Javier Vegas <jv...@strava.com> wrote:
>
>> Hi, I have a S3 HA Flink app that works as expected deployed via
>> operator 1.2 in native mode, but I am seeing errors when switching to
>> standalone mode (which I want to do mostly to save me having to set jarURI
>> explicitly).
>> I can see the job manager writes the JobGraph in S3, and in the web UI I
>> can see it creates the jobs, but the taskmanager sits there doing nothing
>> as if could not communicate with the jobmanager. I can see also that the
>> operator has created two services, while native mode creates only the rest
>> service. After a while, the taskmanager closes with the following exception:
>>
>> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException:
>> Could not register at the ResourceManager within the specified maximum
>> registration duration 300000 ms. This indicates a problem with this
>> instance. Terminating now.
>>
>>         at
>> org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1474)
>>
>>         at
>> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$16(TaskExecutor.java:1459)
>>
>>         at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:443)
>>
>>         at
>> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>>
>>         at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:443)
>>
>>         at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213)
>>
>>         at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>>
>>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>>
>>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>>
>>         at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>>
>>         at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>>
>>         at
>> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>>
>>         at
>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>
>>         at
>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>>
>>         at
>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>>
>>         at akka.actor.Actor.aroundReceive(Actor.scala:537)
>>
>>         at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>>
>>         at
>> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>>
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>>
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>>
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>>
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>>
>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>>
>>         at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown
>> Source)
>>
>>         at
>> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown
>> Source)
>>
>>         at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown
>> Source)
>>
>>         at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown
>> Source)
>>         at
>> java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
>>
>>

Re: HA not working in standalone mode for operator 1.2

Posted by Gyula Fóra <gy...@gmail.com>.
Before we dive further into this can you please explain the jarURI problem
your are trying to solve by switching to standalone?

The native mode should work well in almost any setup.

Gyula

On Thu, 13 Oct 2022 at 21:41, Javier Vegas <jv...@strava.com> wrote:

> Hi, I have a S3 HA Flink app that works as expected deployed via
> operator 1.2 in native mode, but I am seeing errors when switching to
> standalone mode (which I want to do mostly to save me having to set jarURI
> explicitly).
> I can see the job manager writes the JobGraph in S3, and in the web UI I
> can see it creates the jobs, but the taskmanager sits there doing nothing
> as if could not communicate with the jobmanager. I can see also that the
> operator has created two services, while native mode creates only the rest
> service. After a while, the taskmanager closes with the following exception:
>
> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException:
> Could not register at the ResourceManager within the specified maximum
> registration duration 300000 ms. This indicates a problem with this
> instance. Terminating now.
>
>         at
> org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1474)
>
>         at
> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$16(TaskExecutor.java:1459)
>
>         at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:443)
>
>         at
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
>
>         at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:443)
>
>         at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213)
>
>         at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>
>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>
>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>
>         at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>
>         at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>
>         at
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>
>         at
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>
>         at
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>
>         at
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>
>         at akka.actor.Actor.aroundReceive(Actor.scala:537)
>
>         at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>
>         at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>
>         at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>
>         at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>
>         at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown
> Source)
>
>         at
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown
> Source)
>
>         at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown
> Source)
>
>         at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown
> Source)
>         at
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
>
>