Posted to issues@flink.apache.org by "cjjxfli (Jira)" <ji...@apache.org> on 2022/03/30 04:07:00 UTC

[jira] [Commented] (FLINK-24031) I am trying to deploy Flink in kubernetes but when I launch the taskManager in other container I get a Exception

    [ https://issues.apache.org/jira/browse/FLINK-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514427#comment-17514427 ] 

cjjxfli commented on FLINK-24031:
---------------------------------

*I have the same problem.*
 
2022-03-21 03:41:55,535 DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Try to connect to remote RPC endpoint with address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway.
2022-03-21 03:41:55,548 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager]
2022-03-21 03:41:55,550 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms.
org.apache.flink.runtime.rpc.exceptions.RpcConnectionException: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
    at org.apache.flink.runtime.rpc.akka.AkkaRpcService.lambda$resolveActorAddress$10(AkkaRpcService.java:520) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.concurrent.java8.FuturesConvertersImpl$CF$$anon$1.accept(FutureConvertersImpl.scala:59) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.concurrent.java8.FuturesConvertersImpl$CF$$anon$1.accept(FutureConvertersImpl.scala:53) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_265]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_265]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_265]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_265]
Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:6123/), Path(/user/rpc/resourcemanager_*)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:71) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:69) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:81) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:80) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:556) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:593) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:582) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:104) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.remote.EndpointWriter.postStop(Endpoint.scala:606) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:536) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:458) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.terminate(ActorCell.scala:429) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:533) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:549) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:283) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:224) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
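
The WARN line names the root cause: java.net.UnknownHostException: flink-jobmanager. One way to confirm it is to check, from inside the TaskManager container, whether the host in the ResourceManager address resolves at all.

The following is a minimal, hypothetical diagnostic sketch. The class ResolveJobManager and its methods are illustrative and are not part of Flink. It assumes that a JVM started in the same pod uses the same resolver as the TaskManager (cluster DNS plus /etc/hosts). It relies only on java.net.URI to split the Akka address and on InetAddress.getByName to attempt the lookup.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/**
 * Hypothetical diagnostic (not part of Flink): extracts the host and port from the
 * Akka address the TaskManager is trying to reach and checks whether that host
 * resolves from the current container.
 */
public class ResolveJobManager {

    /** Returns "host:port" from an Akka classic address such as akka.tcp://flink@host:6123/user/... */
    static String hostAndPort(String akkaAddress) {
        URI uri = URI.create(akkaAddress);
        return uri.getHost() + ":" + uri.getPort();
    }

    /** True if the name resolves through the JVM's configured name service (DNS, /etc/hosts, ...). */
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default value copied from the TaskManager log; pass another address as the first argument.
        String address = args.length > 0 ? args[0]
                : "akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*";
        String host = URI.create(address).getHost();
        System.out.println(hostAndPort(address) + " resolvable: " + canResolve(host));
    }
}
```

A result of false reproduces the UnknownHostException outside Flink. That points at name resolution inside the cluster rather than at Flink itself, for example a missing Service named flink-jobmanager in the pod's namespace.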

> I am trying to deploy Flink in kubernetes but when I launch the taskManager in other container I get a Exception
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24031
>                 URL: https://issues.apache.org/jira/browse/FLINK-24031
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.13.0, 1.13.2
>            Reporter: Julio Pérez
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.1
>
>         Attachments: flink-map.yml, jobmanager.log, jobmanager.yml, taskmanager.log, taskmanager.yml
>
>
> I explain the problem here: [https://github.com/apache/flink/pull/17020]
> I have a problem when I try to run Flink in k8s with the following manifests.
> I get the following exception:
>  # JobManager:
> {quote}2021-08-27 09:16:57,917 ERROR akka.remote.EndpointWriter [] - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@jobmanager-hs:6123/]] arriving at [akka.tcp://flink@jobmanager-hs:6123] inbound addresses are [akka.tcp://flink@cluster:6123]
>  2021-08-27 09:17:01,255 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
>  2021-08-27 09:17:01,284 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
>  2021-08-27 09:17:10,008 DEBUG akka.remote.transport.netty.NettyTransport [] - Remote connection to [/172.17.0.1:34827] was disconnected because of [id: 0x13ae1d03, /172.17.0.1:34827 :> /172.17.0.23:6123] DISCONNECTED
>  2021-08-27 09:17:10,008 DEBUG akka.remote.transport.ProtocolStateActor [] - Association between local [tcp://flink@cluster:6123] and remote [tcp://flink@172.17.0.1:34827] was disassociated because the ProtocolStateActor failed: Unknown
>  2021-08-27 09:17:10,009 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@172.17.0.24:6122] has failed, address is now gated for [50] ms. Reason: [Disassociated]
> {quote}
>  # TaskManager:
> {quote}INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__.
>  INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__.
> {quote}
> Best regards,
> Julio
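
The JobManager log in the description shows a related symptom. The message is addressed to akka.tcp://flink@jobmanager-hs:6123, but the JobManager's only inbound address is akka.tcp://flink@cluster:6123, so the message is dropped as non-local.

Akka classic remoting compares the address part of the recipient (protocol, actor-system name, host, port) with its own addresses as plain strings, without resolving host names. Two names that point to the same pod therefore still do not match.

The following is a simplified, hypothetical model of that comparison, fed with the values from the log. NonLocalRecipientCheck is an illustrative name, not Akka's implementation.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the check behind Akka's
 * "dropping message ... for non-local recipient" log line. Not Akka's code.
 */
public class NonLocalRecipientCheck {

    /** Reduces an Akka address or actor path to protocol://system@host:port. */
    static String addressPart(String akkaUri) {
        URI uri = URI.create(akkaUri);
        return uri.getScheme() + "://" + uri.getUserInfo() + "@" + uri.getHost() + ":" + uri.getPort();
    }

    /** True only if the recipient's address part equals one of the local inbound addresses, compared as strings. */
    static boolean isLocal(String recipient, List<String> inboundAddresses) {
        String target = addressPart(recipient);
        return inboundAddresses.stream()
                .map(NonLocalRecipientCheck::addressPart)
                .anyMatch(target::equals);
    }

    public static void main(String[] args) {
        // Values copied from the JobManager log in the issue description.
        String recipient = "akka.tcp://flink@jobmanager-hs:6123/";
        List<String> inbound = Arrays.asList("akka.tcp://flink@cluster:6123");
        System.out.println(isLocal(recipient, inbound)); // prints false: "jobmanager-hs" is not "cluster"
    }
}
```

In a standalone Flink deployment, the host that the JobManager's actor system is bound to is typically taken from jobmanager.rpc.address. For messages to be delivered, that value and the host name the TaskManagers connect to would need to be the identical string.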



--
This message was sent by Atlassian Jira
(v8.20.1#820001)