Posted to dev@flink.apache.org by "liuzhuo (JIRA)" <ji...@apache.org> on 2018/03/26 06:54:00 UTC

[jira] [Created] (FLINK-9072) Host name with "_" causes cluster exception

liuzhuo created FLINK-9072:
------------------------------

             Summary: Host name with "_" causes cluster exception
                 Key: FLINK-9072
                 URL: https://issues.apache.org/jira/browse/FLINK-9072
             Project: Flink
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.3.2
         Environment: linux: 

    Linux version 3.10.0-693.2.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 UTC 2017

Java:

    1.8.0_121-b13

Flink :

    flink-1.3.2-bin-hadoop26-scala_2.11
            Reporter: liuzhuo


When I start the cluster in my production environment, I get the following errors:

{code:java}
2018-03-21 09:50:42,437 ERROR org.apache.flink.runtime.webmonitor.files.StaticFileServerHandler - Caught exception
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
at org.apache.flink.runtime.leaderretrieval.StandaloneLeaderRetrievalService.start(StandaloneLeaderRetrievalService.java:85)
at org.apache.flink.runtime.webmonitor.WebRuntimeMonitor.start(WebRuntimeMonitor.java:434)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$startJobManagerActors$6.apply(JobManager.scala:2352)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$startJobManagerActors$6.apply(JobManager.scala:2344)
at scala.Option.foreach(Option.scala:257)
at org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(JobManager.scala:2343)
at org.apache.flink.runtime.jobmanager.JobManager$.liftedTree3$1(JobManager.scala:2053)
at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:2052)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:2139)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:2117)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:2117)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:2172)
at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:2117)
at org.apache.flink.runtime.jobmanager.JobManager$$anon$10.call(JobManager.scala:1992)
at org.apache.flink.runtime.jobmanager.JobManager$$anon$10.call(JobManager.scala:1990)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1990)
at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2018-03-21 09:51:23,993 ERROR org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Resource manager could not register at JobManager
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]] after [100000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:474)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:425)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:429)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:381)
at java.lang.Thread.run(Thread.java:748)
{code}
The errors refer to "akka://flink/deadLetters".

I searched for this on Google. Most answers say that the network is not working, that port 6123 is not available, or that there is an iptables problem.

I ruled out all of the above. Finally, I found a difference between the production environment and the development environment.

In my development environment, the hosts file looks like this:

192.168.xx.xx  master1
192.168.xx.xx  slave1
192.168.xx.xx  slave2

In the production environment, the hosts file looks like this:

192.168.xx.xx  Flink_master
192.168.xx.xx  slaves_01
192.168.xx.xx  slaves_02

When I changed the production hosts to match my development environment, removing the "_", the cluster went back to normal.

So my guess is that host names containing "_" do not work for a Flink cluster.
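
A possible way to check a host name before using it is sketched below. It relies on the fact that java.net.URI does not accept "_" in a server-based host name, so getHost() returns null for such names. Whether this is the actual reason Flink/Akka fails here is only an assumption and is not confirmed in this report. The class name HostNameCheck, the method isUriSafeHost, and the address layout "akka.tcp://flink@<host>:6123/user/jobmanager" are hypothetical and used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HostNameCheck {

    // Returns true if java.net.URI recognizes the given name as a host.
    // Names containing "_" are not parsed as hosts, so getHost() is null for them.
    static boolean isUriSafeHost(String host) {
        try {
            // Hypothetical address layout; 6123 is the port mentioned in this report.
            URI uri = new URI("akka.tcp://flink@" + host + ":6123/user/jobmanager");
            return host.equalsIgnoreCase(uri.getHost());
        } catch (URISyntaxException e) {
            // The name contains characters that are illegal anywhere in a URI.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host names taken from the two hosts files in this report.
        String[] hosts = {"master1", "slave1", "Flink_master", "slaves_01"};
        for (String host : hosts) {
            System.out.println(host + " -> " + isUriSafeHost(host));
        }
    }
}
```

Running main against the names from both hosts files should report the development names as usable and the production names with "_" as not usable, which matches the behaviour described above.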

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)