You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ufuk Celebi <uc...@apache.org> on 2016/03/10 16:10:16 UTC

Re: TaskManager unable to register with JobManager

Hey Ravinder,

check out the following config keys:

blob.server.port
taskmanager.rpc.port
taskmanager.data.port


– Ufuk


On Wed, Feb 10, 2016 at 4:06 PM, Ravinder Kaur <ne...@gmail.com> wrote:
> Hello Fabian,
>
> Thank you very much for the resource. I had already gone through this and
> have found port '6123' as default for taskmanager registration. But I want
> to know the specific range of ports the taskmanager access during job
> execution.
>
> The taskmanager always tries to access a random port during job execution
> for which I need to disable firewall using 'ufw allow port' during the
> execution, otherwise the job hangs and finally fails. So I  wanted to know a
> particular range of ports which I can specify in the iptables to always
> allow access.
>
>
> Kind Regards,
> Ravinder Kaur
>
> On Wed, Feb 10, 2016 at 2:16 PM, Fabian Hueske <fh...@gmail.com> wrote:
>>
>> Hi Ravinder,
>>
>> please have a look at the configuration documentation:
>>
>> -->
>> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-amp-taskmanager
>>
>> Best, Fabian
>>
>> 2016-02-10 13:55 GMT+01:00 Ravinder Kaur <ne...@gmail.com>:
>>>
>>> Hello All,
>>>
>>> I need to know the range of ports that are being used during the
>>> master/slave communication in the Flink cluster. Also is there a way I can
>>> specify a range of ports, at the slaves, to restrict them to connect to
>>> master only in this range?
>>>
>>> Kind Regards,
>>> Ravinder Kaur
>>>
>>>
>>> On Wed, Feb 3, 2016 at 10:09 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>> Can machines connect to port 6123? The firewall may block that port, put
>>>> permit SSH.
>>>>
>>>> On Wed, Feb 3, 2016 at 9:52 PM, Ravinder Kaur <ne...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Here is the log file of Jobmanager. I did not see some thing suspicious
>>>>> and as it suggests the ports are also listening.
>>>>>
>>>>> 20:58:46,906 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Starting JobManager on IP-of-master:6123 with execution mode CLUSTER and
>>>>> streaming mode BATCH_ONLY
>>>>> 20:58:46,978 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Security is not enabled. Starting non-authenticated JobManager.
>>>>> 20:58:46,979 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Starting JobManager
>>>>> 20:58:46,980 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Starting JobManager actor system at 10.155.208.138:6123
>>>>> 20:58:48,196 INFO  akka.event.slf4j.Slf4jLogger
>>>>> - Slf4jLogger started
>>>>> 20:58:48,295 INFO  Remoting
>>>>> - Starting remoting
>>>>> 20:58:48,541 INFO  Remoting
>>>>> - Remoting started; listening on addresses
>>>>> :[akka.tcp://flink@IP-of-master:6123]
>>>>> 20:58:48,549 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Starting JobManger web frontend
>>>>> 20:58:48,690 INFO
>>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using
>>>>> directory /tmp/flink-web-876a4755-4f38-4ff7-8202-f263afa9b986 for the web
>>>>> interface files
>>>>> 20:58:48,691 INFO
>>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Serving job
>>>>> manager log from
>>>>> /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.log
>>>>> 20:58:48,691 INFO
>>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Serving job
>>>>> manager stdout from
>>>>> /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.out
>>>>> 20:58:49,044 INFO
>>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Web frontend
>>>>> listening at 0:0:0:0:0:0:0:0:8081
>>>>> 20:58:49,045 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Starting JobManager actor
>>>>> 20:58:49,052 INFO  org.apache.flink.runtime.blob.BlobServer
>>>>> - Created BLOB server storage directory
>>>>> /tmp/blobStore-e0c52bfb-2411-4a83-ac8d-5664a5894258
>>>>> 20:58:49,054 INFO  org.apache.flink.runtime.blob.BlobServer
>>>>> - Started BLOB server at 0.0.0.0:43683 - max concurrent requests: 50 - max
>>>>> backlog: 1000
>>>>> 20:58:49,075 INFO  org.apache.flink.runtime.jobmanager.MemoryArchivist
>>>>> - Started memory archivist akka://flink/user/archive
>>>>> 20:58:49,075 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Starting JobManager at akka.tcp://flink@IP-of-master:6123/user/jobmanager.
>>>>> 20:58:49,081 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager was granted
>>>>> leadership with leader session ID None.
>>>>> 20:58:49,082 INFO
>>>>> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Starting
>>>>> with JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager on port
>>>>> 8081
>>>>> 20:58:49,083 INFO
>>>>> org.apache.flink.runtime.webmonitor.JobManagerRetriever       - New leader
>>>>> reachable under akka.tcp://flink@IP-of-master:6123/user/jobmanager:null.
>>>>> 20:59:22,794 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Submitting job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb
>>>>> 03 20:59:22 CET 2016).
>>>>> 20:59:22,853 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Scheduling job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb
>>>>> 03 20:59:22 CET 2016).
>>>>> 20:59:22,857 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb
>>>>> 03 20:59:22 CET 2016) changed to RUNNING.
>>>>> 20:59:22,859 INFO
>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - CHAIN
>>>>> DataSource (at getDefaultTextLineDataSet(WordCountData.java:70)
>>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at
>>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)
>>>>> (23fb37019a504fd6c7bf95e46a8cd7a3) switched from CREATED to SCHEDULED
>>>>> 20:59:22,881 INFO
>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - CHAIN
>>>>> DataSource (at getDefaultTextLineDataSet(WordCountData.java:70)
>>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at
>>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)
>>>>> (23fb37019a504fd6c7bf95e46a8cd7a3) switched from SCHEDULED to CANCELED
>>>>> 20:59:22,881 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb
>>>>> 03 20:59:22 CET 2016) changed to FAILING.
>>>>>
>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>>>> Not enough free slots available to run the job. You can decrease the
>>>>> operator parallelism or increase the number of slots per TaskManager in the
>>>>> configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at
>>>>> getDefaultTextLineDataSet(WordCountData.java:70)
>>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at
>>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)
>>>>> (1/1)) @ (unassigned) - [SCHEDULED] > with groupID <
>>>>> 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup
>>>>> [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce,
>>>>> 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler:
>>>>> Number of instances=0, total number of slots=0, available slots=0
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>         at
>>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>>>         at
>>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>>>         at
>>>>> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>>>>>         at
>>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>>>>>         at
>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>         at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>         at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>         at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> 20:59:22,886 INFO
>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - CHAIN Reduce
>>>>> (SUM(1), at main(WordCount.java:72) -> FlatMap (collect()) (1/1)
>>>>> (824b6e3771304cd0f92aea4ab763a11d) switched from CREATED to CANCELED
>>>>> 20:59:22,887 INFO
>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - DataSink
>>>>> (collect() sink) (1/1) (1bb64a2edc6f68ad716acd9f8d2d7d67) switched from
>>>>> CREATED to CANCELED
>>>>> 20:59:22,890 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>>> - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb
>>>>> 03 20:59:22 CET 2016) changed to FAILED.
>>>>>
>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>>>> Not enough free slots available to run the job. You can decrease the
>>>>> operator parallelism or increase the number of slots per TaskManager in the
>>>>> configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at
>>>>> getDefaultTextLineDataSet(WordCountData.java:70)
>>>>> (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at
>>>>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)
>>>>> (1/1)) @ (unassigned) - [SCHEDULED] > with groupID <
>>>>> 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup
>>>>> [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce,
>>>>> 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler:
>>>>> Number of instances=0, total number of slots=0, available slots=0
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
>>>>>         at
>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>         at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>         at
>>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>>>         at
>>>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>>>
>>>>>
>>>>> On Wed, Feb 3, 2016 at 9:27 PM, Robert Metzger <rm...@apache.org>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> the TaskManager is starting up, but its not able to register at the
>>>>>> job manager. Did you check the JobManager log? Do you see anything
>>>>>> suspicious there? Are the ports matching?
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 3, 2016 at 9:23 PM, Ravinder Kaur <ne...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Thank you for pointing it out. I had a little typo while I edited the
>>>>>>> hostname in flink-conf.yaml. I've reset it and the TaskManager started up.
>>>>>>> But I still can't run the WordCount example and it throws the same
>>>>>>> NoResourceAvaliableException.
>>>>>>>
>>>>>>> Caused by:
>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableExce
>>>>>>> ption: Not enough free slots available to run the job. You can decrease the
>>>>>>> oper
>>>>>>> ator parallelism or increase the number of slots per TaskManager in the
>>>>>>> configur
>>>>>>> ation. Task to schedule: < Attempt #0 (CHAIN DataSource (at
>>>>>>> getDefaultTextLineDa
>>>>>>> taSet(WordCountData.java:70)
>>>>>>> (org.apache.flink.api.java.io.CollectionInputFormat
>>>>>>> )) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at
>>>>>>> main(Wo
>>>>>>> rdCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID <
>>>>>>> 31e497f2f6
>>>>>>> 8c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup
>>>>>>> [f9ed1aab933e061a8c
>>>>>>> e1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce,
>>>>>>> 31e497f2f68c9cee5864c8fddaff3d
>>>>>>> 59] >. Resources available to scheduler: Number of instances=0, total number
>>>>>>> of
>>>>>>> slots=0, available slots=0
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(
>>>>>>> Scheduler.java:256)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmed
>>>>>>> iately(Scheduler.java:131)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecutio
>>>>>>> n(Execution.java:298)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForEx
>>>>>>> ecution(ExecutionVertex.java:458)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAl
>>>>>>> l(ExecutionJobVertex.java:322)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExe
>>>>>>> cution(ExecutionGraph.java:679)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$fl
>>>>>>> ink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982
>>>>>>> )
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$fl
>>>>>>> ink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>>>         at
>>>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$fl
>>>>>>> ink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>>>         ... 8 more
>>>>>>>
>>>>>>> The log of TaskManager again has the same errors as before.
>>>>>>>
>>>>>>> 20:58:58,457 INFO  org.apache.flink.runtime.net.ConnectionUtils
>>>>>>> - Failed to connect from address '/slave-IP': connect timed out
>>>>>>> 20:58:58,458 INFO  org.apache.flink.runtime.net.ConnectionUtils
>>>>>>> - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is
>>>>>>> unreachable
>>>>>>> 20:58:58,458 INFO  org.apache.flink.runtime.net.ConnectionUtils
>>>>>>> - Failed to connect from address '/127.0.0.1': Invalid argument
>>>>>>> 20:58:59,048 WARN  org.apache.flink.runtime.net.ConnectionUtils
>>>>>>> - Could not connect to /master-IP:6123. Selecting a local address using
>>>>>>> heuristics.
>>>>>>> 20:58:59,050 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - TaskManager will use hostname/address 'hostname-of-slave' (slave-IP) for
>>>>>>> communication.
>>>>>>> 20:58:59,051 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Starting TaskManager in streaming mode BATCH_ONLY
>>>>>>> 20:58:59,052 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Starting TaskManager actor system at slave_IP:0
>>>>>>> 20:58:59,776 INFO  akka.event.slf4j.Slf4jLogger
>>>>>>> - Slf4jLogger started
>>>>>>> 20:58:59,842 INFO  Remoting
>>>>>>> - Starting remoting
>>>>>>> 20:59:00,094 INFO  Remoting
>>>>>>> - Remoting started; listening on addresses
>>>>>>> :[akka.tcp://flink@slave-IP:33813]
>>>>>>> 20:59:00,100 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Starting TaskManager actor
>>>>>>> 20:59:00,125 INFO
>>>>>>> org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig
>>>>>>> [server address: hostname-of-master/master-IP, server port: 49030, memory
>>>>>>> segment size (bytes): 32768, transport type: NIO, number of server threads:
>>>>>>> 0 (use Netty's default), number of client threads: 0 (use Netty's default),
>>>>>>> server connect backlog: 0 (use Netty's default), client connect timeout
>>>>>>> (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
>>>>>>> 20:59:00,131 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Messages between TaskManager and JobManager have a max timeout of 100000
>>>>>>> milliseconds
>>>>>>> 20:59:00,142 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Temporary file directory '/tmp': total 4 GB, usable 1 GB (25.00% usable)
>>>>>>> 20:59:00,210 INFO
>>>>>>> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 64
>>>>>>> MB for network buffer pool (number of memory segments: 2048, bytes per
>>>>>>> segment: 32768).
>>>>>>> 20:59:00,323 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Using 0.7 of the currently free heap space for Flink managed heap memory
>>>>>>> (293 MB).
>>>>>>> 20:59:00,565 INFO
>>>>>>> org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager
>>>>>>> uses directory /tmp/flink-io-c7796b82-6676-4604-97fd-df09001a84e8 for spill
>>>>>>> files.
>>>>>>> 20:59:00,578 INFO  org.apache.flink.runtime.filecache.FileCache
>>>>>>> - User file cache uses directory
>>>>>>> /tmp/flink-dist-cache-13ed3e76-cf1e-46fa-9ba2-5177e801429e
>>>>>>> 20:59:00,908 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Starting TaskManager actor at akka://flink/user/taskmanager#-157676733.
>>>>>>> 20:59:00,908 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - TaskManager data connection information: hostname-of-master
>>>>>>> (dataPort=49030)
>>>>>>> 20:59:00,909 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - TaskManager has 1 task slot(s).
>>>>>>> 20:59:00,910 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Memory usage stats: [HEAP: 376/491/491 MB, NON HEAP: 24/49/304 MB
>>>>>>> (used/committed/max)]
>>>>>>> 20:59:00,917 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Trying to register at JobManager
>>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 1, timeout: 500
>>>>>>> milliseconds)
>>>>>>> 20:59:01,443 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Trying to register at JobManager
>>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 2, timeout: 1000
>>>>>>> milliseconds)
>>>>>>> 20:59:02,873 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Trying to register at JobManager
>>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 3, timeout: 2000
>>>>>>> milliseconds)
>>>>>>> 20:59:04,893 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Trying to register at JobManager
>>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 4, timeout: 4000
>>>>>>> milliseconds)
>>>>>>> 20:59:08,914 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>> - Trying to register at JobManager
>>>>>>> akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 5, timeout: 8000
>>>>>>> milliseconds)
>>>>>>>
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Ravinder Kaur
>>>>>>>
>>>>>>> On Wed, Feb 3, 2016 at 8:12 PM, Stephan Ewen <se...@apache.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> This looks like the reason:
>>>>>>>>
>>>>>>>> java.net.UnknownHostException: Cannot resolve the JobManager
>>>>>>>> hostname 'hostname-of-master' specified in the configuration
>>>>>>>>
>>>>>>>> On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur <ne...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> The log file of the Taskmanager now shows the following
>>>>>>>>>
>>>>>>>>> 18:27:10,082 WARN  org.apache.hadoop.util.NativeCodeLoader
>>>>>>>>> - Unable to load native-hadoop library for your platform... using
>>>>>>>>> builtin-java classes where applicable
>>>>>>>>> 18:27:10,244 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -
>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>> 18:27:10,244 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  Starting TaskManager (Version: 0.10.1, Rev:2e9b231, Date:22.11.2015 @
>>>>>>>>> 12:41:12 CET)
>>>>>>>>> 18:27:10,244 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  Current user: flink
>>>>>>>>> 18:27:10,245 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.91-b01
>>>>>>>>> 18:27:10,245 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  Maximum heap size: 491 MiBytes
>>>>>>>>> 18:27:10,245 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  JAVA_HOME: /usr/lib/jvm/java-1.7.0-openjdk-amd64
>>>>>>>>> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  Hadoop version: 2.7.0
>>>>>>>>> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  JVM Options:
>>>>>>>>> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     -Xms512M
>>>>>>>>> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     -Xmx512M
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     -XX:MaxDirectMemorySize=8388607T
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     -XX:MaxPermSize=256m
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -
>>>>>>>>> -Dlog.file=/home/flink/flink-0.10.1/log/flink-flink-taskmanager-0-vm-10-155-208-137.cloud.mwn.de.log
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -
>>>>>>>>> -Dlog4j.configuration=file:/home/flink/flink-0.10.1/conf/log4j.properties
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -
>>>>>>>>> -Dlogback.configurationFile=file:/home/flink/flink-0.10.1/conf/logback.xml
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  Program Arguments:
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     --configDir
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     /home/flink/flink-0.10.1/conf
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     --streamingMode
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -     batch
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -  Classpath:
>>>>>>>>> /home/flink/flink-0.10.1/lib/flink-dist_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/flink-python_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/log4j-1.2.17.jar:/home/flink/flink-0.10.1/lib/slf4j-log4j12-1.7.7.jar:/usr/lib/jvm/java-1.7.0-openjdk-amd64/lib/tools.jar::
>>>>>>>>> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> -
>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>> 18:27:10,252 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> - Maximum number of open file descriptors is 4096
>>>>>>>>> 18:27:10,277 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> - Loading configuration from /home/flink/flink-0.10.1/conf
>>>>>>>>> 18:27:10,356 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> - Security is not enabled. Starting non-authenticated TaskManager.
>>>>>>>>> 18:27:10,365 ERROR org.apache.flink.runtime.taskmanager.TaskManager
>>>>>>>>> - Failed to run TaskManager.
>>>>>>>>> java.net.UnknownHostException: Cannot resolve the JobManager
>>>>>>>>> hostname 'hostname-of-master' specified in the configuration
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:79)
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:48)
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:69)
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndPort(TaskManager.scala:1351)
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1328)
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1240)
>>>>>>>>>         at
>>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>> Ravinder Kaur
>>>>>>>>>
>>>>>>>>> On Wed, Feb 3, 2016 at 7:19 PM, Stephan Ewen <se...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> What do the TaskManger logs say?
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur
>>>>>>>>>> <ne...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the quick reply. I tried to set jobmanager.rpc.address
>>>>>>>>>>> in flink-conf.yaml to the hostname of master node on both the nodes.
>>>>>>>>>>>
>>>>>>>>>>> Now it does not start the Taskmanager at the worker node at all.
>>>>>>>>>>> When I start the cluster using ./bin/start-cluster.sh on master it shows the
>>>>>>>>>>> normal output of starting the Jobmanager and Taskmanager but when I run jps
>>>>>>>>>>> on the nodes the slave does not have the Taskmanager running.
>>>>>>>>>>>
>>>>>>>>>>> Running the WordCount example again fails showing the same error.
>>>>>>>>>>> Stopping the cluster says no taskmanager to stop.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>> Ravinder Kaur
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 3, 2016 at 5:47 PM, Stephan Ewen <se...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like the network configuration is not correct.
>>>>>>>>>>>>
>>>>>>>>>>>> I would try setting the full host name (like
>>>>>>>>>>>> "master.abc.xyz.com") as jobmanager.rpc.address.
>>>>>>>>>>>>
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur
>>>>>>>>>>>> <ne...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Community,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm a student and new to Apache Flink. I'm trying to learn and
>>>>>>>>>>>>> have setup a 2- node standalone Flink(0.10.1) cluster (one master and one
>>>>>>>>>>>>> worker). I'm facing the following issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cluster: consists of 2 vms (one master and one worker)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The configurations are done as per
>>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I start the cluster both the JobManager and the
>>>>>>>>>>>>> TaskManager are started on the master and worker respectively.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Command to start the cluster : bin/start-cluster.sh
>>>>>>>>>>>>>
>>>>>>>>>>>>> JPS shows all the processes running.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I run the following command to run a WordCount example
>>>>>>>>>>>>> job: ./bin/flink run ./examples/WordCount.jar
>>>>>>>>>>>>>
>>>>>>>>>>>>> the result is attached with the mail.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The error is
>>>>>>>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailabeException:
>>>>>>>>>>>>> Not enough free slots available to run to run the job
>>>>>>>>>>>>> ....................... Resources available to scheduler: Number of
>>>>>>>>>>>>> instances=0, total number of slots= 0, available slots=0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore I suppose that the JobManager does not find the
>>>>>>>>>>>>> TaskManager and checked the logs of the TaskManager which indeed shows that
>>>>>>>>>>>>> the TaskManager is unable to register at the JobManager for quite a long
>>>>>>>>>>>>> time. There are org.apache.flink.runtime.net.ConnectionUtils: Failed to
>>>>>>>>>>>>> connect from localhost: Connect timed out and
>>>>>>>>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from address
>>>>>>>>>>>>> localhost: Network is Unreachable messages in the log of the TaskManager.
>>>>>>>>>>>>> Later when it starts up after a number of attempts and tries to register at
>>>>>>>>>>>>> the JobManager, which also fails after a lot of attempts showing the
>>>>>>>>>>>>> following message org.apache.flink.runtime.taskmanager.Taskmanager: Trying
>>>>>>>>>>>>> to register at JobManager akka.tcp://flink@master:6123/user'/jobmanager
>>>>>>>>>>>>> (attempt:92, timeout:30seconds) and
>>>>>>>>>>>>> org.apache.flink.runtime.taskmanager.Taskmanager: Tried to associate with
>>>>>>>>>>>>> unreachable remote host [akka.tcp://flink@master:6123/user/jobmanager].
>>>>>>>>>>>>> Address is now gated for 5000ms, all messages to this address will be
>>>>>>>>>>>>> delivered to dead letters. Reason: Connection timed out: /master:6123
>>>>>>>>>>>>>
>>>>>>>>>>>>> I browsed the internet for these and found
>>>>>>>>>>>>> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
>>>>>>>>>>>>> and https://issues.apache.org/jira/browse/FLINK-1119 these
>>>>>>>>>>>>> links helpful. Stephan Ewen the guy who provided the solution in both the
>>>>>>>>>>>>> links gives a good explanation that the TaskManagers take quite some time to
>>>>>>>>>>>>> register at the JobManager and therefore I waited for as long as 20 mins
>>>>>>>>>>>>> after starting the cluster to run the job. But even after waiting so long I
>>>>>>>>>>>>> get the same error.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another suggestion was to run the cluster in streaming mode. So
>>>>>>>>>>>>> I tried it with the command : bin/start-cluster-streaming.sh and ran the job
>>>>>>>>>>>>> but I get the same error. I have rechecked all the configurations but I'm
>>>>>>>>>>>>> unable to find out the fault.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I re-checked all the configurations but could not find anything
>>>>>>>>>>>>> wrong. Also checked the port 6123 on master which is in LISTEN state and tcp
>>>>>>>>>>>>> request from worker to master shows SYN_SENT state using netstat -na and
>>>>>>>>>>>>> lsof -i commands.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I opened the webpage on master http://localhost:8081 but it
>>>>>>>>>>>>> shows nothing and localhost:8080 says connection refused.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kindly help me out as it is very important for me. Let me know
>>>>>>>>>>>>> if you have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>> Ravinder Kaur
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>