Posted to dev@flink.apache.org by Deepak Jha <dk...@gmail.com> on 2016/03/10 06:02:37 UTC

Flink-1.0.0 JobManager is not running in Docker Container on AWS

Hi All,

I'm trying to set up a Flink 1.0.0 cluster on Docker (separate containers for
the JobManager and TaskManager) inside AWS (using the AWS ECS service). I tested
it locally and it works fine, but on AWS Docker I am running into the following
issue:

2016-03-09 18:04:12,114 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] o.a.f.runtime.jobmanager.JobManager - Starting JobManager with high-availability
2016-03-09 18:04:12,118 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] o.a.f.runtime.jobmanager.JobManager - Starting JobManager on 172.31.63.152:8079 with execution mode CLUSTER
2016-03-09 18:04:12,172 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] o.a.f.runtime.jobmanager.JobManager - Security is not enabled. Starting non-authenticated JobManager.
2016-03-09 18:04:12,174 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] org.apache.flink.util.NetUtils - Trying to open socket on port 8079
2016-03-09 18:04:12,176 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] org.apache.flink.util.NetUtils - Unable to allocate socket on port
java.net.BindException: Cannot assign requested address
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.<init>(ServerSocket.java:237)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2$$anon$3.createSocket(JobManager.scala:1722)
    at org.apache.flink.util.NetUtils.createSocketFromPorts(NetUtils.java:237)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:1719)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1772)
    at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:12,180 PST [ERROR] ec2-52-3-248-202.compute-1.ama [main] o.a.f.runtime.jobmanager.JobManager - Failed to run JobManager.
java.lang.RuntimeException: Unable to do further retries starting the actor system
    at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1777)
    at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:12,991 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)


Initially the JobManager tried to bind to port 0, which did not work. Looking
further into it, I tried setting the recovery JobManager port with different
port combinations, but that does not seem to work either. I've exposed the
ports in the Docker Compose file as well.


PFA the JobManager log file for details, and also the JobManager config file.
-- 
Thanks,
Deepak Jha
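
An editorial sketch of the failure mode behind the "java.net.BindException: Cannot assign requested address" line in the log above: the operating system refuses to bind a listening socket to an IP address that is not assigned to any network interface visible to the process. That is a likely situation when a process inside a bridged container is told to bind to the host's address, though the thread does not confirm the exact cause here. The Python below is illustrative only and is not Flink code; `can_bind` is a hypothetical helper, and 192.0.2.1 is a documentation-only address (RFC 5737) assumed not to be configured on the machine running the sketch.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))  # port 0 lets the OS pick a free port
        except OSError:
            # Covers EADDRNOTAVAIL ("Cannot assign requested address"),
            # raised when `host` is not an address of a local interface.
            return False
        return True


if __name__ == "__main__":
    # Loopback and the wildcard address are normally bindable; an address
    # that no local interface owns is normally not.
    for candidate in ("127.0.0.1", "0.0.0.0", "192.0.2.1"):
        print(candidate, "bindable:", can_bind(candidate))
```

Running the same check inside the container against the address Flink is configured to use would show whether the bind itself, rather than the port choice, is what fails.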

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Deepak Jha <dk...@gmail.com>.
Hi Maximilian,
Thanks for the email and looking into the issue. I'm using Scala 2.11 so it
sounds perfect to me...
I will be more than happy to test it out.

On Tue, Mar 22, 2016 at 2:48 AM, Maximilian Michels <mx...@apache.org> wrote:

> Hi Deepak,
>
> We have looked further into this and have a pretty easy fix. However,
> it will only work with Flink's Scala 2.11 version because newer
> versions of the Akka library are incompatible with Scala 2.10 (Flink's
> default Scala version). Would that be a viable option for you?
>
> We're currently discussing this here:
> https://issues.apache.org/jira/browse/FLINK-2821
>
> Best,
> Max
>



-- 
Thanks,
Deepak Jha

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Maximilian Michels <mx...@apache.org>.
Hi Deepak,

We have looked further into this and have a pretty easy fix. However,
it will only work with Flink's Scala 2.11 version because newer
versions of the Akka library are incompatible with Scala 2.10 (Flink's
default Scala version). Would that be a viable option for you?

We're currently discussing this here:
https://issues.apache.org/jira/browse/FLINK-2821

Best,
Max
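
For readers following along, the Akka FAQ passage quoted in this thread distinguishes the advertised address (akka.remote.netty.tcp.hostname and akka.remote.netty.tcp.port) from the address the socket is actually bound to (akka.remote.netty.tcp.bind-hostname and akka.remote.netty.tcp.bind-port). The sketch below simply renders those four settings as "key = value" lines. The function name `render_akka_remote_settings` and the port value are hypothetical, and whether and how Flink exposes these settings is what FLINK-2821 discusses, so this illustrates the Akka settings only, not a Flink configuration.

```python
def render_akka_remote_settings(advertised_host, advertised_port,
                                bind_host, bind_port):
    """Render the four Akka remoting keys named in the FAQ as 'key = value' lines."""
    settings = {
        # Address other ActorSystems use to reach this one (e.g. the EC2 host).
        "akka.remote.netty.tcp.hostname": f'"{advertised_host}"',
        "akka.remote.netty.tcp.port": str(advertised_port),
        # Address the network interface inside the container is bound to.
        "akka.remote.netty.tcp.bind-hostname": f'"{bind_host}"',
        "akka.remote.netty.tcp.bind-port": str(bind_port),
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical values: advertise the host's address from the original
    # log, bind to all interfaces inside the container.
    print(render_akka_remote_settings("172.31.63.152", 6123, "0.0.0.0", 6123))
```

As the FAQ notes, the network still has to translate from the advertised address to the bound one, for example through a Docker port mapping.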

On Mon, Mar 14, 2016 at 4:49 PM, Deepak Jha <dk...@gmail.com> wrote:
> Hi Maximilian,
> Thanks for your response. I will wait for the update.
>
> On Monday, March 14, 2016, Maximilian Michels <mx...@apache.org> wrote:
>
>> Hi Deepak,
>>
>> We'll look more into this problem this week. Until now we considered it a
>> configuration issue if the bind address was not externally reachable.
>> However, one might not always have the possibility to change this network
>> configuration.
>>
>> Looking further, it is actually possible to let the bind address be
>> different from the advertised address. From the Akka FAQ at
>> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
>>
>> > If you are running an ActorSystem under a NAT or inside a docker container,
>> > make sure to set akka.remote.netty.tcp.hostname and
>> > akka.remote.netty.tcp.port to the address it is reachable at from other
>> > ActorSystems. If you need to bind your network interface to a different
>> > address - use akka.remote.netty.tcp.bind-hostname and
>> > akka.remote.netty.tcp.bind-port settings. Also make sure your network is
>> > configured to translate from the address your ActorSystem is reachable at
>> > to the address your ActorSystem network interface is bound to.
>> >
>>
>> It looks like we have to expose this configuration to users who have a
>> special network setup.
>>
>> Best,
>> Max
>>
>> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha <dkjhanitt@gmail.com> wrote:
>>
>> > Hi Stephan & Ufuk,
>> > Thanks for your response.
>> >
>> > Yes, there is a way to run Docker (net=host mode) in which the host
>> > machine's network stack is shared with the Docker container.
>> > Unfortunately, it's not supported by AWS ECS.
>> >
>> > I do have one more question for you. Can you please explain what
>> > happens when TaskManagers register themselves with the JobManager in HA
>> > mode? Does each TaskManager connect to the JobManager on a separate port?
>> > The reason I'm asking is that if I run 2 TaskManagers (in separate Docker
>> > containers), they are able to attach themselves to the JobManager (another
>> > Docker container; Flink HA setup using a remote ZooKeeper cluster), but
>> > soon after that they get disconnected. The logs are not very helpful
>> > either. I suspect that each TaskManager connects on a new port, and since
>> > Docker does not expose all ports by default, this may happen. I do not see
>> > this happen when I do not use Docker containers.
>> >
>> > Here is the log that I saw in the JobManager:
>> >
>> > 2016-03-12 08:55:55,010 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. Current number of alive task slots is 1.
>> > 2016-03-12 08:57:42,676 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. Current number of alive task slots is 2.
>> > 2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
>> > 2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.
>> > 2016-03-12 08:58:01,417 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>> > 2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
>> > 2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
>> > 2016-03-12 08:58:01,465 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1. Current number of alive task slots is 1.
>> > 2016-03-12 08:58:03,383 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
>> > 2016-03-12 08:58:03,384 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
>> > 2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
>> > 2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is 1. Current number of alive task slots is 1.
>> > 2016-03-12 08:58:21,382 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>> > 2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
>> > 2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
>> > 2016-03-12 08:58:21,390 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is 1. Current number of alive task slots is 1.
>> > 2016-03-12 08:58:25,433 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
>> > 2016-03-12 08:58:25,434 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
>> > 2016-03-12 08:58:28,947 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
>> > 2016-03-12 08:58:28,948 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is 1. Current number of alive task slots is 1.
>> >
>> >
>> > On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <sewen@apache.org> wrote:
>> >
>> > > Hi Deepak!
>> > >
>> > > We currently cannot split the bind address and advertised address because
>> > > the Akka library only accepts packages sent explicitly to the bind address
>> > > (not sure why Akka has this artificial limitation, but it is there).
>> > >
>> > > Can you bridge the container IP address to be visible from the outside?
>> > >
>> > > Stephan
>> > >
>> > >
>> > > On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <uce@apache.org> wrote:
>> > >
>> > > > Hey Deepak!
>> > > >
>> > > > Your description of Flink's behaviour is correct. To summarize:
>> > > >
>> > > > # Host Address
>> > > >
>> > > > If you specify a host address as an argument to the JVM (via
>> > > > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
>> > > > If you don't, it falls back to the value configured in flink-conf.yaml
>> > > > (what you describe).
>> > > >
>> > > > # Ports
>> > > >
>> > > > By default, a random port is used and published via ZooKeeper. You can
>> > > > configure a port range only via recovery.jobmanager.port (what you
>> > > > describe).
>> > > >
>> > > > ---
>> > > >
>> > > > Your proposal would likely solve the issue, but isn't it possible to
>> > > > handle this outside of Flink? I've found this Stack Overflow question,
>> > > > which should be related:
>> > > >
>> > > > http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
>> > > >
>> > > > What's your opinion?
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks,
>> > Deepak Jha
>> >
>>
>
>
> --
> Sent from Gmail Mobile
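
One detail in the JobManager log quoted above may be worth checking: both container IDs (5673db03e679 and 7200a7da4da7) register under the same Akka address, akka.tcp://flink@172.17.0.3:6121. The sketch below is a hypothetical helper, not part of Flink, that scans log lines for "Registered TaskManager" entries and reports Akka addresses claimed by more than one container. The regular expression assumes the message layout seen in this thread's log and that each record sits on a single line. Whether such a collision explains the disconnects is an assumption; the thread does not confirm it.

```python
import re
from collections import defaultdict

# Matches e.g. "Registered TaskManager at 5673db03e679
# (akka.tcp://flink@172.17.0.3:6121/user/taskmanager)" on one line.
REGISTERED = re.compile(
    r"Registered TaskManager at (?P<host>\S+) "
    r"\(akka\.tcp://flink@(?P<addr>[\d.]+:\d+)/user/taskmanager\)"
)


def address_collisions(lines):
    """Map each Akka address to the container IDs registered under it,
    keeping only addresses claimed by more than one container."""
    seen = defaultdict(set)
    for line in lines:
        match = REGISTERED.search(line)
        if match:
            seen[match.group("addr")].add(match.group("host"))
    return {addr: hosts for addr, hosts in seen.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Two records from the log in this thread, with the timestamp and
    # thread-name prefix dropped for brevity.
    sample = [
        "o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 "
        "(akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f.",
        "o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 "
        "(akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7.",
    ]
    print(address_collisions(sample))
```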

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Deepak Jha <dk...@gmail.com>.
Hi Maximilian,
Thanks for your response. I will wait for the update.

On Monday, March 14, 2016, Maximilian Michels <mx...@apache.org> wrote:

> Hi Deepak,
>
> We'll look more into this problem this week. Until now we considered it a
> configuration issue if the bind address was not externally reachable.
> However, one might not always have the possibility to change this network
> configuration.
>
> Looking further, it is actually possible to let the bind address be
> different from the advertised address. From the Akka FAQ at
> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
>
> > If you are running an ActorSystem under a NAT or inside a docker container,
> > make sure to set akka.remote.netty.tcp.hostname and
> > akka.remote.netty.tcp.port to the address it is reachable at from other
> > ActorSystems. If you need to bind your network interface to a different
> > address - use akka.remote.netty.tcp.bind-hostname and
> > akka.remote.netty.tcp.bind-port settings. Also make sure your network is
> > configured to translate from the address your ActorSystem is reachable at
> > to the address your ActorSystem network interface is bound to.
> >
>
> It looks like we have to expose this configuration to users who have a
> special network setup.
>
> Best,
> Max
>
> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha <dkjhanitt@gmail.com> wrote:
>
> > Hi Stephan & Ufuk,
> > Thanks for your response.
> >
> > Yes, there is a way to run Docker (net=host mode) in which the host
> > machine's network stack is shared with the Docker container.
> > Unfortunately, it's not supported by AWS ECS.
> >
> > I do have one more question for you. Can you please explain what
> > happens when TaskManagers register themselves with the JobManager in HA
> > mode? Does each TaskManager connect to the JobManager on a separate port?
> > The reason I'm asking is that if I run 2 TaskManagers (in separate Docker
> > containers), they are able to attach themselves to the JobManager (another
> > Docker container; Flink HA setup using a remote ZooKeeper cluster), but
> > soon after that they get disconnected. The logs are not very helpful
> > either. I suspect that each TaskManager connects on a new port, and since
> > Docker does not expose all ports by default, this may happen. I do not see
> > this happen when I do not use Docker containers.
> >
> > Here is the log that I saw in the JobManager:
> >
> > 2016-03-12 08:55:55,010 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:57:42,676 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. Current number of alive task slots is 2.
> > 2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.
> > 2016-03-12 08:58:01,417 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> > 2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> > 2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:01,465 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:58:03,383 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:58:03,384 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> o.a.f.r.instance.InstanceManager -
> > Registering TaskManager at akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager
> > which was marked as dead earlier because of a heart-beat timeout.
> > 2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> o.a.f.r.instance.InstanceManager -
> > Registered TaskManager at 7200a7da4da7 (akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager) as
> > eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is
> 1.
> > Current number of alive task slots is 1.
> > 2016-03-12 08:58:21,382 PST [WARN]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> > a.remote.ReliableDeliverySupervisor - Association with remote system
> > [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for
> > [5000] ms. Reason is: [Disassociated].
> > 2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> > o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because
> > TaskManager akka://flink/user/taskmanager is disassociating.
> > 2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> o.a.f.r.instance.InstanceManager -
> > Unregistered task manager akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager.
> > Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:21,390 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> o.a.f.r.instance.InstanceManager -
> > Registered TaskManager at 7200a7da4da7 (akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager) as
> > bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is
> 1.
> > Current number of alive task slots is 1.
> > 2016-03-12 08:58:25,433 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-18]
> > o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:58:25,434 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-18]
> o.a.f.r.instance.InstanceManager -
> > Unregistered task manager akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager.
> > Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:28,947 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> o.a.f.r.instance.InstanceManager -
> > Registering TaskManager at akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager
> > which was marked as dead earlier because of a heart-beat timeout.
> > 2016-03-12 08:58:28,948 PST [INFO]  ec2-54-173-231-120.compute-1.a
> > [flink-akka.actor.default-dispatcher-20]
> o.a.f.r.instance.InstanceManager -
> > Registered TaskManager at 7200a7da4da7 (akka.tcp://
> > flink@172.17.0.3:6121/user/taskmanager) as
> > d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is
> 1.
> > Current number of alive task slots is 1.
> >
> >
> > On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <sewen@apache.org> wrote:
> >
> > > Hi Deepak!
> > >
> > > We can currently not split the bind address and advertised address,
> > because
> > > the Akka library only accepts packages sent explicitly to the bind
> > address
> > > (not sure why Akka has this artificial limitation, but it is there).
> > >
> > > Can you bridge the container IP address to be visible from the outside?
> > >
> > > Stephan
> > >
> > >
> > > On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <uce@apache.org> wrote:
> > >
> > > > Hey Deepak!
> > > >
> > > > Your description of Flink's behaviour is correct. To summarize:
> > > >
> > > > # Host Address
> > > >
> > > > If you specify a host address as an argument to the JVM (via
> > > > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> > > > If you don't, it falls back to the value configured in
> flink-conf.yaml
> > > > (what you describe).
> > > >
> > > > # Ports
> > > >
> > > > Default used random port and publishes via ZooKeeper. You can
> > > > configure a port range only via recovery.jobmanager.port (what you
> > > > describe).
> > > >
> > > > ---
> > > >
> > > > Your proposal would likely solve the issue, but isn't it possible to
> > > > handle this outside of Flink? I've found this stack overflow
> question,
> > > > which should be related:
> > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
> > > >
> > > > What's your opinion?
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Deepak Jha
> >
>



Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Maximilian Michels <mx...@apache.org>.
Hi Deepak,

We'll look more into this problem this week. Until now we considered it a
configuration issue if the bind address was not externally reachable.
However, one might not always have the possibility to change this network
configuration.

Looking further, it is actually possible to let the bind address be
different from the advertised address. From the Akka FAQ at
http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:

> If you are running an ActorSystem under a NAT or inside a docker container,
> make sure to set akka.remote.netty.tcp.hostname and
> akka.remote.netty.tcp.port to the address it is reachable at from other
> ActorSystems. If you need to bind your network interface to a different
> address - use akka.remote.netty.tcp.bind-hostname and
> akka.remote.netty.tcp.bind-port settings. Also make sure your network is
> configured to translate from the address your ActorSystem is reachable at
> to the address your ActorSystem network interface is bound to.
>

It looks like we have to expose this configuration to users who have a
special network setup.
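
To illustrate the split the FAQ describes, here is a minimal Python sketch
(it is not Akka or Flink code). It assumes a listener that binds to one
address while handing out a different, externally reachable address to its
peers. The names start_listener and Endpoint, and the example addresses, are
made up for illustration.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """What a peer is told to connect to (hypothetical type)."""
    advertised_host: str
    port: int


def start_listener(bind_host: str, advertised_host: str, port: int = 0):
    """Listen on bind_host but advertise advertised_host to peers.

    bind_host plays the role of akka.remote.netty.tcp.bind-hostname,
    advertised_host the role of akka.remote.netty.tcp.hostname.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_host, port))         # must be a local interface
    server.listen()
    actual_port = server.getsockname()[1]  # resolved by the OS if port was 0
    return server, Endpoint(advertised_host, actual_port)


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address standing in for the host IP.
    server, endpoint = start_listener("127.0.0.1", "203.0.113.10")
    print("bound to", server.getsockname(), "advertising", endpoint)
    server.close()
```

As the FAQ notes, this only helps if the network actually translates the
advertised address to the bound one, for example through a Docker port
mapping.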

Best,
Max


Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Deepak Jha <dk...@gmail.com>.
Hi Stephan & Ufuk,
Thanks for your response.

Yes, there is a way to run Docker (net=host mode) in which the host
machine's network stack is shared with the Docker container.
Unfortunately, it's not supported by AWS ECS.

I do have one more question for you. Can you please explain to me what
happens when taskmanagers register themselves with the jobmanager in HA
mode? Does each taskmanager connect to the jobmanager on a separate port?
The reason I'm asking is that if I run 2 taskmanagers (in separate Docker
containers), they are able to attach themselves to the jobmanager (another
Docker container; Flink HA setup using a remote ZooKeeper cluster), but soon
after that they get disconnected. The logs are not very helpful either. I
suspect that each taskmanager connects on a new port and, since Docker does
not expose all ports by default, this may be the cause. I do not see this
happen when I do not use Docker containers.

Here is the log that I saw on the jobmanager:

2016-03-12 08:55:55,010 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. Current number of alive task slots is 1.
2016-03-12 08:57:42,676 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. Current number of alive task slots is 2.
2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
2016-03-12 08:57:48,422 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.
2016-03-12 08:58:01,417 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
2016-03-12 08:58:01,451 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
2016-03-12 08:58:01,465 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1. Current number of alive task slots is 1.
2016-03-12 08:58:03,383 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
2016-03-12 08:58:03,384 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
2016-03-12 08:58:04,988 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is 1. Current number of alive task slots is 1.
2016-03-12 08:58:21,382 PST [WARN]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
2016-03-12 08:58:21,388 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
2016-03-12 08:58:21,390 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is 1. Current number of alive task slots is 1.
2016-03-12 08:58:25,433 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
2016-03-12 08:58:25,434 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
2016-03-12 08:58:28,947 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
2016-03-12 08:58:28,948 PST [INFO]  ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is 1. Current number of alive task slots is 1.


On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi Deepak!
>
> We can currently not split the bind address and advertised address, because
> the Akka library only accepts packages sent explicitly to the bind address
> (not sure why Akka has this artificial limitation, but it is there).
>
> Can you bridge the container IP address to be visible from the outside?
>
> Stephan
>
>
> On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <uc...@apache.org> wrote:
>
> > Hey Deepak!
> >
> > Your description of Flink's behaviour is correct. To summarize:
> >
> > # Host Address
> >
> > If you specify a host address as an argument to the JVM (via
> > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> > If you don't, it falls back to the value configured in flink-conf.yaml
> > (what you describe).
> >
> > # Ports
> >
> > Default used random port and publishes via ZooKeeper. You can
> > configure a port range only via recovery.jobmanager.port (what you
> > describe).
> >
> > ---
> >
> > Your proposal would likely solve the issue, but isn't it possible to
> > handle this outside of Flink? I've found this stack overflow question,
> > which should be related:
> >
> >
> http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
> >
> > What's your opinion?
> >
>



-- 
Thanks,
Deepak Jha

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Stephan Ewen <se...@apache.org>.
Hi Deepak!

We currently cannot split the bind address and the advertised address,
because the Akka library only accepts messages sent explicitly to the bind
address (not sure why Akka has this artificial limitation, but it is there).

Can you bridge the container IP address to be visible from the outside?

Stephan


On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <uc...@apache.org> wrote:

> Hey Deepak!
>
> Your description of Flink's behaviour is correct. To summarize:
>
> # Host Address
>
> If you specify a host address as an argument to the JVM (via
> jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> If you don't, it falls back to the value configured in flink-conf.yaml
> (what you describe).
>
> # Ports
>
> Default used random port and publishes via ZooKeeper. You can
> configure a port range only via recovery.jobmanager.port (what you
> describe).
>
> ---
>
> Your proposal would likely solve the issue, but isn't it possible to
> handle this outside of Flink? I've found this stack overflow question,
> which should be related:
>
> http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
>
> What's your opinion?
>

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Ufuk Celebi <uc...@apache.org>.
Hey Deepak!

Your description of Flink's behaviour is correct. To summarize:

# Host Address

If you specify a host address as an argument to the JVM (via
jobmanager.sh or the start-cluster.sh scripts) then that one is used.
If you don't, it falls back to the value configured in flink-conf.yaml
(what you describe).
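
The fallback described above can be sketched in a few lines of Python. This
is an illustration of the precedence only, not Flink's actual startup code.
The function name is hypothetical, and it assumes the configured value is the
jobmanager.rpc.address key mentioned elsewhere in this thread.

```python
from typing import Mapping, Optional


def resolve_jobmanager_host(cli_host: Optional[str],
                            flink_conf: Mapping[str, str]) -> str:
    """Return the host given on the command line, else the configured one."""
    if cli_host:
        # An explicit argument (e.g. passed through jobmanager.sh) wins.
        return cli_host
    # Otherwise fall back to the value from flink-conf.yaml
    # (assumed here to be the jobmanager.rpc.address key).
    return flink_conf["jobmanager.rpc.address"]
```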

# Ports

By default, a random port is used and published via ZooKeeper. You can
configure a port range only via recovery.jobmanager.port (what you
describe).

---

Your proposal would likely solve the issue, but isn't it possible to
handle this outside of Flink? I've found this stack overflow question,
which should be related:
http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address

What's your opinion?

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Deepak Jha <dk...@gmail.com>.
Hi Stephan,
I was able to figure out the issue. Here is my explanation.

As I said, I'm trying to set up a Flink HA cluster in Docker containers
managed by Amazon ECS, with a remote ZooKeeper cluster running in AWS.
There are a few issues when deploying it with Docker:

--- Flink uses jobmanager.rpc.address both as the address to bind to and as
the address it stores in ZooKeeper. This address could be either the host's
IP address or the running container's IP address. If I set it to the host's
IP address, the jobmanager cannot bind, because that is not the container's
IP address. If I use the container's IP address, it can bind, but the
address it publishes to ZooKeeper is then the container's IP address, so
remote taskmanagers are not able to discover it. Ideally,
jobmanager.rpc.address should be split into jobmanager.bind.address (the
address the jobmanager binds to) and jobmanager.discovery.address (the
address published in ZooKeeper so that remote taskmanagers can discover it).

E.g., let's assume:

EC2_Instance_Ip = 1.1.1.1
Container_Ip = 2.2.2.2 (this container is running on this EC2 instance)
recovery.jobmanager.port = 3000
jobmanager.web.port = 8080

I mapped port 3000 on the container to 3000 on the host, and 8080 on the
container to 8080 on the host.

In flink-conf.yaml, assume:

Case 1
     jobmanager.rpc.address = 2.2.2.2  (container's IP address)
     Now 2.2.2.2 will be written to ZooKeeper. An external taskmanager will
try to use this address to communicate with the jobmanager, but it will not
be able to connect, since 2.2.2.2 is not reachable from outside the EC2
instance.

Case 2
   jobmanager.rpc.address = 1.1.1.1  (EC2 instance's IP address)
   The container does not know this address, so the jobmanager will not be
able to bind at all.

As you can see, we need two IP addresses: one for binding and another for
discovery.
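
Case 2 can be reproduced outside of Flink. The following is a minimal Python
sketch, not Flink code: it tries to bind a TCP socket to an address and
reports whether the operating system allows it. The helper name can_bind is
made up, and 192.0.2.1 is a documentation address standing in for an IP that
is not assigned to any local interface (like the EC2 instance's IP as seen
from inside the container).

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to host on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # e.g. EADDRNOTAVAIL, the error behind Java's
            # "BindException: Cannot assign requested address"
            return False
        return True


if __name__ == "__main__":
    print(can_bind("127.0.0.1"))  # an address that belongs to this machine
    print(can_bind("192.0.2.1"))  # an address this machine does not own
```

Running it inside the container with the host's IP address should show the
same failure as the BindException in the original report, assuming that
address is not configured on any interface in the container.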

---- In the Docker world we have to expose every port we want to use (in
bridged network mode). By default the jobmanager uses a random port for
communication. Since we do not know that port in advance, we set
recovery.jobmanager.port and exposed it in the Dockerfile. The same applies
to blob.server.port on the taskmanagers.
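
Why a fixed port or port range helps can be shown with a small Python
sketch. It is an illustration only and not how Flink implements
recovery.jobmanager.port. The helper bind_first_free and the range used
below are hypothetical. The point is that the chosen port is always inside a
range known in advance, so that range can be exposed in the Dockerfile.

```python
import socket
from typing import Iterable


def bind_first_free(host: str, ports: Iterable[int]) -> socket.socket:
    """Bind to the first port in `ports` that is free; raise if none is."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port taken (or not bindable), try the next one
            continue
        sock.listen()
        return sock
    raise OSError("no free port in the given range")


if __name__ == "__main__":
    # Hypothetical range; only these ports would need to be exposed.
    server = bind_first_free("127.0.0.1", range(50000, 50050))
    print("listening on port", server.getsockname()[1])
    server.close()
```

With port 0 instead of a range, the operating system picks an arbitrary free
port, which is the situation that cannot be expressed in a static list of
exposed ports.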

I hope that clarifies it. Please let me know if you have any other questions.

On Thu, Mar 10, 2016 at 10:47 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> Is it possible that the docker container config forbids to open ports?
> Flink will try to open some ports and needs the OS or container to permit
> that.
>
> Greetings,
> Stephan
>
>
> On Thu, Mar 10, 2016 at 6:27 PM, Deepak Jha <dk...@gmail.com> wrote:
>
> > Hi Stephan,
> > I tried 0.10.2 as well still running into the same issue.
> >
> > On Thursday, March 10, 2016, Deepak Jha <dk...@gmail.com> wrote:
> >
> > > Yes. Flink 1.0.0
> > >
> > > On Thursday, March 10, 2016, Stephan Ewen <sewen@apache.org> wrote:
> > >
> > >> Hi!
> > >>
> > >> Is this Flink 1.0.0 ?
> > >>
> > >> Stephan
> > >>
> > >>
> > >> On Thu, Mar 10, 2016 at 6:02 AM, Deepak Jha <dk...@gmail.com>
> > wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > I'm trying to setup Flink 1.0.0 cluster on Docker (separate
> containers
> > >> for
> > >> > jobmanager and taskmanager) inside AWS (Using AWS ECS service). I
> > >> tested it
> > >> > locally and its working fine but on AWS Docker, I am running into
> > >> following
> > >> > issue
> > >> >
> > >> > *2016-03-09 18:04:12,114 PST [INFO]  ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Starting JobManager with
> > >> > high-availability*
> > >> > *2016-03-09 18:04:12,118 PST [INFO]  ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Starting JobManager on
> > >> > 172.31.63.152:8079 <http://172.31.63.152:8079> with execution mode
> > >> CLUSTER*
> > >> > *2016-03-09 18:04:12,172 PST [INFO]  ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Security is not enabled.
> > Starting
> > >> > non-authenticated JobManager.*
> > >> > *2016-03-09 18:04:12,174 PST [DEBUG] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > org.apache.flink.util.NetUtils - Trying to open socket on port 8079*
> > >> > *2016-03-09 18:04:12,176 PST [DEBUG] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > org.apache.flink.util.NetUtils - Unable to allocate socket on port*
> > >> > *java.net.BindException: Cannot assign requested address*
> > >> > *    at java.net.PlainSocketImpl.socketBind(Native Method)*
> > >> > *    at
> > >> >
> > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)*
> > >> > *    at java.net.ServerSocket.bind(ServerSocket.java:375)*
> > >> > *    at java.net.ServerSocket.<init>(ServerSocket.java:237)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2$$anon$3.createSocket(JobManager.scala:1722)*
> > >> > *    at
> > >> >
> > org.apache.flink.util.NetUtils.createSocketFromPorts(NetUtils.java:237)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:1719)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)*
> > >> > *    at scala.util.Try$.apply(Try.scala:192)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1772)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)*
> > >> > *    at
> > >> >
> org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)*
> > >> > *2016-03-09 18:04:12,180 PST [ERROR] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.f.runtime.jobmanager.JobManager - Failed to run JobManager.*
> > >> > *java.lang.RuntimeException: Unable to do further retries starting
> the
> > >> > actor system*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1777)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)*
> > >> > *    at
> > >> >
> > >>
> >
> org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)*
> > >> > *    at
> > >> >
> org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)*
> > >> > *2016-03-09 18:04:12,991 PST [DEBUG] ec2-52-3-248-202.compute-1.ama
> > >> [main]
> > >> > o.a.h.m.lib.MutableMetricsFactory - field
> > >> > org.apache.hadoop.metrics2.lib.MutableRate
> > >> >
> > org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
> > >> > with annotation
> @org.apache.hadoop.metrics2.annotation.Metric(about=,
> > >> > sampleName=Ops, always=false, type=DEFAULT, value=[Rate of
> successful
> > >> > kerberos logins and latency (milliseconds)], valueName=Time)*
> > >> >
> > >> >
> > >> > Initially Jobmanager tries to bind to port 0 which did not work. On
> > >> > looking further into it, I tried using recovery jobmanager port
> using
> > >> > different port combinations, but it does not seems to be working...
> > I've
> > >> > exposed the ports in the docker compose file as well....
> > >> >
> > >> >
> > >> > PFA the jobmanager log file for details also the jobmanager config
> > >> file...
> > >> > --
> > >> > Thanks,
> > >> > Deepak Jha
> > >> >
> > >> >
> > >>
> > >
> > >
> > > --
> > > Sent from Gmail Mobile
> > >
> >
> >
> > --
> > Sent from Gmail Mobile
> >
>



-- 
Thanks,
Deepak Jha

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Stephan Ewen <se...@apache.org>.
Hi!

Is it possible that the Docker container config forbids opening ports?
Flink will try to open some ports and needs the OS or container to permit
that.

Greetings,
Stephan


On Thu, Mar 10, 2016 at 6:27 PM, Deepak Jha <dk...@gmail.com> wrote:

> Hi Stephan,
> I tried 0.10.2 as well and am still running into the same issue.
>

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Deepak Jha <dk...@gmail.com>.
Hi Stephan,
I tried 0.10.2 as well and am still running into the same issue.

On Thursday, March 10, 2016, Deepak Jha <dk...@gmail.com> wrote:

> Yes. Flink 1.0.0
>


-- 
Sent from Gmail Mobile

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Deepak Jha <dk...@gmail.com>.
Yes. Flink 1.0.0

On Thursday, March 10, 2016, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> Is this Flink 1.0.0?
>
> Stephan
>


-- 
Sent from Gmail Mobile

Re: Flink-1.0.0 JobManager is not running in Docker Container on AWS

Posted by Stephan Ewen <se...@apache.org>.
Hi!

Is this Flink 1.0.0?

Stephan


On Thu, Mar 10, 2016 at 6:02 AM, Deepak Jha <dk...@gmail.com> wrote:

> Hi All,
>
> I'm trying to set up a Flink 1.0.0 cluster on Docker (separate containers for
> the jobmanager and taskmanager) inside AWS (using the AWS ECS service). I tested
> it locally and it's working fine, but on AWS Docker I am running into the
> following issue:
>
> *2016-03-09 18:04:12,114 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main]
> o.a.f.runtime.jobmanager.JobManager - Starting JobManager with
> high-availability*
> *2016-03-09 18:04:12,118 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main]
> o.a.f.runtime.jobmanager.JobManager - Starting JobManager on
> 172.31.63.152:8079 with execution mode CLUSTER*
> *2016-03-09 18:04:12,172 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main]
> o.a.f.runtime.jobmanager.JobManager - Security is not enabled. Starting
> non-authenticated JobManager.*
> *2016-03-09 18:04:12,174 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main]
> org.apache.flink.util.NetUtils - Trying to open socket on port 8079*
> *2016-03-09 18:04:12,176 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main]
> org.apache.flink.util.NetUtils - Unable to allocate socket on port*
> *java.net.BindException: Cannot assign requested address*
> *    at java.net.PlainSocketImpl.socketBind(Native Method)*
> *    at
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)*
> *    at java.net.ServerSocket.bind(ServerSocket.java:375)*
> *    at java.net.ServerSocket.<init>(ServerSocket.java:237)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2$$anon$3.createSocket(JobManager.scala:1722)*
> *    at
> org.apache.flink.util.NetUtils.createSocketFromPorts(NetUtils.java:237)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:1719)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)*
> *    at scala.util.Try$.apply(Try.scala:192)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1772)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)*
> *2016-03-09 18:04:12,180 PST [ERROR] ec2-52-3-248-202.compute-1.ama [main]
> o.a.f.runtime.jobmanager.JobManager - Failed to run JobManager.*
> *java.lang.RuntimeException: Unable to do further retries starting the
> actor system*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1777)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)*
> *    at
> org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)*
> *2016-03-09 18:04:12,991 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main]
> o.a.h.m.lib.MutableMetricsFactory - field
> org.apache.hadoop.metrics2.lib.MutableRate
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
> with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=,
> sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful
> kerberos logins and latency (milliseconds)], valueName=Time)*
>
>
> Initially the JobManager tried to bind to port 0, which did not work. On
> looking further into it, I tried setting the recovery jobmanager port to
> different port combinations, but it does not seem to be working. I've
> exposed the ports in the Docker Compose file as well.
>
>
> Please find attached the jobmanager log file for details, along with the
> jobmanager config file.
> --
> Thanks,
> Deepak Jha
>
>
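
The `java.net.BindException: Cannot assign requested address` in the trace above is the JVM's wording for the operating system error EADDRNOTAVAIL, which the OS returns when a process tries to bind to an IP address that is not configured on any of its network interfaces. A port that is already taken normally shows up as `Address already in use` instead. The log shows the JobManager starting on 172.31.63.152:8079, so one thing worth checking (an assumption, not something confirmed in this thread) is whether that address is actually visible inside the container. Below is a minimal Python sketch of such a check; the helper name `try_bind` and its return labels are made up for illustration and are not part of Flink or Docker.

```python
import errno
import socket


def try_bind(host: str, port: int = 0) -> str:
    """Try to bind a TCP socket to (host, port) and classify the outcome.

    Hypothetical diagnostic helper, not a Flink or Docker API.
    Port 0 asks the OS to pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # The IP is not assigned to any interface visible to this process.
            # Java reports this case as "Cannot assign requested address".
            return "address-not-local"
        if exc.errno == errno.EADDRINUSE:
            # Another socket already holds this address and port.
            return "port-in-use"
        if exc.errno in (errno.EACCES, errno.EPERM):
            # The OS or container policy refused the bind.
            return "permission-denied"
        return f"other: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Address and port taken from the log above; run this inside the container.
    print(try_bind("172.31.63.152", 8079))
    # Binding to all interfaces should work if ports can be opened at all.
    print(try_bind("0.0.0.0", 8079))
```

If this is run inside the JobManager container and the first call returns "address-not-local" while the second returns "ok", that would point at the address the JobManager is configured to bind to rather than at blocked ports.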