Posted to user@flink.apache.org by Pieter Hameete <ph...@gmail.com> on 2016/02/06 12:14:09 UTC

Flink on YARN: Stuck on "Trying to register at JobManager"

Hi Guys!

I'm attempting to run Flink on YARN, but I've run into an issue. I'm starting
the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
the JobManager web UI is started:

JobManager web interface address
http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
Waiting until all TaskManagers have connected
11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
No status updates from the YARN cluster received so far. Waiting ...
11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
No status updates from the YARN cluster received so far. Waiting ...
No status updates from the YARN cluster received so far. Waiting ...

It then hangs on these last steps (trying to register, no status updates).

I'm sure there must be a problem on my side that keeps me from
registering at the JobManager. What could cause such connection
problems?
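
One way to narrow this down is to check whether the JobManager port shown in the log is reachable over TCP from the machine that starts the session. The following is only a minimal sketch: the helper name `can_connect` and the default timeout are illustrative assumptions and not part of Flink.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unknown hosts.
        return False
```

Calling `can_connect("145.100.41.148", 35666)` with the address and port from the current session's log, once from the VM and once from a node inside the cluster network, would show whether the port is blocked only from outside. A successful ping does not imply that the TCP port is open.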

Any tips are very welcome :-)

Cheers and have a good weekend!

- Pieter

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

Solved: indeed it needed to be built for YARN 2.7.1 specifically. Cheers!

2016-02-08 19:13 GMT+01:00 Robert Metzger <rm...@apache.org>:

> Hmm, that's weird. Maybe both resource managers are marked as "standby"?
> I'm not sure what could cause this issue.
>
> Which YARN version are you using? You may need to build Flink against
> that specific Hadoop version yourself.
>
> On Mon, Feb 8, 2016 at 5:50 PM, Pieter Hameete <ph...@gmail.com> wrote:
>
>> After downloading and building 1.0-SNAPSHOT from the master branch, I
>> ran into another problem when starting a YARN cluster. The startup now
>> loops indefinitely at the following step:
>>
>> 17:39:12,369 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm2
>> 17:39:34,855 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm1
>>
>> Any clue what could've gone wrong? I used the default settings when
>> building with Maven.
>>
>> - Pieter
>>
>>
>>
>> 2016-02-08 17:07 GMT+01:00 Pieter Hameete <ph...@gmail.com>:
>>
>>> Matter of RTFM eh ;-) thx and sorry for the bother.
>>>
>>> 2016-02-08 17:06 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>
>>>> You said earlier that you are using Flink 0.10. The feature is only
>>>> available in 1.0-SNAPSHOT.
>>>>
>>>> On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <ph...@gmail.com>
>>>> wrote:
>>>>
>>>>> I've tried setting the yarn.application-master.port property in
>>>>> flink-conf.yaml to a range, as suggested in
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>>>>>
>>>>> The JobManager does not seem to be picking the property up. Am I
>>>>> setting this in the wrong place? Or is there another way to enforce this
>>>>> property?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Pieter
>>>>>
>>>>> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <ph...@gmail.com>:
>>>>>
>>>>>> I found the relevant information on the website. I'll consult with the
>>>>>> cluster admin tomorrow, thanks for the help :-)
>>>>>>
>>>>>> - Pieter
>>>>>>
>>>>>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> we've had other users with a similar issue. There is a
>>>>>>> configuration value which allows you to specify a single port or a range of
>>>>>>> ports for the JobManager to allocate when running on YARN.
>>>>>>> Note that when using this with a single port, the JobManagers may collide.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <ph...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Stephan,
>>>>>>>>
>>>>>>>> it surely seems that way! I can't be the first with this issue,
>>>>>>>> though? I'll have to contact the cluster admins to find a solution
>>>>>>>> together. What would be a way to make the JobManagers accessible from
>>>>>>>> outside the network, given that the IP and port number change every time?
>>>>>>>>
>>>>>>>> Alternatively, I can ask for SSH access to a node within the
>>>>>>>> network. That will surely work, but it's not my preferred solution.
>>>>>>>>
>>>>>>>> - Pieter
>>>>>>>>
>>>>>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>>>>>>
>>>>>>>>> Yeah, sounds a lot like the client cannot connect to the
>>>>>>>>> JobManager port.
>>>>>>>>>
>>>>>>>>> The ports to communicate with HDFS and the YARN resource manager
>>>>>>>>> may be whitelisted or forwarded, so you can submit the YARN session, but
>>>>>>>>> then not connect to the JobManager afterwards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phameete@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Max!
>>>>>>>>>>
>>>>>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>>>>>> fine; everything in the JobManager web UI looks good.
>>>>>>>>>>
>>>>>>>>>> It seems like the JobManager initiates the connection with my VM
>>>>>>>>>> and cannot reach it. It could be that this is similar to the problem here:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>>>>>
>>>>>>>>>> I probably have to make some changes to the networking
>>>>>>>>>> configuration of my VM so it can be reached by the JobManager despite using
>>>>>>>>>> a different port each time.
>>>>>>>>>>
>>>>>>>>>> - Pieter
>>>>>>>>>>
>>>>>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>>>>>>>>>
>>>>>>>>>>> Hi Pieter,
>>>>>>>>>>>
>>>>>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Max
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phameete@gmail.com> wrote:
>>>>>>>>>>> > Hi Robert,
>>>>>>>>>>> >
>>>>>>>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>>>>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>>>>>>>> >
>>>>>>>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>>>>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>>>>>>>> > network as the Hadoop cluster there are no problems registering with the
>>>>>>>>>>> > JobManager. I did also notice the following message in the local console:
>>>>>>>>>>> >
>>>>>>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>>>>>>> > - Tried to associate with unreachable remote address
>>>>>>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>>>>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>>>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>>>>>>> >
>>>>>>>>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>>>>>>>>> > invalid or missing configuration on my side?
>>>>>>>>>>> >
>>>>>>>>>>> > Cheers,
>>>>>>>>>>> >
>>>>>>>>>>> > Pieter
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetzger@apache.org>:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hi,
>>>>>>>>>>> >>
>>>>>>>>>>> >> did you check the logs of the JobManager itself? Maybe they already
>>>>>>>>>>> >> tell us what's going on.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phameete@gmail.com> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Hi Guys!
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I'm attempting to run Flink on YARN, but I've run into an issue. I'm starting
>>>>>>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
>>>>>>>>>>> >>> the JobManager web UI is started:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> JobManager web interface address
>>>>>>>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
>>>>>>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>>>>>> >>> updates..)
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I'm sure there must be a problem on my side that is causing me not to be
>>>>>>>>>>> >>> able to register at the JobManager. What could cause such connection
>>>>>>>>>>> >>> problems?
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Any tips are very welcome :-)
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Cheers and have a good weekend!
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> - Pieter
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Robert Metzger <rm...@apache.org>.

Hmm, that's weird. Maybe both resource managers are marked as "standby"? I'm not
sure what could cause this issue.

Which YARN version are you using? You may need to build Flink against
that specific Hadoop version yourself.

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

After downloading and building 1.0-SNAPSHOT from the master branch, I
ran into another problem when starting a YARN cluster. The startup now
loops indefinitely at the following step:

17:39:12,369 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm2
17:39:34,855 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm1

Any clue what could've gone wrong? I used the default settings when
building with Maven.

- Pieter



Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

Matter of RTFM eh ;-) thx and sorry for the bother.

2016-02-08 17:06 GMT+01:00 Robert Metzger <rm...@apache.org>:

> You said earlier that you are using Flink 0.10. The feature is only
> available in 1.0-SNAPSHOT.
>
> On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <ph...@gmail.com> wrote:
>
>> I've tried setting the yarn.application-master.port property in
>> flink-conf.yaml to a range, as suggested in
>> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>>
>> The JobManager does not seem to be picking the property up. Am I setting
>> this in the wrong place? Or is there another way to enforce this property?
>>
>> Cheers,
>>
>> Pieter
>>
>> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <ph...@gmail.com>:
>>
>>> I found the relevant information on the website. I'll consult with the
>>> cluster admin tomorrow, thanks for the help :-)
>>>
>>> - Pieter
>>>
>>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>
>>>> Hi,
>>>>
>>>> we've had other users with a similar issue. There is a
>>>> configuration value which allows you to specify a single port or a range of
>>>> ports for the JobManager to allocate when running on YARN.
>>>> Note that when using this with a single port, the JobManagers may collide.
>>>>
>>>>
>>>>
>>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <ph...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Stephan,
>>>>>
>>>>> it surely seems that way! I can't be the first with this issue,
>>>>> though? I'll have to contact the cluster admins to find a solution
>>>>> together. What would be a way to make the JobManagers accessible from
>>>>> outside the network, given that the IP and port number change every time?
>>>>>
>>>>> Alternatively, I can ask for SSH access to a node within the network.
>>>>> That will surely work, but it's not my preferred solution.
>>>>>
>>>>> - Pieter
>>>>>
>>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>>>
>>>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>>>> port.
>>>>>>
>>>>>> The ports to communicate with HDFS and the YARN resource manager may
>>>>>> be whitelisted or forwarded, so you can submit the YARN session, but then
>>>>>> not connect to the JobManager afterwards.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Max!
>>>>>>>
>>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>>> fine; everything in the JobManager web UI looks good.
>>>>>>>
>>>>>>> It seems like the JobManager initiates the connection with my VM and
>>>>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>>>>
>>>>>>>
>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>>
>>>>>>> I probably have to make some changes to the networking configuration
>>>>>>> of my VM so it can be reached by the JobManager despite using a different
>>>>>>> port each time.
>>>>>>>
>>>>>>> - Pieter
>>>>>>>
>>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>>>>>>
>>>>>>>> Hi Pieter,
>>>>>>>>
>>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Robert,
>>>>>>>> >
>>>>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>>>>> >
>>>>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>>>>> > network as the Hadoop cluster there are no problems registering with the
>>>>>>>> > JobManager. I did also notice the following message in the local console:
>>>>>>>> >
>>>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>>>> > - Tried to associate with unreachable remote address
>>>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>>>> >
>>>>>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>>>>>> > invalid or missing configuration on my side?
>>>>>>>> >
>>>>>>>> > Cheers,
>>>>>>>> >
>>>>>>>> > Pieter
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>>>>>> >>
>>>>>>>> >> Hi,
>>>>>>>> >>
>>>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll
>>>>>>>> tell us
>>>>>>>> >> already whats going on.
>>>>>>>> >>
>>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <
>>>>>>>> phameete@gmail.com>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hi Guys!
>>>>>>>> >>>
>>>>>>>> >>> Im attempting to run Flink on YARN, but I run into an issue. Im
>>>>>>>> starting
>>>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well
>>>>>>>> until after
>>>>>>>> >>> the JobManager web UI is started:
>>>>>>>> >>>
>>>>>>>> >>> JobManager web interface address
>>>>>>>> >>>
>>>>>>>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Notification about new leader address
>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>>>>> session ID null.
>>>>>>>> >>> No status updates from the YARN cluster received so far.
>>>>>>>> Waiting ...
>>>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Received address of new leader
>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>>>>> session ID null.
>>>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Disconnect from JobManager null.
>>>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Trying to register at JobManager
>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>>> >>> No status updates from the YARN cluster received so far.
>>>>>>>> Waiting ...
>>>>>>>> >>> No status updates from the YARN cluster received so far.
>>>>>>>> Waiting ...
>>>>>>>> >>>
>>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>>> >>> updates..)
>>>>>>>> >>>
>>>>>>>> >>> Im sure there must be a problem on my side that is causing me
>>>>>>>> not to be
>>>>>>>> >>> able to register at the JobManager. What could cause such
>>>>>>>> connection
>>>>>>>> >>> problems?
>>>>>>>> >>>
>>>>>>>> >>> Any tips are very welcome :-)
>>>>>>>> >>>
>>>>>>>> >>> Cheers and have a good weekend!
>>>>>>>> >>>
>>>>>>>> >>> - Pieter
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Robert Metzger <rm...@apache.org>.

You said earlier that you are using Flink 0.10. The feature is only
available in 1.0-SNAPSHOT.

On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <ph...@gmail.com> wrote:

> I've tried setting the yarn.application-master.port property in
> flink-conf.yaml to a range suggested in
> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>
> The JobManager does not seem to be picking the property up. Am I setting
> this in the wrong place? Or is there another way to enforce this property?
>
> Cheers,
>
> Pieter
>
> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <ph...@gmail.com>:
>
>> I found the relevant information on the website. Ill consult with the
>> cluster admin tomorrow, thanks for the help :-)
>>
>> - Pieter
>>
>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>
>>> Hi,
>>>
>>> we had other users with a similar issue as well. There is a
>>> configuration value which allows you to specify a single port or a range of
>>> ports for the JobManager to allocate when running on YARN.
>>> Note that when using this with a single port, the JMs may collide.
>>>
>>>
>>>
>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <ph...@gmail.com>
>>> wrote:
>>>
>>>> Hi Stephan,
>>>>
>>>> surely it seems this way! I must not be the first with this issue
>>>> though? I'll have to contact the cluster admins to find a solution
>>>> together. What would be a way of make the JobManagers accessible from
>>>> outside the network, because the IP and port number changes every time.
>>>>
>>>> Alternatively, I can ask for ssh access to a node within the network.
>>>> that will surely work but it's not my preferred solution.
>>>>
>>>> - Pieter
>>>>
>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>>
>>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>>> port.
>>>>>
>>>>> The ports to communicate with HDFS and the YARN resource manager may
>>>>> be whitelisted r forwarded, so you can submit the YARN session, but then
>>>>> not connect to the JobManager afterwards.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Max!
>>>>>>
>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>> fine, all in the JobManager Web UI looks good.
>>>>>>
>>>>>> It seems like the JobManager initiates the connection with my VM and
>>>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>>>
>>>>>>
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>
>>>>>> I probably have to make some changes to the networking configuration
>>>>>> of my VM so it can be reached by the JobManager despite using a different
>>>>>> port each time.
>>>>>>
>>>>>> - Pieter
>>>>>>
>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>>>>>
>>>>>>> Hi Pieter,
>>>>>>>
>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Max
>>>>>>>
>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi Robert,
>>>>>>> >
>>>>>>> > unfortunately there are no signs of what is going wrong in the
>>>>>>> logs. The
>>>>>>> > last log messages are about succesful registration of the
>>>>>>> TaskManagers.
>>>>>>> >
>>>>>>> > I'm also fairly sure it must be something in my VM that is causing
>>>>>>> this,
>>>>>>> > because when I start the yarn-session from a login node that is on
>>>>>>> the same
>>>>>>> > network as the hadoop cluster there are no problems registering
>>>>>>> with the
>>>>>>> > JobManager. I did also notice the following message in the local
>>>>>>> console:
>>>>>>> >
>>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>>> > - Tried to associate with unreachable remote address
>>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for
>>>>>>> 5000 ms,
>>>>>>> > all messages to this address will be delivered to dead letters.
>>>>>>> Reason:
>>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>>> >
>>>>>>> > I can ping the JobManager fine from with VM. Could there be some
>>>>>>> invalid or
>>>>>>> > missing configuration on my side?
>>>>>>> >
>>>>>>> > Cheers,
>>>>>>> >
>>>>>>> > Pieter
>>>>>>> >
>>>>>>> >
>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell
>>>>>>> us
>>>>>>> >> already whats going on.
>>>>>>> >>
>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <
>>>>>>> phameete@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi Guys!
>>>>>>> >>>
>>>>>>> >>> Im attempting to run Flink on YARN, but I run into an issue. Im
>>>>>>> starting
>>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well
>>>>>>> until after
>>>>>>> >>> the JobManager web UI is started:
>>>>>>> >>>
>>>>>>> >>> JobManager web interface address
>>>>>>> >>>
>>>>>>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Notification about new leader address
>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>>>> session ID null.
>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>>>> ...
>>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Received address of new leader
>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>>>> session ID null.
>>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Disconnect from JobManager null.
>>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>> >>> - Trying to register at JobManager
>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>>>> ...
>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>>>> ...
>>>>>>> >>>
>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>> >>> updates..)
>>>>>>> >>>
>>>>>>> >>> Im sure there must be a problem on my side that is causing me
>>>>>>> not to be
>>>>>>> >>> able to register at the JobManager. What could cause such
>>>>>>> connection
>>>>>>> >>> problems?
>>>>>>> >>>
>>>>>>> >>> Any tips are very welcome :-)
>>>>>>> >>>
>>>>>>> >>> Cheers and have a good weekend!
>>>>>>> >>>
>>>>>>> >>> - Pieter
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

I've tried setting the yarn.application-master.port property in
flink-conf.yaml to a range suggested in
https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls

The JobManager does not seem to be picking the property up. Am I setting
this in the wrong place? Or is there another way to enforce this property?

Cheers,

Pieter
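
[Editor's sketch] One way to rule out "wrong place" is to confirm that the key really is present in the flink-conf.yaml the client reads. The snippet below is only a rough sketch in Python: it assumes the file consists of flat "key: value" lines, and the function name read_flink_conf is made up for illustration. It does not tell you whether the Flink version in use honours the key.

```python
from typing import Dict


def read_flink_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments.

    Simplification: everything after a '#' is treated as a comment, so a
    value that itself contains '#' would be cut short.
    """
    conf: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf
```

Possible usage, assuming the file lives at conf/flink-conf.yaml relative to the Flink directory: read_flink_conf(open("conf/flink-conf.yaml").read()).get("yarn.application-master.port"). A result of None would mean the key is not in that file at all.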

2016-02-07 20:04 GMT+01:00 Pieter Hameete <ph...@gmail.com>:

> I found the relevant information on the website. Ill consult with the
> cluster admin tomorrow, thanks for the help :-)
>
> - Pieter
>
> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rm...@apache.org>:
>
>> Hi,
>>
>> we had other users with a similar issue as well. There is a configuration
>> value which allows you to specify a single port or a range of ports for the
>> JobManager to allocate when running on YARN.
>> Note that when using this with a single port, the JMs may collide.
>>
>>
>>
>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <ph...@gmail.com>
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> surely it seems this way! I must not be the first with this issue
>>> though? I'll have to contact the cluster admins to find a solution
>>> together. What would be a way of make the JobManagers accessible from
>>> outside the network, because the IP and port number changes every time.
>>>
>>> Alternatively, I can ask for ssh access to a node within the network.
>>> that will surely work but it's not my preferred solution.
>>>
>>> - Pieter
>>>
>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>
>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>> port.
>>>>
>>>> The ports to communicate with HDFS and the YARN resource manager may be
>>>> whitelisted r forwarded, so you can submit the YARN session, but then not
>>>> connect to the JobManager afterwards.
>>>>
>>>>
>>>>
>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Max!
>>>>>
>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>> fine, all in the JobManager Web UI looks good.
>>>>>
>>>>> It seems like the JobManager initiates the connection with my VM and
>>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>>
>>>>>
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>
>>>>> I probably have to make some changes to the networking configuration
>>>>> of my VM so it can be reached by the JobManager despite using a different
>>>>> port each time.
>>>>>
>>>>> - Pieter
>>>>>
>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>>>>
>>>>>> Hi Pieter,
>>>>>>
>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Robert,
>>>>>> >
>>>>>> > unfortunately there are no signs of what is going wrong in the
>>>>>> logs. The
>>>>>> > last log messages are about succesful registration of the
>>>>>> TaskManagers.
>>>>>> >
>>>>>> > I'm also fairly sure it must be something in my VM that is causing
>>>>>> this,
>>>>>> > because when I start the yarn-session from a login node that is on
>>>>>> the same
>>>>>> > network as the hadoop cluster there are no problems registering
>>>>>> with the
>>>>>> > JobManager. I did also notice the following message in the local
>>>>>> console:
>>>>>> >
>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>> > - Tried to associate with unreachable remote address
>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for
>>>>>> 5000 ms,
>>>>>> > all messages to this address will be delivered to dead letters.
>>>>>> Reason:
>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>> >
>>>>>> > I can ping the JobManager fine from with VM. Could there be some
>>>>>> invalid or
>>>>>> > missing configuration on my side?
>>>>>> >
>>>>>> > Cheers,
>>>>>> >
>>>>>> > Pieter
>>>>>> >
>>>>>> >
>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell
>>>>>> us
>>>>>> >> already whats going on.
>>>>>> >>
>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <
>>>>>> phameete@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hi Guys!
>>>>>> >>>
>>>>>> >>> Im attempting to run Flink on YARN, but I run into an issue. Im
>>>>>> starting
>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well
>>>>>> until after
>>>>>> >>> the JobManager web UI is started:
>>>>>> >>>
>>>>>> >>> JobManager web interface address
>>>>>> >>>
>>>>>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Notification about new leader address
>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>>> session ID null.
>>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>>> ...
>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Received address of new leader
>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>>> session ID null.
>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Disconnect from JobManager null.
>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Trying to register at JobManager
>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>>> ...
>>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>>> ...
>>>>>> >>>
>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>> >>> updates..)
>>>>>> >>>
>>>>>> >>> Im sure there must be a problem on my side that is causing me not
>>>>>> to be
>>>>>> >>> able to register at the JobManager. What could cause such
>>>>>> connection
>>>>>> >>> problems?
>>>>>> >>>
>>>>>> >>> Any tips are very welcome :-)
>>>>>> >>>
>>>>>> >>> Cheers and have a good weekend!
>>>>>> >>>
>>>>>> >>> - Pieter
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

I found the relevant information on the website. I'll consult with the
cluster admin tomorrow, thanks for the help :-)

- Pieter

2016-02-07 19:31 GMT+01:00 Robert Metzger <rm...@apache.org>:

> Hi,
>
> we had other users with a similar issue as well. There is a configuration
> value which allows you to specify a single port or a range of ports for the
> JobManager to allocate when running on YARN.
> Note that when using this with a single port, the JMs may collide.
>
>
>
> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <ph...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> surely it seems this way! I must not be the first with this issue though?
>> I'll have to contact the cluster admins to find a solution together. What
>> would be a way of make the JobManagers accessible from outside the network,
>> because the IP and port number changes every time.
>>
>> Alternatively, I can ask for ssh access to a node within the network.
>> that will surely work but it's not my preferred solution.
>>
>> - Pieter
>>
>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>
>>> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>>>
>>> The ports to communicate with HDFS and the YARN resource manager may be
>>> whitelisted r forwarded, so you can submit the YARN session, but then not
>>> connect to the JobManager afterwards.
>>>
>>>
>>>
>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com>
>>> wrote:
>>>
>>>> Hi Max!
>>>>
>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>>>> all in the JobManager Web UI looks good.
>>>>
>>>> It seems like the JobManager initiates the connection with my VM and
>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>
>>>>
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>
>>>> I probably have to make some changes to the networking configuration of
>>>> my VM so it can be reached by the JobManager despite using a different port
>>>> each time.
>>>>
>>>> - Pieter
>>>>
>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>>>
>>>>> Hi Pieter,
>>>>>
>>>>> Which version of Flink are you using? It appears you've created a
>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com>
>>>>> wrote:
>>>>> > Hi Robert,
>>>>> >
>>>>> > unfortunately there are no signs of what is going wrong in the logs.
>>>>> The
>>>>> > last log messages are about succesful registration of the
>>>>> TaskManagers.
>>>>> >
>>>>> > I'm also fairly sure it must be something in my VM that is causing
>>>>> this,
>>>>> > because when I start the yarn-session from a login node that is on
>>>>> the same
>>>>> > network as the hadoop cluster there are no problems registering with
>>>>> the
>>>>> > JobManager. I did also notice the following message in the local
>>>>> console:
>>>>> >
>>>>> > 12:30:27,173 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for
>>>>> 5000 ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> Reason:
>>>>> > connection timed out: /145.100.41.13:41539
>>>>> >
>>>>> > I can ping the JobManager fine from with VM. Could there be some
>>>>> invalid or
>>>>> > missing configuration on my side?
>>>>> >
>>>>> > Cheers,
>>>>> >
>>>>> > Pieter
>>>>> >
>>>>> >
>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>> >> already whats going on.
>>>>> >>
>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phameete@gmail.com
>>>>> >
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hi Guys!
>>>>> >>>
>>>>> >>> Im attempting to run Flink on YARN, but I run into an issue. Im
>>>>> starting
>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well
>>>>> until after
>>>>> >>> the JobManager web UI is started:
>>>>> >>>
>>>>> >>> JobManager web interface address
>>>>> >>>
>>>>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>> >>> Waiting until all TaskManagers have connected
>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Notification about new leader address
>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>> session ID null.
>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>> ...
>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Received address of new leader
>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with
>>>>> session ID null.
>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Disconnect from JobManager null.
>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Trying to register at JobManager
>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>> ...
>>>>> >>> No status updates from the YARN cluster received so far. Waiting
>>>>> ...
>>>>> >>>
>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>> >>> updates..)
>>>>> >>>
>>>>> >>> Im sure there must be a problem on my side that is causing me not
>>>>> to be
>>>>> >>> able to register at the JobManager. What could cause such
>>>>> connection
>>>>> >>> problems?
>>>>> >>>
>>>>> >>> Any tips are very welcome :-)
>>>>> >>>
>>>>> >>> Cheers and have a good weekend!
>>>>> >>>
>>>>> >>> - Pieter
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Robert Metzger <rm...@apache.org>.

Hi,

we had other users with a similar issue as well. There is a configuration
value which allows you to specify a single port or a range of ports for the
JobManager to allocate when running on YARN.
Note that when using this with a single port, the JMs may collide.
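
[Editor's sketch] To illustrate why a range is safer than a single port, here is a small Python sketch of how a port specification could be expanded and probed. It is not Flink's implementation. The spec syntax (single ports, "low-high" ranges, comma-separated lists) is an assumption based on the documentation linked in this thread, and the names parse_port_spec and first_free_port are hypothetical.

```python
import socket
from typing import List, Optional


def parse_port_spec(spec: str) -> List[int]:
    """Expand a spec such as '50100-50200' or '50010,50020-50025' into ports."""
    ports: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            ports.extend(range(int(low), int(high) + 1))
        else:
            ports.append(int(part))
    return ports


def first_free_port(spec: str, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port from the spec that can be bound, or None."""
    for port in parse_port_spec(spec):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port already taken, e.g. by another JobManager on this host.
                continue
            return port
    return None
```

With a single-port spec, a second process on the same host finds nothing left to bind and gets None; with a range it simply moves on to the next port.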



On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <ph...@gmail.com> wrote:

> Hi Stephan,
>
> surely it seems this way! I must not be the first with this issue though?
> I'll have to contact the cluster admins to find a solution together. What
> would be a way of make the JobManagers accessible from outside the network,
> because the IP and port number changes every time.
>
> Alternatively, I can ask for ssh access to a node within the network. that
> will surely work but it's not my preferred solution.
>
> - Pieter
>
> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>
>> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>>
>> The ports to communicate with HDFS and the YARN resource manager may be
>> whitelisted r forwarded, so you can submit the YARN session, but then not
>> connect to the JobManager afterwards.
>>
>>
>>
>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com>
>> wrote:
>>
>>> Hi Max!
>>>
>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>>> all in the JobManager Web UI looks good.
>>>
>>> It seems like the JobManager initiates the connection with my VM and
>>> cannot reach it. It could be that this is similar to the problem here:
>>>
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>
>>> I probably have to make some changes to the networking configuration of
>>> my VM so it can be reached by the JobManager despite using a different port
>>> each time.
>>>
>>> - Pieter
>>>
>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>>
>>>> Hi Pieter,
>>>>
>>>> Which version of Flink are you using? It appears you've created a
>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com>
>>>> wrote:
>>>> > Hi Robert,
>>>> >
>>>> > unfortunately there are no signs of what is going wrong in the logs.
>>>> The
>>>> > last log messages are about succesful registration of the
>>>> TaskManagers.
>>>> >
>>>> > I'm also fairly sure it must be something in my VM that is causing
>>>> this,
>>>> > because when I start the yarn-session from a login node that is on
>>>> the same
>>>> > network as the hadoop cluster there are no problems registering with
>>>> the
>>>> > JobManager. I did also notice the following message in the local
>>>> console:
>>>> >
>>>> > 12:30:27,173 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason:
>>>> > connection timed out: /145.100.41.13:41539
>>>> >
>>>> > I can ping the JobManager fine from with VM. Could there be some
>>>> invalid or
>>>> > missing configuration on my side?
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Pieter
>>>> >
>>>> >
>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rm...@apache.org>:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>> >> already whats going on.
>>>> >>
>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <ph...@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Guys!
>>>> >>>
>>>> >>> Im attempting to run Flink on YARN, but I run into an issue. Im
>>>> starting
>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until
>>>> after
>>>> >>> the JobManager web UI is started:
>>>> >>>
>>>> >>> JobManager web interface address
>>>> >>>
>>>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>> >>> Waiting until all TaskManagers have connected
>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Notification about new leader address
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session
>>>> ID null.
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Received address of new leader
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session
>>>> ID null.
>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Disconnect from JobManager null.
>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Trying to register at JobManager
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>>
>>>> >>> It then hangs on these last steps (trying to register, no status
>>>> >>> updates..)
>>>> >>>
>>>> >>> Im sure there must be a problem on my side that is causing me not
>>>> to be
>>>> >>> able to register at the JobManager. What could cause such connection
>>>> >>> problems?
>>>> >>>
>>>> >>> Any tips are very welcome :-)
>>>> >>>
>>>> >>> Cheers and have a good weekend!
>>>> >>>
>>>> >>> - Pieter
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

Hi Stephan,

surely it seems this way! I must not be the first with this issue though?
I'll have to contact the cluster admins to find a solution together. What
would be a way of making the JobManagers accessible from outside the network,
given that the IP and port number change every time?

Alternatively, I can ask for ssh access to a node within the network. That
will surely work but it's not my preferred solution.

- Pieter

2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:

> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>
> The ports to communicate with HDFS and the YARN resource manager may be
> whitelisted r forwarded, so you can submit the YARN session, but then not
> connect to the JobManager afterwards.
>
>
>
> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com> wrote:
>
>> Hi Max!
>>
>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>> all in the JobManager Web UI looks good.
>>
>> It seems like the JobManager initiates the connection with my VM and
>> cannot reach it. It could be that this is similar to the problem here:
>>
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>
>> I probably have to make some changes to the networking configuration of
>> my VM so it can be reached by the JobManager despite using a different port
>> each time.
>>
>> - Pieter
>>
>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:
>>
>>> Hi Pieter,
>>>
>>> Which version of Flink are you using? It appears you've created a
>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com>
>>> wrote:
>>> > Hi Robert,
>>> >
>>> > unfortunately there are no signs of what is going wrong in the logs.
>>> The
>>> > last log messages are about succesful registration of the TaskManagers.
>>> >
>>> > I'm also fairly sure it must be something in my VM that is causing
>>> this,
>>> > because when I start the yarn-session from a login node that is on the
>>> same
>>> > network as the hadoop cluster there are no problems registering with
>>> the
>>> > JobManager. I did also notice the following message in the local
>>> console:
>>> >
>>> > 12:30:27,173 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters. Reason:
>>> > connection timed out: /145.100.41.13:41539
>>> >
>>> > I can ping the JobManager fine from with VM. Could there be some
>>> invalid or
>>> > missing configuration on my side?
>>> >
>>> > Cheers,
>>> >
>>> > Pieter
>>> >
>>> >
>>>
>>
>>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Stephan Ewen <se...@apache.org>.

Yeah, sounds a lot like the client cannot connect to the JobManager port.

The ports to communicate with HDFS and the YARN resource manager may be
whitelisted or forwarded, so you can submit the YARN session, but then not
connect to the JobManager afterwards.
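
A quick way to check this from the client machine is a plain TCP connect to
the host and port shown in the log line. Ping only shows that the host answers
ICMP, not that the port is open. Below is a minimal sketch in Python; the
helper names parse_akka_address and can_connect are made up for illustration
and are not part of Flink.

```python
import re
import socket

# Matches addresses such as akka.tcp://flink@145.100.41.148:35666/user/jobmanager
AKKA_ADDRESS = re.compile(r"akka\.tcp://[^@\s]+@([^:/\s]+):(\d+)")


def parse_akka_address(log_line):
    """Return (host, port) from the first akka.tcp address in log_line, or None."""
    match = AKKA_ADDRESS.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Feed it the "Trying to register at JobManager akka.tcp://..." line from the
client console and call can_connect with the parsed host and port. If that
returns False on the VM but True on a login node inside the cluster network,
a firewall or NAT between the VM and the cluster is the likely culprit.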



On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <ph...@gmail.com> wrote:

> Hi Max!
>
> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine;
> everything in the JobManager web UI looks good.
>
> It seems like the JobManager initiates the connection with my VM and
> cannot reach it. It could be that this is similar to the problem here:
>
>
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>
> I probably have to make some changes to the networking configuration of my
> VM so it can be reached by the JobManager despite using a different port
> each time.
>
> - Pieter
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

Hi Max!

I'm using Flink 0.10.1 and indeed the cluster seems to be created fine;
everything in the JobManager web UI looks good.

It seems like the JobManager initiates the connection with my VM and cannot
reach it. It could be that this is similar to the problem here:

http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html

I probably have to make some changes to the networking configuration of my
VM so it can be reached by the JobManager despite using a different port
each time.

- Pieter

2016-02-06 14:05 GMT+01:00 Maximilian Michels <mx...@apache.org>:

> Hi Pieter,
>
> Which version of Flink are you using? It appears you've created a
> Flink YARN cluster but you can't reach the JobManager afterwards.
>
> Cheers,
> Max
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Maximilian Michels <mx...@apache.org>.

Hi Pieter,

Which version of Flink are you using? It appears you've created a
Flink YARN cluster but you can't reach the JobManager afterwards.

Cheers,
Max

On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <ph...@gmail.com> wrote:
> Hi Robert,
>
> unfortunately there are no signs of what is going wrong in the logs. The
> last log messages are about successful registration of the TaskManagers.
>
> I'm also fairly sure it must be something in my VM that is causing this,
> because when I start the yarn-session from a login node that is on the same
> network as the Hadoop cluster there are no problems registering with the
> JobManager. I did also notice the following message in the local console:
>
> 12:30:27,173 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason:
> connection timed out: /145.100.41.13:41539
>
> I can ping the JobManager fine from within the VM. Could there be some invalid or
> missing configuration on my side?
>
> Cheers,
>
> Pieter
>
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Pieter Hameete <ph...@gmail.com>.

Hi Robert,

unfortunately there are no signs of what is going wrong in the logs. The
last log messages are about successful registration of the TaskManagers.

I'm also fairly sure it must be something in my VM that is causing this,
because when I start the yarn-session from a login node that is on the same
network as the Hadoop cluster there are no problems registering with the
JobManager. I did also notice the following message in the local console:

12:30:27,173 WARN  Remoting
     - Tried to associate with unreachable remote address
[akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms, all
messages to this address will be delivered to dead letters. Reason:
connection timed out: /145.100.41.13:41539

I can ping the JobManager fine from within the VM. Could there be some invalid or
missing configuration on my side?

Cheers,

Pieter


2016-02-06 12:54 GMT+01:00 Robert Metzger <rm...@apache.org>:

> Hi,
>
> did you check the logs of the JobManager itself? Maybe it'll tell us
> already what's going on.
>

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Posted by Robert Metzger <rm...@apache.org>.

Hi,

did you check the logs of the JobManager itself? Maybe it'll tell us
already what's going on.

On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <ph...@gmail.com> wrote:

> Hi Guys!
>
> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
> the JobManager web UI is started:
>
> JobManager web interface address
> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
> Waiting until all TaskManagers have connected
> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>       - Notification about new leader address
> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
> No status updates from the YARN cluster received so far. Waiting ...
> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>       - Received address of new leader
> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>       - Disconnect from JobManager null.
> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>       - Trying to register at JobManager
> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
> No status updates from the YARN cluster received so far. Waiting ...
> No status updates from the YARN cluster received so far. Waiting ...
>
> It then hangs on these last steps (trying to register, no status updates..)
>
> I'm sure there must be a problem on my side that is causing me not to be
> able to register at the JobManager. What could cause such connection
> problems?
>
> Any tips are very welcome :-)
>
> Cheers and have a good weekend!
>
> - Pieter
>
>
>