Posted to user@flink.apache.org by Malte Schwarzer <im...@mieo.de> on 2017/01/12 15:13:47 UTC

Flink on YARN: Cannot connect to JobManager

Hi all,

I am trying to run a Flink job on YARN via "$/bin/flink run -m yarn-cluster
-yn 2 ..." with two nodes, but only one JobManager seems to be connected.

Flink hangs at this stage (the lookup message repeats every second):

2017-01-11 15:12:13,653 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:13,678 INFO org.apache.flink.yarn.YarnClusterClient - TaskManager status (1/2)
TaskManager status (1/2)
2017-01-11 15:12:13,929 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:14,197 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:14,451 DEBUG org.apache.hadoop.ipc.Client - IPC Client (20529812) connection to ____/10.68.17.206:8032 from user sending #104
2017-01-11 15:12:14,452 DEBUG org.apache.hadoop.ipc.Client - IPC Client (20529812) connection to ___:8032 from user got value #104
2017-01-11 15:12:14,452 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getApplicationReport took 1ms
2017-01-11 15:12:14,462 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:14,745 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:15,014 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:15,276 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:15,322 DEBUG org.apache.hadoop.ipc.Client - IPC Client (20529812) connection to ___:8020 from user: closed
...

Any suggestions as to what could cause this?

Standard MapReduce jobs work without any problem on YARN.

Best regards,
Malte

Re: Flink on YARN: Cannot connect to JobManager

Posted by Till Rohrmann <tr...@apache.org>.
Hi Malte,

Could it be that you're requesting more resources from your YARN
cluster than are currently available? It depends a little on your
other settings, but -yn 2 means that you request 2 TaskManagers.
In addition, Flink allocates another container for the JobManager.
By default, the TaskManager containers and the JobManager container are
each started with 1 GB of memory. Thus, the job needs at least 3 containers
with 3 GB of memory in total. Could you check whether these resources are
available in your YARN cluster?
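
As a rough sketch of the arithmetic above: the function name is made up for
illustration, and the 1024 MB defaults are just the 1 GB figure from this
mail, so adjust them if you set the container sizes yourself. YARN may also
round each request up to its configured minimum allocation, which this
sketch ignores.

```python
def required_yarn_resources(task_managers, tm_memory_mb=1024, jm_memory_mb=1024):
    """Return (containers, total_memory_mb) needed by one Flink job on YARN.

    Hypothetical helper: the defaults mirror the 1 GB per container
    mentioned above and are assumptions, not values read from Flink.
    """
    # One container per TaskManager plus one extra for the JobManager.
    containers = task_managers + 1
    total_memory_mb = task_managers * tm_memory_mb + jm_memory_mb
    return containers, total_memory_mb


# -yn 2 requests two TaskManagers.
containers, total_mb = required_yarn_resources(task_managers=2)
print(f"{containers} containers, {total_mb} MB")  # 3 containers, 3072 MB
```

If the cluster has fewer free containers or less free memory than this, the
second TaskManager will simply never come up.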

If they are available, then this indicates faulty behaviour. In that case it
would be great if you could share the aggregated YARN logs for the Flink
application with us (available after terminating the YARN application).
This would help us debug the problem further.
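
One way to collect those logs is the "yarn logs" command. A minimal sketch
that wraps it, assuming the yarn CLI is on the PATH and log aggregation is
enabled on the cluster; the function names are only illustrative and the
application ID is a placeholder for the one YARN assigned to your job.

```python
import subprocess


def yarn_logs_command(application_id):
    """Build the argument list for fetching a finished application's logs."""
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id):
    """Run the command and return the aggregated logs as one string.

    Assumes the `yarn` CLI is installed; raises CalledProcessError if the
    command exits with a non-zero status.
    """
    result = subprocess.run(
        yarn_logs_command(application_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Placeholder ID; substitute the real one reported for your application.
print(" ".join(yarn_logs_command("application_0000000000000_0000")))
```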

Cheers,
Till
