Posted to user@flink.apache.org by Vinay Patil <vi...@gmail.com> on 2018/10/04 18:30:36 UTC

Unable to start session cluster using Docker

Hi,

I have used the docker-compose file for creating the cluster as shown in
the documentation. The web UI starts successfully; however, the task
managers are unable to join.

Job Manager container logs:

2018-10-04 18:13:13,907 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest
endpoint listening at cluster:8081

2018-10-04 18:13:13,907 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    -
http://cluster:8081 was granted leadership with
leaderSessionID=00000000-0000-0000-0000-000000000000

2018-10-04 18:13:13,907 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web
frontend listening at http://cluster:8081

2018-10-04 18:13:14,012 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
ResourceManager akka.tcp://flink@cluster:6123/user/resourcemanager was
granted leadership with fencing token 00000000000000000000000000000000

2018-10-04 18:13:14,013 INFO
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
Starting the SlotManager.

2018-10-04 18:13:14,026 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher
akka.tcp://flink@cluster:6123/user/dispatcher was granted leadership with
fencing token 00000000-0000-0000-0000-000000000000

I am not sure why it says the web frontend is listening at cluster:8081 when
the job manager RPC address is set to jobmanager.

Task Manager Container Logs:

2018-10-04 18:19:18,818 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
to ResourceManager
akka.tcp://flink@jobmanager:6123/user/resourcemanager(00000000000000000000000000000000).

2018-10-04 18:19:18,818 INFO
org.apache.flink.runtime.filecache.FileCache                  - User file
cache uses directory
/tmp/flink-dist-cache-1bd95c51-3031-42ab-b782-14a0023921e5

2018-10-04 18:19:28,850 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
resolve ResourceManager address
akka.tcp://flink@jobmanager:6123/user/resourcemanager,
retrying in 10000 ms: Ask timed out on
[ActorSelection[Anchor(akka.tcp://flink@jobmanager:6123/),
Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
of type "akka.actor.Identify".


I have even tried setting JOB_MANAGER_RPC_ADDRESS=cluster in the
docker-compose file, but it does not work.
Both "cluster" and "jobmanager" point to localhost in the /etc/hosts file.

Can you please let me know what the issue is here?

Regards,
Vinay Patil

Re: Unable to start session cluster using Docker

Posted by Vinay Patil <vi...@gmail.com>.
Thank you Till, I am able to start the session-cluster now.

Regards,
Vinay Patil


On Fri, Oct 5, 2018 at 8:15 PM Till Rohrmann <tr...@apache.org> wrote:

> Hi Vinay,
>
> Are you referring to flink-contrib/docker-flink/docker-compose.yml? We
> recently fixed the command line parsing in Flink 1.5.4 and 1.6.1. Due to
> this, the removal of the second command line parameter, which was intended
> to be introduced with 1.5.0 and 1.6.0 (see
> https://issues.apache.org/jira/browse/FLINK-8696), became visible. The
> docker-compose.yml file has not yet been updated. I will do this right away
> and push the changes to the 1.5, 1.6 and master branches. Sorry for the
> inconvenience. As a local fix, please go to
> flink-contrib/docker-flink/docker-entrypoint.sh:33 and remove the cluster
> parameter from this line.
>
> Cheers,
> Till
>

Re: Unable to start session cluster using Docker

Posted by Till Rohrmann <tr...@apache.org>.
Hi Vinay,

Are you referring to flink-contrib/docker-flink/docker-compose.yml? We
recently fixed the command line parsing in Flink 1.5.4 and 1.6.1. Due to
this, the removal of the second command line parameter, which was intended
to be introduced with 1.5.0 and 1.6.0 (see
https://issues.apache.org/jira/browse/FLINK-8696), became visible. The
docker-compose.yml file has not yet been updated. I will do this right away
and push the changes to the 1.5, 1.6 and master branches. Sorry for the
inconvenience. As a local fix, please go to
flink-contrib/docker-flink/docker-entrypoint.sh:33 and remove the cluster
parameter from this line.

Cheers,
Till
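
Editorial note: the symptom in the logs above can be checked mechanically. The
JobManager announces itself under akka.tcp://flink@cluster, while the
TaskManager tries to reach akka.tcp://flink@jobmanager, which is consistent
with the stray "cluster" parameter being taken as the host name (this reading
is an assumption based on the logs and the reply above, not something the
thread states outright). The following is a minimal sketch in Python; the
function names akka_hosts and unreachable_targets are hypothetical helpers, and
the two sample strings are shortened from the log lines quoted in this thread.

```python
import re

# Matches the host part of addresses such as akka.tcp://flink@cluster:6123/...
AKKA_HOST = re.compile(r"akka\.tcp://flink@([^:/\s]+)")


def akka_hosts(log_text):
    """Return the set of host names found in akka.tcp://flink@HOST addresses."""
    return set(AKKA_HOST.findall(log_text))


def unreachable_targets(jobmanager_log, taskmanager_log):
    """Hosts the TaskManager addresses that the JobManager never announced."""
    return akka_hosts(taskmanager_log) - akka_hosts(jobmanager_log)


# Shortened from the JobManager and TaskManager logs in this thread.
jm_log = (
    "ResourceManager akka.tcp://flink@cluster:6123/user/resourcemanager "
    "was granted leadership"
)
tm_log = (
    "Could not resolve ResourceManager address "
    "akka.tcp://flink@jobmanager:6123/user/resourcemanager"
)

print(unreachable_targets(jm_log, tm_log))
```

A non-empty result means the TaskManager is addressing a host name under which
the JobManager did not register. After removing the cluster parameter as
described above, both logs would be expected to show the same host.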
