Posted to user@flink.apache.org by "Marchant, Hayden " <ha...@citi.com> on 2017/10/16 09:53:15 UTC

start-cluster.sh not working in HA mode

I am attempting to run Flink 1.3.2 in HA mode with zookeeper.

When I run start-cluster.sh, the job manager is not started, even though the task manager is. When I looked into this, I saw that the command:

ssh -n $FLINK_SSH_OPTS $master -- "nohup /bin/bash -l \"${FLINK_BIN_DIR}/jobmanager.sh\" start cluster ${master} ${webuiport} &"

does not actually run anything on the host, i.e. I do not see "Starting jobmanager daemon on host ....."

Only when I remove ALL quotes do I see it working, i.e. if I run:

ssh -n $FLINK_SSH_OPTS $master -- nohup /bin/bash -l ${FLINK_BIN_DIR}/jobmanager.sh start cluster ${master} ${webuiport} &

I see that it manages to run the job manager - I see " Starting jobmanager daemon on host.....".
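
To see how the two forms above differ before they ever reach ssh, the following is a minimal sketch, not a diagnosis of the failure. It swaps ssh for a hypothetical helper, show_args, that only prints the arguments it receives. The values for FLINK_BIN_DIR, master and webuiport are placeholders assumed for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: show_args is a hypothetical stand-in for ssh. It prints the
# arguments it receives instead of contacting any host.
show_args() {
  n=0
  for arg in "$@"; do
    n=$((n + 1))
    printf '%d: [%s]\n' "$n" "$arg"
  done
}

# Placeholder values (assumptions, not taken from the thread).
FLINK_BIN_DIR="/opt/flink/bin"
master="localhost"
webuiport="8081"

# Quoted form: the whole remote command, including the trailing "&",
# is handed over as one single argument after "--".
quoted_args=$(show_args -- "nohup /bin/bash -l \"${FLINK_BIN_DIR}/jobmanager.sh\" start cluster ${master} ${webuiport} &")

# Unquoted form: the local shell splits the command into separate words.
# The trailing "&" is omitted here because the local shell would consume it
# (backgrounding the local command) and never pass it on as an argument.
unquoted_args=$(show_args -- nohup /bin/bash -l ${FLINK_BIN_DIR}/jobmanager.sh start cluster ${master} ${webuiport})

# Count the printed lines, one per argument.
quoted_count=$(printf '%s\n' "$quoted_args" | wc -l | tr -d ' ')
unquoted_count=$(printf '%s\n' "$unquoted_args" | wc -l | tr -d ' ')

echo "quoted form:   ${quoted_count} arguments"
printf '%s\n' "$quoted_args"
echo "unquoted form: ${unquoted_count} arguments"
printf '%s\n' "$unquoted_args"
```

One difference this makes visible: in the quoted form the "&" is part of the string sent to the remote side, so the remote shell backgrounds the command. In the unquoted form the "&" is interpreted locally and backgrounds the ssh process itself. Whether that difference explains the behaviour described here is not established by this sketch.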

Did anyone else experience a similar problem? Any elegant workarounds without having to change source code?

Thanks,
Hayden Marchant


Re: start-cluster.sh not working in HA mode

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Hayden,

I tried to reproduce the problem you described and followed the HA setup
instructions in the documentation [1].
For me the instructions worked and start-cluster.sh started two JobManagers
on my local machine (the masters file contained two localhost entries).

The bash scripts tend to be a bit fragile, especially when it comes to
handling spaces in variables and quotes.
What kind of environment are you running on (I'm on macOS), and are you
trying to start the JMs on localhost or on remote machines?
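
As a small illustration of the fragility mentioned above, the sketch below shows how an unquoted variable containing a space is split into several words. The install path and the count_words helper are assumptions made up for this example; nothing in the thread says the path in question contains a space.

```shell
#!/usr/bin/env bash
# count_words is a hypothetical helper that reports how many arguments it got.
count_words() {
  echo "$#"
}

# Assumed install path containing a space (illustration only).
FLINK_BIN_DIR="/opt/my flink/bin"

# Unquoted expansion: the shell splits the value at the space.
unquoted_words=$(count_words ${FLINK_BIN_DIR}/jobmanager.sh)

# Quoted expansion: the path stays a single word.
quoted_words=$(count_words "${FLINK_BIN_DIR}/jobmanager.sh")

echo "unquoted: ${unquoted_words} word(s)"
echo "quoted:   ${quoted_words} word(s)"
```

Running the same kind of check against the actual values of FLINK_BIN_DIR and FLINK_SSH_OPTS on the affected machine would show whether word splitting plays any role there.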

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#configuration
