Posted to user@flink.apache.org by Yesheng Ma <ki...@gmail.com> on 2018/03/04 14:52:55 UTC

bin/start-cluster.sh won't start jobmanager on master machine

Hi all,

When I execute bin/start-cluster.sh on the master machine, the command
`nohup /bin/bash -l bin/jobmanager.sh start cluster ...` is actually
executed, and it does not start the job manager properly.

I think there might be something wrong with the `-l` argument, since when I
use the `bin/jobmanager.sh start` command, everything is fine. Kindly point
out if I've done any configuration wrong. Thanks!

Best,
Yesheng

Re: bin/start-cluster.sh won't start jobmanager on master machine

Posted by Yesheng Ma <ki...@gmail.com>.
Oh, I have figured out the problem: it has to do with my ~/.profile.
I cannot remember when, but at some point I added a line to ~/.profile
that sources my .zshrc, so the login shell always ends up in zsh.
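
For anyone hitting the same thing, one possible fix is to make the ~/.profile line conditional, so that it only switches to zsh for interactive logins. The following is a minimal sketch, not the exact line from my ~/.profile; the function name `is_interactive` is made up for illustration, and /bin/zsh is the path from the trace quoted in this thread.

```shell
# Hypothetical guard for ~/.profile; the function name is illustrative.
# Succeeds only when the given shell flags (normally "$-") contain "i",
# i.e. when the shell is interactive.
is_interactive() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hand over to zsh only for interactive logins, so that a non-interactive
# `bash -l some-script.sh` still goes on to run the script.
if is_interactive "$-" && [ -x /bin/zsh ]; then
    exec /bin/zsh -l
fi
```

With a guard like this, `bash -l bin/jobmanager.sh ...` should skip the `exec` because the shell is not interactive, while a normal terminal login would still end up in zsh.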


Re: bin/start-cluster.sh won't start jobmanager on master machine

Posted by Yesheng Ma <ki...@gmail.com>.
Related source code:
https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/start-cluster.sh#L40


Re: bin/start-cluster.sh won't start jobmanager on master machine

Posted by Yesheng Ma <ki...@gmail.com>.
Hi Nico,

Thanks for your reply. My major concern is actually the `-l` argument.
The command I executed is: `nohup /bin/bash -x -l
"/state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh" start cluster
dell-01.epcc 8091`, with and without the `-l` argument (the script in
Flink's bin directory uses the `-l` argument).

1) with the `-l` argument: the log is quite messy, but there are some clues;
the last executed command starts a zsh shell:
```
+ . /home/ysma/.bashrc
++ case $- in
++ return
+ PATH=/home/ysma/bin:/home/ysma/.local/bin:/state/partition1/ysma/redis-4.0.8/../bin:/home/ysma/env/jdk1.8.0_151/bin:/home/ysma/env/maven/bin:/home/ysma/bin:/home/ysma/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
+ '[' -f /bin/zsh ']'
+ exec /bin/zsh -l
```
I guess the `bash -l` argument detects the user's login shell and then logs
in to a zsh shell (which I'm currently using) and never returns.

2) without the `-l` argument, everything just goes fine.

Therefore I suspect there might be something wrong with the `-l` argument,
or something wrong with my bash config?  Any ideas? Thanks!
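
For reference, `-l` makes bash behave as a login shell, which means it reads the login startup files (such as ~/.profile) before running the script. The sketch below illustrates that difference in isolation; it uses a throwaway HOME directory, and the variable name `PROFILE_WAS_READ` is invented for the demonstration.

```shell
# Throwaway HOME so the real ~/.profile is not touched.
demo_home=$(mktemp -d)
echo 'PROFILE_WAS_READ=yes' > "$demo_home/.profile"

# Login shell: bash reads the profile before running the command.
# tail keeps only the last line in case /etc/profile prints anything.
with_login=$(HOME="$demo_home" bash -l -c 'echo "${PROFILE_WAS_READ:-no}"' 2>/dev/null | tail -n 1)

# Plain non-interactive shell: the profile is skipped.
without_login=$(HOME="$demo_home" bash -c 'echo "${PROFILE_WAS_READ:-no}"')

echo "with -l: $with_login, without -l: $without_login"
rm -rf "$demo_home"
```

Anything the profile does, including an `exec` into another shell, therefore happens before the script given to `bash -l` gets a chance to run.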



Re: bin/start-cluster.sh won't start jobmanager on master machine

Posted by Nico Kruber <ni...@data-artisans.com>.
Hi Yesheng,
`nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit
strange since it should (imho) be an absolute path towards flink.

To diagnose further, you could try running the ssh command manually,
i.e. figure out what is being executed by calling
bash -x ./bin/start-cluster.sh
and then run the ssh command without "-n" and not in background "&".
Then you should also see the JobManager stdout to diagnose further.
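
As a rough sketch of the tracing step: `bash -x` prints each command to stderr with a leading "+", so the trace can be captured and searched for the launch command. The script below is a hypothetical stand-in for start-cluster.sh so that the example runs anywhere; with a real installation you would trace ./bin/start-cluster.sh instead.

```shell
# Stand-in for start-cluster.sh (hypothetical content, for illustration only).
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
cmd="nohup /bin/bash -l bin/jobmanager.sh start cluster"
echo "would run: $cmd"
EOF

# Capture only the trace: stderr goes to the variable, stdout is discarded.
trace=$(bash -x "$demo_script" 2>&1 >/dev/null)

# Keep the trace lines that mention the launch command.
launch_lines=$(printf '%s\n' "$trace" | grep 'nohup')
printf '%s\n' "$launch_lines"
rm -f "$demo_script"
```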

If that does not help yet, please log into the master manually and
execute the "nohup /bin/bash..." command there to see what is going on.

Depending on where the failure was, there may even be logs on the master
machine.


Nico
