Posted to user@flink.apache.org by Qihua Yang <ya...@gmail.com> on 2021/09/29 18:36:51 UTC

Start Flink cluster, k8s pod behavior

Hi,
I deployed Flink in session mode and have not run any jobs. I saw the logs
below, which look normal and match what the Flink manual shows.

+ /opt/flink/bin/run-job-manager.sh
Starting HA cluster with 1 masters.
Starting standalonesession daemon on host job-manager-776dcf6dd-xzs8g.
Starting taskexecutor daemon on host job-manager-776dcf6dd-xzs8g.

But when I check with kubectl, the pod status is Completed. After a while,
the status changes to CrashLoopBackOff and the pod restarts.
NAME                          READY   STATUS             RESTARTS   AGE
job-manager-776dcf6dd-xzs8g   0/1     Completed          5          5m27s

NAME                          READY   STATUS             RESTARTS   AGE
job-manager-776dcf6dd-xzs8g   0/1     CrashLoopBackOff   5          7m35s

Can anyone help me understand why?
Why does Kubernetes regard this pod as completed and restart it? Should I
configure something, either on the Flink side or the Kubernetes side?
According to the Flink manual, once the cluster is started I should be able
to upload a jar to run the application.

Thanks,
Qihua

Re: Start Flink cluster, k8s pod behavior

Posted by Yang Wang <da...@gmail.com>.
Did you use "jobmanager.sh start-foreground" in your own
"run-job-manager.sh", as Flink itself does in docker-entrypoint.sh [1]?
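
As a rough sketch of what a blocking wrapper could look like: the contents of the original run-job-manager.sh were not posted, so everything below is an assumption for illustration, including the FLINK_HOME default (taken from the path in the logs), the jobmanager_cmd helper, and the --run switch. The point is only that "start-foreground" keeps the JobManager attached to the calling process, so the script does not return while the JobManager is running.

```shell
#!/usr/bin/env bash
# Hypothetical run-job-manager.sh; the original script was not posted in the thread.
# FLINK_HOME defaults to the install path that appears in the thread's logs.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# Print the command this wrapper would run (helper name is illustrative).
# "start-foreground" keeps the JobManager attached to the calling process,
# so the wrapper does not return while the JobManager is running.
jobmanager_cmd() {
  printf '%s %s\n' "${FLINK_HOME}/bin/jobmanager.sh" "start-foreground"
}

# Only launch when explicitly asked, so the sketch can be sourced or dry-run safely.
if [ "${1:-}" = "--run" ]; then
  # exec replaces this shell, so no extra wrapper process lingers in between.
  exec "${FLINK_HOME}/bin/jobmanager.sh" start-foreground
fi
```

Calling jobmanager_cmd prints the command without starting anything; passing --run to the script starts the JobManager in the foreground.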

I strongly suggest starting the Flink session cluster with the official
YAML files [2].

[1] https://github.com/apache/flink-docker/blob/master/1.13/scala_2.11-java11-debian/docker-entrypoint.sh#L114
[2] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/kubernetes/#starting-a-kubernetes-cluster-session-mode

Best,
Yang

On Fri, Oct 1, 2021 at 2:59 AM Qihua Yang <ya...@gmail.com> wrote:

> It looks like the *flink-daemon.sh* script returns exit code 0 once it
> completes, so Kubernetes regards the container as done. Is that expected?
>
> Thanks,
> Qihua

Re: Start Flink cluster, k8s pod behavior

Posted by Qihua Yang <ya...@gmail.com>.
It looks like the *flink-daemon.sh* script returns exit code 0 once it
completes, so Kubernetes regards the container as done. Is that expected?
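
That is how a daemonizing launcher behaves in general. The sketch below does not use Flink at all: "sleep" stands in for the long-running JVM, and the function and variable names are made up for illustration. It shows a launcher returning status 0 while the process it started is still running, which is what the container runtime observes when the entrypoint only backgrounds the server.

```shell
#!/usr/bin/env bash
# Flink-free illustration: "sleep" stands in for the long-running JVM.

# Daemon-style launcher (hypothetical name): fork the child, record its PID, return.
start_daemon_style() {
  sleep 5 >/dev/null 2>&1 &
  DAEMON_PID=$!
  return 0
}

start_daemon_style
LAUNCHER_STATUS=$?

# The launcher has finished, but the child it forked may still be running.
if kill -0 "$DAEMON_PID" 2>/dev/null; then
  CHILD_ALIVE=yes
else
  CHILD_ALIVE=no
fi
echo "launcher status: ${LAUNCHER_STATUS}, child alive: ${CHILD_ALIVE}"

# Clean up the stand-in process.
kill "$DAEMON_PID" 2>/dev/null
wait "$DAEMON_PID" 2>/dev/null || true
```

In a container, once the entrypoint process itself exits, the container terminates with that exit status regardless of what was forked in the background.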

Thanks,
Qihua

On Thu, Sep 30, 2021 at 11:11 AM Qihua Yang <ya...@gmail.com> wrote:

> Thank you for your reply.
> According to the log, the exit code is 0 and the reason is Completed.
> The cluster looks fine, so why does Kubernetes restart the pod? As you
> said, from the perspective of Kubernetes everything is done. How can I
> prevent the restart?
> It does not even give me a chance to upload and run a jar.
>
>     Ports:         8081/TCP, 6123/TCP, 6124/TCP, 6125/TCP
>     Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
>     Command:
>       /opt/flink/bin/entrypoint.sh
>     Args:
>       /opt/flink/bin/run-job-manager.sh
>     State:          Waiting
>       Reason:       CrashLoopBackOff
>     Last State:     Terminated
>       Reason:       Completed
>       Exit Code:    0
>       Started:      Wed, 29 Sep 2021 20:12:30 -0700
>       Finished:     Wed, 29 Sep 2021 20:12:45 -0700
>     Ready:          False
>     Restart Count:  131
>
> Thanks,
> Qihua

Re: Start Flink cluster, k8s pod behavior

Posted by Qihua Yang <ya...@gmail.com>.
Thank you for your reply.
According to the log, the exit code is 0 and the reason is Completed.
The cluster looks fine, so why does Kubernetes restart the pod? As you
said, from the perspective of Kubernetes everything is done. How can I
prevent the restart?
It does not even give me a chance to upload and run a jar.

    Ports:         8081/TCP, 6123/TCP, 6124/TCP, 6125/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/flink/bin/entrypoint.sh
    Args:
      /opt/flink/bin/run-job-manager.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 29 Sep 2021 20:12:30 -0700
      Finished:     Wed, 29 Sep 2021 20:12:45 -0700
    Ready:          False
    Restart Count:  131

Thanks,
Qihua

On Thu, Sep 30, 2021 at 1:00 AM Chesnay Schepler <ch...@apache.org> wrote:

> Is the run-job-manager.sh script actually blocking?
> Since you (apparently) use that as an entrypoint, if that script exits
> after starting the JM then from the perspective of Kubernetes everything is
> done.

Re: Start Flink cluster, k8s pod behavior

Posted by Chesnay Schepler <ch...@apache.org>.
Is the run-job-manager.sh script actually blocking?
Since you (apparently) use that as an entrypoint, if that script exits
after starting the JM, then from the perspective of Kubernetes everything
is done.

On 30/09/2021 08:59, Matthias Pohl wrote:
> Hi Qihua,
> I guess, looking into kubectl describe and the JobManager logs would 
> help in understanding what's going on.
>
> Best,
> Matthias


Re: Start Flink cluster, k8s pod behavior

Posted by Qihua Yang <ya...@gmail.com>.
I checked kubectl describe; it shows the info below. The reason is Completed.

    Ports:         8081/TCP, 6123/TCP, 6124/TCP, 6125/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/flink/bin/entrypoint.sh
    Args:
      /opt/flink/bin/run-job-manager.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 29 Sep 2021 20:12:30 -0700
      Finished:     Wed, 29 Sep 2021 20:12:45 -0700
    Ready:          False
    Restart Count:  131
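
For reading this output: "Last State" describes the previous container instance, while "State" describes the current one. A small sketch that pulls the last termination reason and exit code out of a saved excerpt follows; the helper name and the embedded sample are illustrative, and the field layout is assumed to match the excerpt above. Reason Completed with exit code 0 means the main process ended normally rather than crashing. Assuming the pod is managed by a Deployment, as the generated pod name suggests, its restart policy is Always, so the container is restarted even after a clean exit, and CrashLoopBackOff is the back-off delay between those restarts.

```shell
#!/usr/bin/env bash
# Excerpt of the "kubectl describe" output from this thread, embedded as sample input.
describe_excerpt='    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
    Restart Count:  131'

# Print the value of a field inside the "Last State" block (helper name is
# illustrative). Reads describe output from stdin.
last_state_field() {
  awk -v key="$1" '
    /^ *Last State:/ { in_last = 1; next }
    in_last && /^ *Ready:|^ *Restart Count:/ { in_last = 0 }
    in_last {
      line = $0
      sub(/^ +/, "", line)
      if (index(line, key ":") == 1) {
        sub(/^[^:]*: */, "", line)
        print line
        exit
      }
    }
  '
}

LAST_REASON="$(printf '%s\n' "$describe_excerpt" | last_state_field "Reason")"
LAST_EXIT="$(printf '%s\n' "$describe_excerpt" | last_state_field "Exit Code")"
echo "last termination: reason=${LAST_REASON}, exit code=${LAST_EXIT}"
```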


On Wed, Sep 29, 2021 at 11:59 PM Matthias Pohl <ma...@ververica.com>
wrote:

> Hi Qihua,
> I guess, looking into kubectl describe and the JobManager logs would help
> in understanding what's going on.
>
> Best,
> Matthias

Re: Start Flink cluster, k8s pod behavior

Posted by Matthias Pohl <ma...@ververica.com>.
Hi Qihua,
I guess, looking into kubectl describe and the JobManager logs would help
in understanding what's going on.
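
A minimal sketch of collecting that information, assuming kubectl is configured for the right cluster and namespace. The diagnose_pod function and the KUBECTL override are illustrative names, not part of any tool.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Collect the usual diagnostics for a restarting pod (function name is illustrative).
diagnose_pod() {
  pod="$1"
  # Container state, last termination reason, exit code and recent events.
  "$KUBECTL" describe pod "$pod"
  # Logs of the previous container instance, i.e. the one that just terminated.
  "$KUBECTL" logs "$pod" --previous
  # Exit code of the last terminated instance of the first container.
  "$KUBECTL" get pod "$pod" \
    -o "jsonpath={.status.containerStatuses[0].lastState.terminated.exitCode}"
}

# Example (requires access to the cluster; pod name taken from the thread):
# diagnose_pod job-manager-776dcf6dd-xzs8g
```

The --previous flag matters for a pod that keeps restarting, because the current container instance may not have produced any output yet.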

Best,
Matthias
