Posted to user@flink.apache.org by M Singh <ma...@yahoo.com> on 2020/02/22 18:28:20 UTC

Flink on Kubernetes - Session vs Job cluster mode and storage

Hey Folks:
I am trying to figure out the options for running Flink on Kubernetes, and want to understand the pros and cons of running in Flink Session vs Flink Job Cluster mode (https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes).
I understand that in job mode there is no need to submit the job, since it is part of the job image.  But what are the other pros and cons of this approach vs session mode, where a job manager is deployed and Flink jobs can be submitted to it?  Are there any benefits with regards to:
1. Configuring the jobs
2. Scaling the taskmanagers
3. Restarting jobs
4. Managing the Flink jobs
5. Passing credentials (in case of AWS, etc.)
6. Fault tolerance and recovery of jobs from failure
Also, we will be keeping the checkpoints for the jobs on S3.  Is there any need to specify volumes for the pods?  If a volume is required, do we need provisioned volumes, and what are the recommended alternatives/considerations, especially with AWS?
If there are any other considerations, please let me know.
Thanks for your advice.




Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Hao Sun <ha...@zendesk.com>.
Sounds good. Thank you!

Hao Sun



Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Yang Wang <da...@gmail.com>.
Hi Hao Sun,

I am posting the explanation to the user ML so that others who have the
same problem can also see it.

> Given the job graph is fetched from the jar, do we still need Zookeeper
> for HA? Maybe we still need it for checkpoint locations?


Yes, we still need ZooKeeper (maybe in the future we will have a native
K8s HA service based on etcd) for complete recovery. You are right that
we still need it for finding the checkpoint locations. ZooKeeper is also
used for leader election and leader retrieval.


Best,
Yang
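
For reference, the ZooKeeper-based HA setup described above is usually configured in flink-conf.yaml along these lines. This is a minimal sketch; the ZooKeeper quorum address and S3 bucket paths are placeholders, not values from this thread:

```yaml
# flink-conf.yaml -- ZooKeeper HA with S3 for recovery metadata (placeholder values)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
# Where JobManager metadata (including pointers to checkpoint locations) is persisted:
high-availability.storageDir: s3://my-bucket/flink/ha
# The checkpoints themselves also go to S3:
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

With this in place, a relaunched JobManager can look up the latest completed checkpoint via ZooKeeper and resume from it.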


Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Yang Wang <da...@gmail.com>.
Hi Jin Yi,

For a standalone per-job cluster, recovery works a little differently.
Just as you say, the user jar is built into the image, so when the JobManager
fails and is relaunched by K8s, the user's `main()` is executed again to
regenerate the job graph, unlike a session cluster, which fetches the job
graph from high-availability storage.
The job is then submitted again and can recover from the latest
checkpoint (assuming you have configured high availability).


Best,
Yang
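
As a rough illustration of the mechanism Yang describes, in standalone per-job mode the JobManager container uses the `standalone-job` entrypoint, which re-executes the user's `main()` on every pod relaunch. A hedged sketch of such a Deployment; the image name, job class, and labels are placeholders:

```yaml
# jobmanager-deployment.yaml -- per-job mode: the entrypoint re-runs main() on relaunch
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels: {app: flink, component: jobmanager}
  template:
    metadata:
      labels: {app: flink, component: jobmanager}
    spec:
      containers:
        - name: jobmanager
          image: my-registry/my-flink-job:latest   # user jar baked into this image
          args: ["standalone-job", "--job-classname", "com.example.MyJob"]
```

Because the job graph is rebuilt from the jar on restart, HA storage is needed only for checkpoint pointers and leader election, not for the job graph itself.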

Jin Yi <el...@gmail.com> wrote on Thursday, February 27, 2020 at 2:50 PM:

> Hi Yang,
>
> regarding your statement below:
>
> Since you are starting the JM/TM with K8s Deployments, new JM/TM pods
> will be created when they fail. If you do not set the high-availability
> configuration, your jobs can recover when a TM fails, but they cannot
> recover when the JM fails. With HA configured, the jobs can always be
> recovered and you do not need to re-submit them.
>
> Does it also apply to a Flink Job Cluster? When the JM pod is restarted by
> Kubernetes, the image also contains the application jar, so if the
> statement also applies to Flink Job Cluster mode, can you please
> elaborate why?
>
> Thanks a lot!
> Eleanore
>
> On Mon, Feb 24, 2020 at 6:36 PM Yang Wang <da...@gmail.com> wrote:
>
>> Hi M Singh,
>>
>> > Mans - If we use the session-based deployment option for K8s, I
>>> thought K8s would automatically restart any failed TM or JM.
>>> In the case of a failed TM the job will probably recover, but in the
>>> case of a failed JM perhaps we need to resubmit all jobs.
>>> Let me know if I have misunderstood anything.
>>
>>
>> Since you are starting the JM/TM with K8s Deployments, new JM/TM pods
>> will be created when they fail. If you do not set the high-availability
>> configuration, your jobs can recover when a TM fails, but they cannot
>> recover when the JM fails. With HA configured, the jobs can always be
>> recovered and you do not need to re-submit them.
>>
>> > Mans - Is there any safe way of passing creds?
>>
>>
>> Yes, you are right: using a ConfigMap to pass the credentials is not
>> safe. On K8s, I think you could use Secrets instead[1].
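
As one possible shape for this, credentials can be stored in a Kubernetes Secret and injected into the JobManager/TaskManager pods as environment variables. A sketch with placeholder names; the Secret name and key names are illustrative, not from this thread:

```yaml
# aws-credentials-secret.yaml -- store creds in a Secret rather than a ConfigMap
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "<access-key>"
  AWS_SECRET_ACCESS_KEY: "<secret-key>"
---
# In the jobmanager/taskmanager pod spec, reference the Secret:
#   containers:
#     - name: taskmanager
#       envFrom:
#         - secretRef:
#             name: aws-credentials
```

Unlike a ConfigMap, Secret values are not shown in plain text by default and can be access-controlled separately via RBAC.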
>>
>> > Mans - Does a taskmanager failure cause the job to fail?  My
>>> understanding is that JM failures are catastrophic while TM failures are
>>> recoverable.
>>
>>
>> What I mean is that the job fails, and it can then be restarted by your
>> configured restart strategy[2].
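
The restart strategy mentioned here is set in flink-conf.yaml. A minimal sketch with illustrative values (the attempt count and delay are placeholders):

```yaml
# flink-conf.yaml -- example restart strategy: retry the job up to 3 times,
# waiting 10 seconds between attempts, before declaring it failed
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

So a TM crash fails the affected job, but the configured strategy restarts it automatically from the latest checkpoint.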
>>
>> > Mans - So if we are saving checkpoint in S3 then there is no need for
>>> disks - should we use emptyDir ?
>>
>>
>> Yes, if you are saving the checkpoints in S3 and also set
>> `high-availability.storageDir` to S3, then you do not need a persistent
>> volume. Since the local directory is only used as a local cache, you can
>> directly use the overlay filesystem or emptyDir (better I/O performance).
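
A TaskManager pod using emptyDir for its local working directory might look like the fragment below. This is a hedged sketch; the mount path and image tag are placeholders:

```yaml
# taskmanager pod spec fragment: local cache on emptyDir, no persistent volume needed
# because checkpoints and HA metadata live in S3 / ZooKeeper
spec:
  containers:
    - name: taskmanager
      image: flink:1.10
      volumeMounts:
        - name: flink-tmp
          mountPath: /tmp/flink-local
  volumes:
    - name: flink-tmp
      emptyDir: {}
```

emptyDir data is lost when the pod is deleted, which is fine here since the directory only holds transient state.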
>>
>>
>> [1].
>> https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
>> [2].
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance
>>
>> M Singh <ma...@yahoo.com> wrote on Tuesday, February 25, 2020 at 5:53 AM:
>>
>>> Thanks Wang for your detailed answers.
>>>
>>> From what I understand, native_kubernetes also leans towards creating
>>> a session and submitting a job to it.
>>>
>>> Regarding the other recommendations, please see my inline comments below.
>>>
>>> On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang <
>>> danrtsey.wy@gmail.com> wrote:
>>>
>>>
>>> Hi Singh,
>>>
>>> Glad to hear that you are looking to run Flink on Kubernetes. I will try
>>> to answer your questions based on my limited knowledge; others can
>>> correct me and add more details.
>>>
>>> I think the biggest difference between a session cluster and a per-job
>>> cluster on Kubernetes is isolation. For per-job, a dedicated Flink
>>> cluster is started for a single job and no other jobs can be submitted.
>>> Once the job finishes, the Flink cluster is destroyed immediately.
>>> The second point is one-step submission: you do not need to start a Flink
>>> cluster first and then submit a job to the existing session.
>>>
>>> > Are there any benefits with regards to
>>> 1. Configuring the jobs
>>> Whether you are using a per-job cluster or submitting to an existing
>>> session cluster, they share the same configuration mechanism. You do not
>>> have to change any code or configuration.
>>>
>>> 2. Scaling the taskmanager
>>> Since you are using a standalone cluster on Kubernetes, it does not
>>> provide an active ResourceManager. You need to use external tools to
>>> monitor and scale the taskmanagers. The active integration is still
>>> evolving, and you could give it a try[1].
>>>
>>> Mans - If we use the session-based deployment option for K8s, I thought
>>> K8s would automatically restart any failed TM or JM.
>>> In the case of a failed TM the job will probably recover, but in the
>>> case of a failed JM perhaps we need to resubmit all jobs.
>>> Let me know if I have misunderstood anything.
>>>
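
In a standalone setup the TaskManagers are typically a Deployment, so the external tooling mentioned above ends up adjusting its replica count. A sketch with placeholder names:

```yaml
# taskmanager-deployment.yaml fragment -- scaling means changing replicas,
# e.g. via `kubectl scale deployment flink-taskmanager --replicas=4`
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 2   # external monitoring/autoscaling tooling adjusts this value
```

Kubernetes restarts failed TM pods automatically either way; what the standalone mode lacks is Flink itself requesting or releasing TMs on demand.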
>>> 3. Restarting jobs
>>> For a session cluster, you can directly cancel the job and re-submit it.
>>> For a per-job cluster, when the job is canceled, you need to start a new
>>> per-job cluster from the latest savepoint.
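
For the cancel-with-savepoint/redeploy cycle described here, a default savepoint location is usually set in flink-conf.yaml. A sketch; the bucket path is a placeholder:

```yaml
# flink-conf.yaml -- default savepoint target directory, used when triggering
# `flink cancel -s` and resumed via `flink run -s <savepoint-path> ...`
# when starting the new per-job cluster
state.savepoints.dir: s3://my-bucket/flink/savepoints
```

Savepoints differ from checkpoints in that they are manually triggered and survive the cluster teardown, which is what makes the per-job redeploy workflow possible.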
>>>
>>> 4. Managing the flink jobs
>>> The REST API and the Flink command line can be used to manage the jobs
>>> (e.g. `flink cancel`, etc.). I think there is no difference between
>>> session and per-job here.
>>>
>>> 5. Passing credentials (in case of AWS, etc)
>>> I am not sure how you provide your credentials. If you put them in a
>>> ConfigMap and then mount it into the jobmanager/taskmanager pod, both
>>> session and per-job support that approach.
>>>
>>> Mans - Is there any safe way of passing creds?
>>>
>>> 6. Fault tolerence and recovery of jobs from failure
>>> For a session cluster, if one taskmanager crashes, then all the jobs
>>> that have tasks on that taskmanager will fail.
>>> Both session and per-job clusters can be configured with high
>>> availability and recover from the latest checkpoint.
>>>
>>> Mans - Does a taskmanager failure cause the job to fail?  My
>>> understanding is that JM failures are catastrophic while TM failures are
>>> recoverable.
>>>
>>> > Is there any need for specifying volume for the pods?
>>> No, you do not need to specify a volume for the pods. All the data in
>>> the pod's local directory is temporary. When a pod crashes and is
>>> relaunched, the taskmanager will retrieve the checkpoint location from
>>> ZooKeeper + S3 and resume from the latest checkpoint.
>>>
>>> Mans - So if we are saving checkpoints in S3 then there is no need for
>>> disks - should we use emptyDir?
>>>
>>>
>>> [1].
>>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
>>>

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Jin Yi <el...@gmail.com>.
Hi Yang,

regarding your statement below:

Since you are starting the JM/TM with K8s Deployments, new JM/TM pods
will be created when they fail. If you do not set the high-availability
configuration, your jobs can recover when a TM fails, but they cannot
recover when the JM fails. With HA configured, the jobs can always be
recovered and you do not need to re-submit them.

Does it also apply to a Flink Job Cluster? When the JM pod is restarted by
Kubernetes, the image also contains the application jar, so if the
statement also applies to Flink Job Cluster mode, can you please
elaborate why?

Thanks a lot!
Eleanore

On Mon, Feb 24, 2020 at 6:36 PM Yang Wang <da...@gmail.com> wrote:

> Hi M Singh,
>
> > Mans - If we use the session based deployment option for K8 - I thought
>> K8 will automatically restarts any failed TM or JM.
>> In the case of failed TM - the job will probably recover, but in the case
>> of failed JM - perhaps we need to resubmit all jobs.
>> Let me know if I have misunderstood anything.
>
>
> Since you are starting JM/TM with K8s deployment, when they failed new
> JM/TM will be created. If you do not set the high
> availability configuration, your jobs could recover when TM failed.
> However, they could not recover when JM failed. With HA
> configured, the jobs could always be recovered and you do not need to
> re-submit again.
>
> > Mans - Is there any safe way of a passing creds ?
>
>
> Yes, you are right, Using configmap to pass the credentials is not safe.
> On K8s, i think you could use secrets instead[1].
>
> > Mans - Does a task manager failure cause the job to fail ?  My
>> understanding is the JM failure are catastrophic while TM failures are
>> recoverable.
>
>
> What i mean is the job failed, and it could be restarted by your
> configured restart strategy[2].
>
> > Mans - So if we are saving checkpoint in S3 then there is no need for
>> disks - should we use emptyDir ?
>
>
> Yes, if you are saving the checkpoint in S3 and also set the
> `high-availability.storageDir` to S3. Then you do not need persistent
> volume. Since
> the local directory is only used for local cache, so you could directly
> use the overlay filesystem or empryDir(better io performance).
>
>
> [1].
> https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance
>
> M Singh <ma...@yahoo.com> wrote on Tue, Feb 25, 2020 at 5:53 AM:
>
>> Thanks Wang for your detailed answers.
>>
>> From what I understand, the native Kubernetes integration also leans towards
>> creating a session and submitting a job to it.
>>
>> Regarding the other recommendations, please see my inline comments and advice.
>>
>> On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang <
>> danrtsey.wy@gmail.com> wrote:
>>
>>
>> Hi Singh,
>>
>> Glad to hear that you are looking to run Flink on Kubernetes. I am
>> trying to answer your questions based on my limited knowledge; others
>> could correct me and add supplements.
>>
>> I think the biggest difference between a session cluster and a per-job
>> cluster on Kubernetes is isolation. For per-job, a dedicated Flink
>> cluster is started for a single job and no other jobs can be submitted
>> to it. Once the job finishes, the Flink cluster is destroyed
>> immediately. The second point is one-step submission: you do not need
>> to start a Flink cluster first and then submit a job to an existing
>> session.
>>
>> > Are there any benefits with regards to
>> 1. Configuring the jobs
>> Whether you are using a per-job cluster or submitting to an existing
>> session cluster, they share the same configuration mechanism. You do
>> not have to change any code or configuration.
>>
>> 2. Scaling the taskmanager
>> Since you are using a standalone cluster on Kubernetes, it does not
>> provide an active resourcemanager. You need to use external tools to
>> monitor and scale the taskmanagers. The active integration is still
>> evolving and you could give it a try[1].
>>
>> Mans - If we use the session-based deployment option for K8s - I thought
>> K8s will automatically restart any failed TM or JM.
>> In the case of a failed TM the job will probably recover, but in the case
>> of a failed JM perhaps we need to resubmit all jobs.
>> Let me know if I have misunderstood anything.
>>
>> 3. Restarting jobs
>> For a session cluster, you can directly cancel the job and re-submit it.
>> For a per-job cluster, when the job is canceled, you need to start a new
>> per-job cluster from the latest savepoint.
>>
>> 4. Managing the flink jobs
>> The REST API and the flink command line can be used to manage the jobs
>> (e.g. flink cancel, etc.). I think there is no difference between
>> session and per-job here.
>>
>> 5. Passing credentials (in case of AWS, etc)
>> I am not sure how you provide your credentials. If you put them in a
>> config map and then mount it into the jobmanager/taskmanager pod, then
>> both session and per-job could support this way.
>>
>> Mans - Is there any safe way of passing creds ?
>>
>> 6. Fault tolerance and recovery of jobs from failure
>> For a session cluster, if one taskmanager crashes, then all the jobs
>> which have tasks on that taskmanager will fail.
>> Both session and per-job could be configured with high availability and
>> recover from the latest checkpoint.
>>
>> Mans - Does a task manager failure cause the job to fail ?  My
>> understanding is that JM failures are catastrophic while TM failures are
>> recoverable.
>>
>> > Is there any need for specifying volume for the pods?
>> No, you do not need to specify a volume for the pods. All the data in the
>> pod's local directory is temporary. When a pod crashes and is relaunched,
>> the taskmanager will retrieve the checkpoint from ZooKeeper + S3 and
>> resume from the latest checkpoint.
>>
>> Mans - So if we are saving checkpoint in S3 then there is no need for
>> disks - should we use emptyDir ?
>>
>>
>> [1].
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
>>
>> M Singh <ma...@yahoo.com> wrote on Sun, Feb 23, 2020 at 2:28 AM:
>>
>> Hey Folks:
>>
>> I am trying to figure out the options for running Flink on Kubernetes and
>> am trying to find out the pros and cons of running in Flink Session vs
>> Flink Cluster mode (
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes
>> ).
>>
>> I understand that in job mode there is no need to submit the job since it
>> is part of the job image.  But what are the other pros and cons of this
>> approach vs session mode, where a job manager is deployed and flink jobs
>> can be submitted to it?  Are there any benefits with regards to:
>>
>> 1. Configuring the jobs
>> 2. Scaling the taskmanager
>> 3. Restarting jobs
>> 4. Managing the flink jobs
>> 5. Passing credentials (in case of AWS, etc)
>> 6. Fault tolerance and recovery of jobs from failure
>>
>> Also, we will be keeping the checkpoints for the jobs on S3.  Is there
>> any need for specifying a volume for the pods?  If a volume is required,
>> do we need a provisioned volume, and what are the recommended
>> alternatives/considerations, especially with AWS?
>>
>> If there are any other considerations, please let me know.
>>
>> Thanks for your advice.
>>
>>
>>
>>
>>
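Pulling the advice in this thread together, a minimal flink-conf.yaml sketch for a standalone cluster on K8s with ZooKeeper HA and S3 checkpoints could look like the following. The ZooKeeper address, bucket names, and restart values are illustrative placeholders, not taken from the thread:

```yaml
# Sketch only: addresses and bucket names are placeholders.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-client.default.svc.cluster.local:2181
high-availability.storageDir: s3://my-flink-bucket/ha/       # JM metadata needed for recovery
state.backend: rocksdb
state.checkpoints.dir: s3://my-flink-bucket/checkpoints/     # checkpoints go to S3, not pod disks
state.savepoints.dir: s3://my-flink-bucket/savepoints/
restart-strategy: fixed-delay                                # see the fault-tolerance config docs
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

With something like this in place, a relaunched JM pod can find the job metadata in ZooKeeper/S3, which is why no persistent volume is needed for the pods.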

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Yang Wang <da...@gmail.com>.
I think the only limitation is the disk size of your kubelet machine.
Please remember to set the "sizeLimit" of your emptyDir. Otherwise, your
pod may be killed because its ephemeral storage is full.


Best,
Yang
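As a sketch of the sizeLimit advice above, the pod spec fragment might look like this. The volume name, mount path, and the 10Gi figure are illustrative:

```yaml
# Pod spec fragment (sketch). The sizeLimit caps the volume's usage;
# exceeding it gets the pod evicted instead of silently filling the
# node's ephemeral storage.
volumes:
  - name: flink-tmp
    emptyDir:
      sizeLimit: 10Gi
containers:
  - name: taskmanager
    volumeMounts:
      - name: flink-tmp
        mountPath: /tmp        # point Flink's io.tmp.dirs at this mount
```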

M Singh <ma...@yahoo.com> wrote on Thu, Feb 27, 2020 at 8:34 AM:

> BTW - Is there any limit to the amount of data that can be stored on
> emptyDir in K8 ?
>
> On Wednesday, February 26, 2020, 07:33:54 PM EST, M Singh <
> mans2singh@yahoo.com> wrote:
>
>
> Thanks Yang and Arvid for your advice and pointers.  Mans
>
> On Wednesday, February 26, 2020, 09:52:26 AM EST, Arvid Heise <
> arvid@ververica.com> wrote:
>
>
> Creds on AWS are typically resolved through roles assigned to K8s pods
> (for example with KIAM [1]).
>
> [1] https://github.com/uswitch/kiam
>

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by M Singh <ma...@yahoo.com>.
BTW - Is there any limit to the amount of data that can be stored on emptyDir in K8s?

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by M Singh <ma...@yahoo.com>.
Thanks Yang and Arvid for your advice and pointers. Mans

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Arvid Heise <ar...@ververica.com>.
Creds on AWS are typically resolved through roles assigned to K8s pods (for
example with KIAM [1]).

[1] https://github.com/uswitch/kiam
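For illustration, with KIAM the role is requested via a pod annotation, roughly as below. The annotation key is KIAM's documented default, and the role name is a placeholder:

```yaml
# Deployment pod template fragment (sketch): the pod assumes an IAM role,
# so no AWS keys need to be shipped into the cluster.
template:
  metadata:
    annotations:
      iam.amazonaws.com/role: flink-s3-access
```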


Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Yang Wang <da...@gmail.com>.
Hi M Singh,

> Mans - If we use the session-based deployment option for K8s - I thought
> K8s will automatically restart any failed TM or JM.
> In the case of a failed TM the job will probably recover, but in the case
> of a failed JM perhaps we need to resubmit all jobs.
> Let me know if I have misunderstood anything.


Since you are starting the JM/TM with K8s deployments, new JM/TM pods will
be created when they fail. If you do not set the high availability
configuration, your jobs can recover when a TM fails, but not when the JM
fails. With HA configured, the jobs can always be recovered and you do not
need to re-submit them again.
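As an illustrative sketch (the ZooKeeper quorum, cluster id, and bucket below
are placeholder values, not from your setup), the HA-related entries in
flink-conf.yaml look roughly like this:

```yaml
# flink-conf.yaml -- ZooKeeper-based HA (all values are placeholders)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0.zk:2181,zk-1.zk:2181,zk-2.zk:2181
# Distinguishes this Flink cluster from others sharing the same ZooKeeper
high-availability.cluster-id: /my-flink-cluster
# Job graphs and checkpoint pointers are persisted here for recovery
high-availability.storageDir: s3://my-bucket/flink/ha
```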

> Mans - Is there any safe way of passing creds ?


Yes, you are right: using a ConfigMap to pass the credentials is not safe. On
K8s, I think you could use Secrets instead[1].
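A minimal sketch of that approach (all names and values below are
illustrative): create a Secret and inject it into the pods as environment
variables.

```yaml
# Hypothetical Secret holding AWS credentials
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "REPLACE_ME"
  AWS_SECRET_ACCESS_KEY: "REPLACE_ME"
```

Then, in the jobmanager/taskmanager Deployment's container spec, reference it:

```yaml
# Fragment of spec.template.spec.containers[0]
envFrom:
  - secretRef:
      name: aws-credentials
```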

> Mans - Does a task manager failure cause the job to fail ?  My
> understanding is that JM failures are catastrophic while TM failures are
> recoverable.


What I mean is that the job fails, but it can then be restarted by your
configured restart strategy[2].
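For example, a fixed-delay strategy in flink-conf.yaml could look like this
(the attempt count and delay are arbitrary illustrative values):

```yaml
# flink-conf.yaml -- restart strategy (values are illustrative)
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 30 s
```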

> Mans - So if we are saving checkpoint in S3 then there is no need for
> disks - should we use emptyDir ?


Yes, if you are saving the checkpoints in S3 and also set
`high-availability.storageDir` to S3, then you do not need a persistent
volume. Since the local directory is only used as a local cache, you could
directly use the overlay filesystem or an emptyDir (better IO performance).
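An emptyDir for the taskmanager's temporary directory might be wired up like
this (the names and mount path are illustrative; the path should match
whatever you set for io.tmp.dirs):

```yaml
# Fragment of the taskmanager pod template (illustrative names)
spec:
  containers:
    - name: taskmanager
      volumeMounts:
        - name: flink-tmp
          mountPath: /tmp/flink
  volumes:
    - name: flink-tmp
      emptyDir: {}
```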


[1].
https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/
[2].
https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#fault-tolerance


Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by M Singh <ma...@yahoo.com>.
Thanks Wang for your detailed answers.

From what I understand, the native_kubernetes integration also leans towards
creating a session and submitting a job to it.

Regarding your other recommendations, please see my inline comments and advice.

On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang <da...@gmail.com> wrote:

Hi Singh,

Glad to hear that you are looking to run Flink on Kubernetes. I am trying to
answer your questions based on my limited knowledge, and others could correct
me and add some more supplements.

I think the biggest difference between a session cluster and a per-job cluster
on Kubernetes is the isolation. For per-job, a dedicated Flink cluster is
started for a single job and no other jobs can be submitted to it. Once the
job is finished, the Flink cluster is destroyed immediately. The second point
is one-step submission: you do not need to start a Flink cluster first and
then submit a job to the existing session.

> Are there any benefits with regards to
1. Configuring the jobs
Whether you use a per-job cluster or submit to an existing session cluster,
they share the same configuration mechanism. You do not have to change any
code or configuration.

2. Scaling the taskmanager
Since you are using a standalone cluster on Kubernetes, it does not provide an
active resourcemanager. You need to use external tools to monitor and scale up
the taskmanagers. The active integration is still evolving and you could have
a taste[1].

Mans - If we use the session-based deployment option for K8s - I thought K8s
will automatically restart any failed TM or JM. In the case of a failed TM the
job will probably recover, but in the case of a failed JM perhaps we need to
resubmit all jobs. Let me know if I have misunderstood anything.

3. Restarting jobs
For a session cluster, you can directly cancel the job and re-submit it. For a
per-job cluster, when the job is canceled you need to start a new per-job
cluster from the latest savepoint.

4. Managing the flink jobs
The REST API and the flink command line can be used to manage the jobs (e.g.
flink cancel, etc.). I think there is no difference between session and
per-job here.

5. Passing credentials (in case of AWS, etc)
I am not sure how you provide your credentials. If you put them in a config
map and then mount it into the jobmanager/taskmanager pods, then both session
and per-job support this approach.

Mans - Is there any safe way of passing creds ?

6. Fault tolerance and recovery of jobs from failure
For a session cluster, if one taskmanager crashes, then all the jobs which
have tasks on that taskmanager will fail. Both session and per-job can be
configured with high availability and recover from the latest checkpoint.

Mans - Does a task manager failure cause the job to fail ? My understanding is
that JM failures are catastrophic while TM failures are recoverable.

> Is there any need for specifying volume for the pods?
No, you do not need to specify a volume for the pods. All the data in the
pod's local directory is temporary. When a pod crashes and is relaunched, the
taskmanager will retrieve the checkpoint from ZooKeeper + S3 and resume from
the latest checkpoint.

Mans - So if we are saving checkpoints in S3 then there is no need for disks -
should we use emptyDir ?

[1]. https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

Posted by Yang Wang <da...@gmail.com>.
Hi Singh,

Glad to hear that you are looking to run Flink on Kubernetes. I am
trying to answer your questions based on my limited knowledge, and
others could correct me and add some more supplements.

I think the biggest difference between a session cluster and a per-job
cluster on Kubernetes is the isolation. For per-job, a dedicated Flink
cluster is started for a single job and no other jobs can be submitted to
it. Once the job is finished, the Flink cluster is destroyed immediately.
The second point is one-step submission: you do not need to start a Flink
cluster first and then submit a job to the existing session.
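As a rough sketch of the two workflows with the standalone Kubernetes setup
(the manifest file names follow the documentation's examples and are
illustrative, as are the address and jar name):

```shell
# Session cluster: bring the cluster up first, then submit jobs to it
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-deployment.yaml
kubectl create -f taskmanager-deployment.yaml
flink run -m <jobmanager-address>:8081 my-job.jar

# Per-job cluster: the job jar is baked into the image, so deploying the
# manifests is the submission; deleting them tears the whole cluster down
kubectl create -f job-cluster-job.yaml
kubectl create -f taskmanager-job-deployment.yaml
```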

> Are there any benefits with regards to
1. Configuring the jobs
Whether you use a per-job cluster or submit to an existing session
cluster, they share the same configuration mechanism. You do not have
to change any code or configuration.

2. Scaling the taskmanager
Since you are using a standalone cluster on Kubernetes, it does not
provide an active resourcemanager. You need to use external tools to
monitor and scale up the taskmanagers. The active integration is still
evolving and you could have a taste[1].
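With the standalone setup, scaling is therefore a plain kubectl operation
that you (or an external autoscaler) trigger yourself; the deployment name
below is illustrative:

```shell
# Add taskmanager pods; each new pod registers with the jobmanager
kubectl scale deployment flink-taskmanager --replicas=4
```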

3. Restarting jobs
For a session cluster, you can directly cancel the job and re-submit it.
For a per-job cluster, when the job is canceled you need to start a new
per-job cluster from the latest savepoint.
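A sketch of both flows (the savepoint paths, address, and job id are
placeholders):

```shell
# Session cluster: cancel with a savepoint, then re-submit
flink cancel -m <jobmanager-address>:8081 -s s3://my-bucket/savepoints <job-id>
flink run -m <jobmanager-address>:8081 my-job.jar

# Per-job cluster: start the new cluster from the savepoint, e.g. by passing
#   --fromSavepoint s3://my-bucket/savepoints/savepoint-xxxx
# as an argument to the job-cluster entrypoint in the manifest
```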

4. Managing the flink jobs
The REST API and the flink command line can be used to manage the jobs
(e.g. flink cancel, etc.). I think there is no difference between session
and per-job here.
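Both interfaces are equivalent here; for instance (addresses and ids are
placeholders):

```shell
# CLI against the session cluster's REST endpoint
flink list -m <jobmanager-address>:8081
flink cancel -m <jobmanager-address>:8081 <job-id>

# The same operations over the REST API directly
curl http://<jobmanager-address>:8081/jobs
curl -X PATCH "http://<jobmanager-address>:8081/jobs/<job-id>?mode=cancel"
```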

5. Passing credentials (in case of AWS, etc)
I am not sure how you provide your credentials. If you put them in a
config map and then mount it into the jobmanager/taskmanager pods, then
both session and per-job support this approach.

6. Fault tolerance and recovery of jobs from failure
For a session cluster, if one taskmanager crashes, then all the jobs
which have tasks on that taskmanager will fail.
Both session and per-job can be configured with high availability and
recover from the latest checkpoint.

> Is there any need for specifying volume for the pods?
No, you do not need to specify a volume for the pods. All the data in the
pod's local directory is temporary. When a pod crashes and is relaunched,
the taskmanager will retrieve the checkpoint from ZooKeeper + S3 and
resume from the latest checkpoint.
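For completeness, pointing the checkpoints at S3 is just configuration
(the bucket paths are placeholders; the S3 filesystem also needs the
flink-s3-fs-hadoop or flink-s3-fs-presto dependency on the classpath):

```yaml
# flink-conf.yaml -- checkpoint/savepoint locations (placeholder paths)
state.backend: rocksdb
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
state.savepoints.dir: s3://my-bucket/flink/savepoints
```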


[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
