Posted to user@flink.apache.org by Niklas Wilcke <ni...@uniberg.com> on 2021/08/06 11:48:58 UTC

Production Grade GitOps Based Kubernetes Setup

Hi Flink Community,

I'm currently assessing how to properly deploy Flink on Kubernetes via GitOps. There are several options available, which I would like to discuss. In general we are looking for an open source or at least free-of-charge solution, but I am not ruling out paid solutions from the outset.
I see the following options.

1. Kubernetes Standalone [1]
	* Seems to be deprecated, since the docs state to use Native Kubernetes instead
2. Native Kubernetes [2]
	* Doesn't seem to implement the Kubernetes operator pattern
	* Seems to require command-line steps to operate / upgrade (not GitOps compatible out of the box)
3. "GoogleCloudPlatform/flink-on-k8s-operator" Operator [3]
	* Seems not to be well maintained / documented
	* We had some trouble with crashes during configuration changes, but we need to investigate further
	* There is a "maintained" fork from Spotify, which could be an option
4. Flink Native Kubernetes Operator [4]
	* Seems to be a private project from a Flink committer, which might not be mature enough for stable operation
5. Proprietary Solution Ververica Platform [5]
	* I haven't tried it out yet and have no experience with it
	* I'm unsure whether the Community Edition is suited for a production environment. (one namespace, no auto scaling, no RBAC, etc.)
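
To illustrate the "command line" concern under option 2: the Flink 1.13 Native Kubernetes docs [2] start an application cluster with `flink run-application --target kubernetes-application`. The following is only a sketch of how such a command could be rendered from a spec kept in Git. The spec layout, the cluster id, the image name and the jar path are made-up placeholders, and something (a CI job, for instance) would still have to execute the command.

```python
import shlex

# Hypothetical job spec, as it might be stored in a Git repository.
# All values are placeholders chosen for this illustration.
spec = {
    "cluster_id": "my-flink-app",
    "image": "registry.example.com/my-flink-job:1.0.0",
    "jar": "local:///opt/flink/usrlib/my-flink-job.jar",
}


def render_run_application(spec):
    """Build the argument list for starting a native Kubernetes application cluster."""
    return [
        "./bin/flink",
        "run-application",
        "--target", "kubernetes-application",
        f"-Dkubernetes.cluster-id={spec['cluster_id']}",
        f"-Dkubernetes.container.image={spec['image']}",
        spec["jar"],
    ]


command = render_run_application(spec)
# Print the command instead of running it; executing it needs a Flink distribution.
print(shlex.join(command))
```

Rendering the command from a file makes the desired state reviewable in Git, but it does not by itself give reconciliation or rollback, which is what an operator would add.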

I have the following questions.

1. Is the "Native Kubernetes" approach suitable for being operated via GitOps, and does it have drawbacks compared to an operator-based setup? (e.g. is a rollback possible after a failed upgrade?)
2. Does anyone have experience with the "GoogleCloudPlatform/flink-on-k8s-operator" or a fork of it in a production environment?
3. Is the "Flink Native Kubernetes Operator" an option, or is it just a playground project? How is it related to the "Native Kubernetes" setup? Is it going to be "integrated" into Flink?
4. Is a proprietary unpaid solution like "Ververica Platform Community Edition" viable for a production environment, or will it definitely lack features I need?

Any information or feedback is highly appreciated. Thank you very much in advance.

Kind Regards,
Niklas Wilcke


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/
[3] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
[4] https://github.com/wangyang0918/flink-native-k8s-operator
[5] https://www.ververica.com/getting-started-flink-ververica




UNIBERG GmbH 
Simon-von-Utrecht-Straße 85a
20359 Hamburg

niklas.wilcke@uniberg.com
Mobile: +49 160 9793 2593
Office: +49 40 2380 6523


UNIBERG GmbH, Dorfstraße 3, 23816 Bebensee 

Registergericht / Register: Amtsgericht Kiel HRB SE-1507
Geschäftsführer / CEO‘s: Andreas Möller, Martin Ulbricht

Informationen zum Datenschutz / Privacy Information: https://www.uniberg.com/impressum.html


Re: Production Grade GitOps Based Kubernetes Setup

Posted by Svend <st...@svend.xyz>.
Hi all,

I reached out [1] to Filipe Regadas, the author of the Spotify fork of the GCP k8s operator you linked in your message. He confirms he's actively working on it and would welcome PRs and community input. I already have a modest PR that I'll submit to him some time this week.

This seems to me a promising option to look into.

Svend


[1] https://github.com/spotify/flink-on-k8s-operator/issues/82

On Mon, 9 Aug 2021, at 12:36 PM, Niklas Wilcke wrote:
> Hi Yuval,
> 
> thank you for sharing all the information. I forgot to mention the Lyft operator. Thanks for "adding" it to the list.
> Regarding the dual cluster approach during upgrades, I have some doubts about the resource usage. If you are operating some "big" jobs, that would mean you always have to provide enough resources to run two of them in parallel during the upgrade. Or is there some workaround (downscaling, upscaling) available?
> I will further investigate how the option three "GoogleCloudPlatform/flink-on-k8s-operator" is implementing the upgrade process.
> 
> I agree that there is a need for a community-based operator project. It is unfortunate that both "relevant" projects (Lyft, GCP) have been more or less abandoned. The only thing left with some activity that I can see is a fork of the GCP operator from Spotify [0], but there is only one person involved.
> 
> Regards,
> Niklas
> 
> [0] https://github.com/spotify/flink-on-k8s-operator
> 

Re: Production Grade GitOps Based Kubernetes Setup

Posted by Niklas Wilcke <ni...@uniberg.com>.
Hi Yuval,

thank you for sharing all the information. I forgot to mention the Lyft operator. Thanks for "adding" it to the list.
Regarding the dual cluster approach during upgrades, I have some doubts about the resource usage. If you are operating some "big" jobs, that would mean you always have to provide enough resources to run two of them in parallel during the upgrade. Or is there some workaround (downscaling, upscaling) available?
I will further investigate how the option three "GoogleCloudPlatform/flink-on-k8s-operator" is implementing the upgrade process.
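
To put rough numbers on the resource concern, here is a back-of-the-envelope sketch. It assumes the old and the new cluster run at full size side by side during a dual upgrade, and it does not model any downscaling workaround. The class name, the function names and the cluster sizes are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSize:
    """Hypothetical sizing of one Flink cluster (TaskManagers only)."""
    task_managers: int
    cpu_per_tm: float
    mem_gib_per_tm: float

    def total_cpu(self):
        return self.task_managers * self.cpu_per_tm

    def total_mem_gib(self):
        return self.task_managers * self.mem_gib_per_tm


def peak_dual(old, new):
    """Both clusters are up at the same time, so their demands add up."""
    return (old.total_cpu() + new.total_cpu(),
            old.total_mem_gib() + new.total_mem_gib())


def peak_sequential(old, new):
    """The old cluster is gone before the new one starts, so only the larger one counts."""
    return (max(old.total_cpu(), new.total_cpu()),
            max(old.total_mem_gib(), new.total_mem_gib()))


# Made-up example: 10 TaskManagers with 4 CPUs and 16 GiB each, same size after the upgrade.
old = ClusterSize(task_managers=10, cpu_per_tm=4.0, mem_gib_per_tm=16.0)
new = old

print("dual:      ", peak_dual(old, new))
print("sequential:", peak_sequential(old, new))
```

Under these assumptions the dual approach needs twice the headroom of a sequential upgrade for the duration of the rollover.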

I agree that there is a need for a community-based operator project. It is unfortunate that both "relevant" projects (Lyft, GCP) have been more or less abandoned. The only thing left with some activity that I can see is a fork of the GCP operator from Spotify [0], but there is only one person involved.

Regards,
Niklas

[0] https://github.com/spotify/flink-on-k8s-operator



> On 6. Aug 2021, at 16:59, Yuval Itzchakov <yu...@gmail.com> wrote:
> 
> Hi Niklas,
> 
> We are currently using the Lyft operator for Flink in production (https://github.com/lyft/flinkk8soperator), which is an additional alternative. The project itself is pretty much in a zombie state, but commits still happen every now and then.
> 
> 1. Native Kubernetes could definitely work with GitOps; it would just require you to do lots of steps "by hand" in terms of application upgrade and rollover.
> 2. We're using Lyft's operator as mentioned above. It mostly works well; we had several issues along the way, but most of them were resolved. One feature that is missing for us specifically is the ability to perform an upgrade by first savepointing and killing the existing cluster and only then deploying a new one (their approach is dual, meaning two clusters are up and running before the rollover).
> 3. In its current state it looks more like a side project than an actively maintained operator.
> 4. Ververica is definitely an option; we haven't tested their operator and are not sure about its maturity level yet.
> 
> I think a Flink community based operator for k8s is a much needed project (which I'd be happy to contribute to).
> 


Re: Production Grade GitOps Based Kubernetes Setup

Posted by Yuval Itzchakov <yu...@gmail.com>.
Hi Niklas,

We are currently using the Lyft operator for Flink in production
(https://github.com/lyft/flinkk8soperator), which is an additional
alternative. The project itself is pretty much in a zombie state, but
commits still happen every now and then.

1. Native Kubernetes could definitely work with GitOps; it would just
require you to do lots of steps "by hand" in terms of application upgrade
and rollover.
2. We're using Lyft's operator as mentioned above. It mostly works well; we
had several issues along the way, but most of them were resolved. One
feature that is missing for us specifically is the ability to perform an
upgrade by first savepointing and killing the existing cluster and only
then deploying a new one (their approach is dual, meaning two clusters are
up and running before the rollover).
3. In its current state it looks more like a side project than an actively
maintained operator.
4. Ververica is definitely an option; we haven't tested their operator and
are not sure about its maturity level yet.

I think a Flink community based operator for k8s is a much needed project
(which I'd be happy to contribute to).
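
The difference between the two upgrade styles from point 2 can be sketched as two step lists. This is a simplified model for illustration only, not the Lyft operator's actual state machine: the step names and the helper function are invented, and each step simply records how it changes the number of running clusters.

```python
# Each step is (description, change in number of running clusters).
# "Savepoint first" is the order described as missing; "dual" is the
# two-clusters-before-rollover approach, in a simplified form.
SAVEPOINT_FIRST = [
    ("take savepoint of the running job", 0),
    ("stop the existing cluster", -1),
    ("deploy the new cluster and restore from the savepoint", +1),
]

DUAL_CLUSTER = [
    ("deploy the new cluster", +1),
    ("take savepoint of the running job", 0),
    ("start the job on the new cluster from the savepoint", 0),
    ("stop the old cluster", -1),
]


def peak_and_trough(plan, running=1):
    """Return (max, min) number of clusters running at any point in the plan."""
    peak = trough = running
    for _description, delta in plan:
        running += delta
        peak = max(peak, running)
        trough = min(trough, running)
    return peak, trough


print("savepoint first:", peak_and_trough(SAVEPOINT_FIRST))
print("dual cluster:   ", peak_and_trough(DUAL_CLUSTER))
```

In this model the savepoint-first order never needs capacity for more than one cluster but passes through a moment with none running, while the dual order always has a cluster up at the cost of briefly running two.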





Re: Production Grade GitOps Based Kubernetes Setup

Posted by Niklas Wilcke <ni...@uniberg.com>.
Hi Maciek,

thanks for sharing your insights. It is highly appreciated.

Regards,
Niklas



> On 6. Aug 2021, at 15:22, Maciej Bryński <ma...@brynski.pl> wrote:
> 
> Hi Niklas,
> We had the same problem one year ago and we chose Ververica Platform
> Community Edition.
> Pros:
> - support for jobs on Session Clusters
> - good support for restoring jobs from checkpoints and savepoints
> - support for even hundreds of jobs
> Cons:
> - state in SQLite (we've already corrupted db file once)
> - delay with Flink Versions
> 
> One year later I still think there is no perfect solution for managing
> Flink on K8s, but for us Ververica was the closest match.
> 
> Regards,
> Maciek
> 


Re: Production Grade GitOps Based Kubernetes Setup

Posted by Maciej Bryński <ma...@brynski.pl>.
Hi Niklas,
We had the same problem one year ago and we chose Ververica Platform
Community Edition.
Pros:
- support for jobs on Session Clusters
- good support for restoring jobs from checkpoints and savepoints
- support for even hundreds of jobs
Cons:
- state is kept in SQLite (we've already corrupted the DB file once)
- new Flink versions are supported with a delay

One year later I still think there is no perfect solution for managing
Flink on K8s, but for us Ververica was the closest match.

Regards,
Maciek



-- 
Maciek Bryński