You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Thomas Weise <th...@apache.org> on 2019/07/25 04:00:52 UTC

Python Beam pipelines on Flink on Kubernetes

Hi,

Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator to
manage Flink deployments on Kubernetes:

https://github.com/lyft/flinkk8soperator/

We are now discussing how to extend this operator to also support
deployment of Python Beam pipelines with the Flink runner. I would like to
share the proposal with the Beam community to enlist feedback as well as
explore opportunities for collaboration:

https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/

Looking forward to your comments and suggestions!

Thomas

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Chad Dombrova <ch...@gmail.com>.
Done:  https://github.com/lyft/flinkk8soperator/issues/123



On Fri, Nov 1, 2019 at 5:08 PM Thomas Weise <th...@apache.org> wrote:

> That's a good idea. Probably best to add an example in:
> https://github.com/lyft/flinkk8soperator
>
> Do you want to add an issue?
>
> (It will have to wait for 2.18 release though.)
>
>
> On Fri, Nov 1, 2019 at 11:37 AM Chad Dombrova <ch...@gmail.com> wrote:
>
>> Hi Thomas,
>> Do you have an example Dockerfile demonstrating best practices for
>> building an image that contains both Flink and Beam SDK dependencies?  That
>> would be useful.
>>
>> -chad
>>
>>
>> On Fri, Nov 1, 2019 at 10:18 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> For folks looking to run Beam on Flink on k8s, see update in [1]
>>>
>>> I also updated [2]
>>>
>>> TLDR: at this time best option to run portable pipelines on k8s is to
>>> create container images that have both Flink and the SDK dependencies.
>>>
>>> I'm curious how much interest there is to use the official SDK container
>>> images and keep Flink and portable pipeline separate as far as the image
>>> build goes? Deployment can be achieved with the sidecar container approach.
>>> Most of mechanics are in place already, one addition would be an
>>> abstraction for where the SDK entry point executes (process, external),
>>> similar to how we have it for the workers.
>>>
>>> Thanks,
>>> Thomas
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/10aada9c200d4300059d02e4baa47dda0cc2c6fc9432f950194a7f5e@%3Cdev.beam.apache.org%3E
>>>
>>> [2]
>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI
>>>
>>> On Wed, Aug 21, 2019 at 9:44 PM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> The changes to containerize the Python SDK worker pool are nearly
>>>> complete. I also updated the document for next implementation steps.
>>>>
>>>> The favored approach (initially targeted option) for pipeline
>>>> submission is support for the (externally created) fat far. It will keep
>>>> changes to the operator to a minimum and is applicable for any other Flink
>>>> job as well.
>>>>
>>>> Regarding fine grained resource scheduling, that would happen within
>>>> the pods scheduled by the k8s operator (or other external orchestration
>>>> tool) or, further down the road, in a completely elastic/dynamic fashion
>>>> with active execution mode (where Flink would request resources directly
>>>> from k8s, similar to how it would work on YARN).
>>>>
>>>>
>>>> On Tue, Aug 13, 2019 at 10:59 AM Chad Dombrova <ch...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Thomas,
>>>>> Nice work!  It's really clearly presented.
>>>>>
>>>>> What's the current favored approach for pipeline submission?
>>>>>
>>>>> I'm also interested to know how this plan overlaps (if at all) with
>>>>> the work on Fine-Grained Resource Scheduling [1][2] that's being done for
>>>>> Flink 1.9+, which has implications for creation of task managers in
>>>>> kubernetes.
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/document/d/1h68XOG-EyOFfcomd2N7usHK1X429pJSMiwZwAXCwx1k/edit#heading=h.72szmms7ufza
>>>>> [2] https://issues.apache.org/jira/browse/FLINK-12761
>>>>>
>>>>> -chad
>>>>>
>>>>>
>>>>> On Tue, Aug 13, 2019 at 9:14 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> There have been a few comments in the doc and I'm going to start
>>>>>> working on the worker execution part.
>>>>>>
>>>>>> Instead of running a Docker container for each worker, the preferred
>>>>>> option is to use the worker pool:
>>>>>>
>>>>>>
>>>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898
>>>>>>
>>>>>> There are some notes at the end of the doc regarding the
>>>>>> implementation. It would be great if those interested in the environments
>>>>>> and Python docker container could take a look. In a nutshell, the proposal
>>>>>> is to make few changes to the Python container so that it (optionally) can
>>>>>> be used to run the worker pool.
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am very happy to see this. I'll take a look, and leave my comments.
>>>>>>>
>>>>>>> I think this is something we'd been needing, and it's great that you
>>>>>>> guys are putting thought into it. Thanks!<3
>>>>>>>
>>>>>>> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes
>>>>>>>> operator to manage Flink deployments on Kubernetes:
>>>>>>>>
>>>>>>>> https://github.com/lyft/flinkk8soperator/
>>>>>>>>
>>>>>>>> We are now discussing how to extend this operator to also support
>>>>>>>> deployment of Python Beam pipelines with the Flink runner. I would like to
>>>>>>>> share the proposal with the Beam community to enlist feedback as well as
>>>>>>>> explore opportunities for collaboration:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>>>>>>>
>>>>>>>> Looking forward to your comments and suggestions!
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Thomas Weise <th...@apache.org>.
That's a good idea. Probably best to add an example in:
https://github.com/lyft/flinkk8soperator

Do you want to add an issue?

(It will have to wait for 2.18 release though.)


On Fri, Nov 1, 2019 at 11:37 AM Chad Dombrova <ch...@gmail.com> wrote:

> Hi Thomas,
> Do you have an example Dockerfile demonstrating best practices for
> building an image that contains both Flink and Beam SDK dependencies?  That
> would be useful.
>
> -chad
>
>
> On Fri, Nov 1, 2019 at 10:18 AM Thomas Weise <th...@apache.org> wrote:
>
>> For folks looking to run Beam on Flink on k8s, see update in [1]
>>
>> I also updated [2]
>>
>> TLDR: at this time best option to run portable pipelines on k8s is to
>> create container images that have both Flink and the SDK dependencies.
>>
>> I'm curious how much interest there is to use the official SDK container
>> images and keep Flink and portable pipeline separate as far as the image
>> build goes? Deployment can be achieved with the sidecar container approach.
>> Most of mechanics are in place already, one addition would be an
>> abstraction for where the SDK entry point executes (process, external),
>> similar to how we have it for the workers.
>>
>> Thanks,
>> Thomas
>>
>> [1]
>> https://lists.apache.org/thread.html/10aada9c200d4300059d02e4baa47dda0cc2c6fc9432f950194a7f5e@%3Cdev.beam.apache.org%3E
>>
>> [2]
>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI
>>
>> On Wed, Aug 21, 2019 at 9:44 PM Thomas Weise <th...@apache.org> wrote:
>>
>>> The changes to containerize the Python SDK worker pool are nearly
>>> complete. I also updated the document for next implementation steps.
>>>
>>> The favored approach (initially targeted option) for pipeline submission
>>> is support for the (externally created) fat far. It will keep changes to
>>> the operator to a minimum and is applicable for any other Flink job as well.
>>>
>>> Regarding fine grained resource scheduling, that would happen within the
>>> pods scheduled by the k8s operator (or other external orchestration tool)
>>> or, further down the road, in a completely elastic/dynamic fashion with
>>> active execution mode (where Flink would request resources directly from
>>> k8s, similar to how it would work on YARN).
>>>
>>>
>>> On Tue, Aug 13, 2019 at 10:59 AM Chad Dombrova <ch...@gmail.com>
>>> wrote:
>>>
>>>> Hi Thomas,
>>>> Nice work!  It's really clearly presented.
>>>>
>>>> What's the current favored approach for pipeline submission?
>>>>
>>>> I'm also interested to know how this plan overlaps (if at all) with the
>>>> work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink
>>>> 1.9+, which has implications for creation of task managers in kubernetes.
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1h68XOG-EyOFfcomd2N7usHK1X429pJSMiwZwAXCwx1k/edit#heading=h.72szmms7ufza
>>>> [2] https://issues.apache.org/jira/browse/FLINK-12761
>>>>
>>>> -chad
>>>>
>>>>
>>>> On Tue, Aug 13, 2019 at 9:14 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> There have been a few comments in the doc and I'm going to start
>>>>> working on the worker execution part.
>>>>>
>>>>> Instead of running a Docker container for each worker, the preferred
>>>>> option is to use the worker pool:
>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898
>>>>>
>>>>> There are some notes at the end of the doc regarding the
>>>>> implementation. It would be great if those interested in the environments
>>>>> and Python docker container could take a look. In a nutshell, the proposal
>>>>> is to make few changes to the Python container so that it (optionally) can
>>>>> be used to run the worker pool.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I am very happy to see this. I'll take a look, and leave my comments.
>>>>>>
>>>>>> I think this is something we'd been needing, and it's great that you
>>>>>> guys are putting thought into it. Thanks!<3
>>>>>>
>>>>>> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes
>>>>>>> operator to manage Flink deployments on Kubernetes:
>>>>>>>
>>>>>>> https://github.com/lyft/flinkk8soperator/
>>>>>>>
>>>>>>> We are now discussing how to extend this operator to also support
>>>>>>> deployment of Python Beam pipelines with the Flink runner. I would like to
>>>>>>> share the proposal with the Beam community to enlist feedback as well as
>>>>>>> explore opportunities for collaboration:
>>>>>>>
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>>>>>>
>>>>>>> Looking forward to your comments and suggestions!
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Chad Dombrova <ch...@gmail.com>.
Hi Thomas,
Do you have an example Dockerfile demonstrating best practices for building
an image that contains both Flink and Beam SDK dependencies?  That would be
useful.

-chad


On Fri, Nov 1, 2019 at 10:18 AM Thomas Weise <th...@apache.org> wrote:

> For folks looking to run Beam on Flink on k8s, see update in [1]
>
> I also updated [2]
>
> TLDR: at this time best option to run portable pipelines on k8s is to
> create container images that have both Flink and the SDK dependencies.
>
> I'm curious how much interest there is to use the official SDK container
> images and keep Flink and portable pipeline separate as far as the image
> build goes? Deployment can be achieved with the sidecar container approach.
> Most of mechanics are in place already, one addition would be an
> abstraction for where the SDK entry point executes (process, external),
> similar to how we have it for the workers.
>
> Thanks,
> Thomas
>
> [1]
> https://lists.apache.org/thread.html/10aada9c200d4300059d02e4baa47dda0cc2c6fc9432f950194a7f5e@%3Cdev.beam.apache.org%3E
>
> [2]
> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI
>
> On Wed, Aug 21, 2019 at 9:44 PM Thomas Weise <th...@apache.org> wrote:
>
>> The changes to containerize the Python SDK worker pool are nearly
>> complete. I also updated the document for next implementation steps.
>>
>> The favored approach (initially targeted option) for pipeline submission
>> is support for the (externally created) fat far. It will keep changes to
>> the operator to a minimum and is applicable for any other Flink job as well.
>>
>> Regarding fine grained resource scheduling, that would happen within the
>> pods scheduled by the k8s operator (or other external orchestration tool)
>> or, further down the road, in a completely elastic/dynamic fashion with
>> active execution mode (where Flink would request resources directly from
>> k8s, similar to how it would work on YARN).
>>
>>
>> On Tue, Aug 13, 2019 at 10:59 AM Chad Dombrova <ch...@gmail.com> wrote:
>>
>>> Hi Thomas,
>>> Nice work!  It's really clearly presented.
>>>
>>> What's the current favored approach for pipeline submission?
>>>
>>> I'm also interested to know how this plan overlaps (if at all) with the
>>> work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink
>>> 1.9+, which has implications for creation of task managers in kubernetes.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1h68XOG-EyOFfcomd2N7usHK1X429pJSMiwZwAXCwx1k/edit#heading=h.72szmms7ufza
>>> [2] https://issues.apache.org/jira/browse/FLINK-12761
>>>
>>> -chad
>>>
>>>
>>> On Tue, Aug 13, 2019 at 9:14 AM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> There have been a few comments in the doc and I'm going to start
>>>> working on the worker execution part.
>>>>
>>>> Instead of running a Docker container for each worker, the preferred
>>>> option is to use the worker pool:
>>>>
>>>>
>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898
>>>>
>>>> There are some notes at the end of the doc regarding the
>>>> implementation. It would be great if those interested in the environments
>>>> and Python docker container could take a look. In a nutshell, the proposal
>>>> is to make few changes to the Python container so that it (optionally) can
>>>> be used to run the worker pool.
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com>
>>>> wrote:
>>>>
>>>>> I am very happy to see this. I'll take a look, and leave my comments.
>>>>>
>>>>> I think this is something we'd been needing, and it's great that you
>>>>> guys are putting thought into it. Thanks!<3
>>>>>
>>>>> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator
>>>>>> to manage Flink deployments on Kubernetes:
>>>>>>
>>>>>> https://github.com/lyft/flinkk8soperator/
>>>>>>
>>>>>> We are now discussing how to extend this operator to also support
>>>>>> deployment of Python Beam pipelines with the Flink runner. I would like to
>>>>>> share the proposal with the Beam community to enlist feedback as well as
>>>>>> explore opportunities for collaboration:
>>>>>>
>>>>>>
>>>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>>>>>
>>>>>> Looking forward to your comments and suggestions!
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Thomas Weise <th...@apache.org>.
For folks looking to run Beam on Flink on k8s, see update in [1]

I also updated [2]

TLDR: at this time best option to run portable pipelines on k8s is to
create container images that have both Flink and the SDK dependencies.

I'm curious how much interest there is to use the official SDK container
images and keep Flink and portable pipeline separate as far as the image
build goes? Deployment can be achieved with the sidecar container approach.
Most of mechanics are in place already, one addition would be an
abstraction for where the SDK entry point executes (process, external),
similar to how we have it for the workers.

Thanks,
Thomas

[1]
https://lists.apache.org/thread.html/10aada9c200d4300059d02e4baa47dda0cc2c6fc9432f950194a7f5e@%3Cdev.beam.apache.org%3E

[2]
https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI

On Wed, Aug 21, 2019 at 9:44 PM Thomas Weise <th...@apache.org> wrote:

> The changes to containerize the Python SDK worker pool are nearly
> complete. I also updated the document for next implementation steps.
>
> The favored approach (initially targeted option) for pipeline submission
> is support for the (externally created) fat far. It will keep changes to
> the operator to a minimum and is applicable for any other Flink job as well.
>
> Regarding fine grained resource scheduling, that would happen within the
> pods scheduled by the k8s operator (or other external orchestration tool)
> or, further down the road, in a completely elastic/dynamic fashion with
> active execution mode (where Flink would request resources directly from
> k8s, similar to how it would work on YARN).
>
>
> On Tue, Aug 13, 2019 at 10:59 AM Chad Dombrova <ch...@gmail.com> wrote:
>
>> Hi Thomas,
>> Nice work!  It's really clearly presented.
>>
>> What's the current favored approach for pipeline submission?
>>
>> I'm also interested to know how this plan overlaps (if at all) with the
>> work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink
>> 1.9+, which has implications for creation of task managers in kubernetes.
>>
>> [1]
>> https://docs.google.com/document/d/1h68XOG-EyOFfcomd2N7usHK1X429pJSMiwZwAXCwx1k/edit#heading=h.72szmms7ufza
>> [2] https://issues.apache.org/jira/browse/FLINK-12761
>>
>> -chad
>>
>>
>> On Tue, Aug 13, 2019 at 9:14 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> There have been a few comments in the doc and I'm going to start working
>>> on the worker execution part.
>>>
>>> Instead of running a Docker container for each worker, the preferred
>>> option is to use the worker pool:
>>>
>>>
>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898
>>>
>>> There are some notes at the end of the doc regarding the implementation.
>>> It would be great if those interested in the environments and Python docker
>>> container could take a look. In a nutshell, the proposal is to make few
>>> changes to the Python container so that it (optionally) can be used to run
>>> the worker pool.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com>
>>> wrote:
>>>
>>>> I am very happy to see this. I'll take a look, and leave my comments.
>>>>
>>>> I think this is something we'd been needing, and it's great that you
>>>> guys are putting thought into it. Thanks!<3
>>>>
>>>> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator
>>>>> to manage Flink deployments on Kubernetes:
>>>>>
>>>>> https://github.com/lyft/flinkk8soperator/
>>>>>
>>>>> We are now discussing how to extend this operator to also support
>>>>> deployment of Python Beam pipelines with the Flink runner. I would like to
>>>>> share the proposal with the Beam community to enlist feedback as well as
>>>>> explore opportunities for collaboration:
>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>>>>
>>>>> Looking forward to your comments and suggestions!
>>>>>
>>>>> Thomas
>>>>>
>>>>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Thomas Weise <th...@apache.org>.
The changes to containerize the Python SDK worker pool are nearly complete.
I also updated the document for next implementation steps.

The favored approach (initially targeted option) for pipeline submission is
support for the (externally created) fat far. It will keep changes to the
operator to a minimum and is applicable for any other Flink job as well.

Regarding fine grained resource scheduling, that would happen within the
pods scheduled by the k8s operator (or other external orchestration tool)
or, further down the road, in a completely elastic/dynamic fashion with
active execution mode (where Flink would request resources directly from
k8s, similar to how it would work on YARN).


On Tue, Aug 13, 2019 at 10:59 AM Chad Dombrova <ch...@gmail.com> wrote:

> Hi Thomas,
> Nice work!  It's really clearly presented.
>
> What's the current favored approach for pipeline submission?
>
> I'm also interested to know how this plan overlaps (if at all) with the
> work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink
> 1.9+, which has implications for creation of task managers in kubernetes.
>
> [1]
> https://docs.google.com/document/d/1h68XOG-EyOFfcomd2N7usHK1X429pJSMiwZwAXCwx1k/edit#heading=h.72szmms7ufza
> [2] https://issues.apache.org/jira/browse/FLINK-12761
>
> -chad
>
>
> On Tue, Aug 13, 2019 at 9:14 AM Thomas Weise <th...@apache.org> wrote:
>
>> There have been a few comments in the doc and I'm going to start working
>> on the worker execution part.
>>
>> Instead of running a Docker container for each worker, the preferred
>> option is to use the worker pool:
>>
>>
>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898
>>
>> There are some notes at the end of the doc regarding the implementation.
>> It would be great if those interested in the environments and Python docker
>> container could take a look. In a nutshell, the proposal is to make few
>> changes to the Python container so that it (optionally) can be used to run
>> the worker pool.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com> wrote:
>>
>>> I am very happy to see this. I'll take a look, and leave my comments.
>>>
>>> I think this is something we'd been needing, and it's great that you
>>> guys are putting thought into it. Thanks!<3
>>>
>>> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator
>>>> to manage Flink deployments on Kubernetes:
>>>>
>>>> https://github.com/lyft/flinkk8soperator/
>>>>
>>>> We are now discussing how to extend this operator to also support
>>>> deployment of Python Beam pipelines with the Flink runner. I would like to
>>>> share the proposal with the Beam community to enlist feedback as well as
>>>> explore opportunities for collaboration:
>>>>
>>>>
>>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>>>
>>>> Looking forward to your comments and suggestions!
>>>>
>>>> Thomas
>>>>
>>>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Chad Dombrova <ch...@gmail.com>.
Hi Thomas,
Nice work!  It's really clearly presented.

What's the current favored approach for pipeline submission?

I'm also interested to know how this plan overlaps (if at all) with the
work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink
1.9+, which has implications for creation of task managers in kubernetes.

[1]
https://docs.google.com/document/d/1h68XOG-EyOFfcomd2N7usHK1X429pJSMiwZwAXCwx1k/edit#heading=h.72szmms7ufza
[2] https://issues.apache.org/jira/browse/FLINK-12761

-chad


On Tue, Aug 13, 2019 at 9:14 AM Thomas Weise <th...@apache.org> wrote:

> There have been a few comments in the doc and I'm going to start working
> on the worker execution part.
>
> Instead of running a Docker container for each worker, the preferred
> option is to use the worker pool:
>
>
> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898
>
> There are some notes at the end of the doc regarding the implementation.
> It would be great if those interested in the environments and Python docker
> container could take a look. In a nutshell, the proposal is to make few
> changes to the Python container so that it (optionally) can be used to run
> the worker pool.
>
> Thanks,
> Thomas
>
>
> On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com> wrote:
>
>> I am very happy to see this. I'll take a look, and leave my comments.
>>
>> I think this is something we'd been needing, and it's great that you guys
>> are putting thought into it. Thanks!<3
>>
>> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator to
>>> manage Flink deployments on Kubernetes:
>>>
>>> https://github.com/lyft/flinkk8soperator/
>>>
>>> We are now discussing how to extend this operator to also support
>>> deployment of Python Beam pipelines with the Flink runner. I would like to
>>> share the proposal with the Beam community to enlist feedback as well as
>>> explore opportunities for collaboration:
>>>
>>>
>>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>>
>>> Looking forward to your comments and suggestions!
>>>
>>> Thomas
>>>
>>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Thomas Weise <th...@apache.org>.
There have been a few comments in the doc and I'm going to start working on
the worker execution part.

Instead of running a Docker container for each worker, the preferred option
is to use the worker pool:

https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.bzs6y6ms0898

There are some notes at the end of the doc regarding the implementation. It
would be great if those interested in the environments and Python docker
container could take a look. In a nutshell, the proposal is to make few
changes to the Python container so that it (optionally) can be used to run
the worker pool.

Thanks,
Thomas


On Wed, Jul 24, 2019 at 9:42 PM Pablo Estrada <pa...@google.com> wrote:

> I am very happy to see this. I'll take a look, and leave my comments.
>
> I think this is something we'd been needing, and it's great that you guys
> are putting thought into it. Thanks!<3
>
> On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:
>
>> Hi,
>>
>> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator to
>> manage Flink deployments on Kubernetes:
>>
>> https://github.com/lyft/flinkk8soperator/
>>
>> We are now discussing how to extend this operator to also support
>> deployment of Python Beam pipelines with the Flink runner. I would like to
>> share the proposal with the Beam community to enlist feedback as well as
>> explore opportunities for collaboration:
>>
>>
>> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>>
>> Looking forward to your comments and suggestions!
>>
>> Thomas
>>
>>

Re: Python Beam pipelines on Flink on Kubernetes

Posted by Pablo Estrada <pa...@google.com>.
I am very happy to see this. I'll take a look, and leave my comments.

I think this is something we'd been needing, and it's great that you guys
are putting thought into it. Thanks!<3

On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise <th...@apache.org> wrote:

> Hi,
>
> Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator to
> manage Flink deployments on Kubernetes:
>
> https://github.com/lyft/flinkk8soperator/
>
> We are now discussing how to extend this operator to also support
> deployment of Python Beam pipelines with the Flink runner. I would like to
> share the proposal with the Beam community to enlist feedback as well as
> explore opportunities for collaboration:
>
>
> https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/
>
> Looking forward to your comments and suggestions!
>
> Thomas
>
>