Posted to user@beam.apache.org by Jaehyeon Kim <do...@gmail.com> on 2024/02/22 23:48:07 UTC

Beam portable runner setup for Flink + Python on Kubernetes

Hello,

I'm playing with the Beam portable runner to read/write data from Kafka. I
see a Spark runner example on Kubernetes (
https://beam.apache.org/documentation/runners/spark/#kubernetes) but the
Flink runner section doesn't include such an example.

Is there a resource that I can learn from? Ideally, the documentation would
be updated to include one.

Cheers,
Jaehyeon

Re: Beam portable runner setup for Flink + Python on Kubernetes

Posted by Sam Bourne <sa...@gmail.com>.

Hey Jaehyeon,

Docker is the default environment type
<https://github.com/apache/beam/blob/ae8bbf86c9c5951b2685b8400d6ae3fefe678a9a/sdks/python/apache_beam/options/pipeline_options.py#L1481>
when using the PortableRunner. I included the environment options in the
example just for reference, because we found it useful to override the
default SDK container with our own.
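
As a rough illustration of that override, here is a minimal sketch of the
pipeline arguments involved. The flag names (--runner, --job_endpoint,
--environment_type, --environment_config) are standard Beam portable options;
the job endpoint and the image name are hypothetical placeholders, and the
helper functions are made up for this example rather than taken from the repo.

```python
from typing import List, Optional


def portable_runner_args(job_endpoint: str,
                         sdk_image: Optional[str] = None) -> List[str]:
    """Build command-line style arguments for a PortableRunner job.

    job_endpoint: address of the Beam job server (placeholder value below).
    sdk_image: optional custom SDK container image; when omitted, Beam
        falls back to its default SDK container for the DOCKER environment.
    """
    args = [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        # DOCKER is already the default; it is spelled out for reference.
        "--environment_type=DOCKER",
    ]
    if sdk_image:
        # For the DOCKER environment, environment_config is the image name.
        args.append(f"--environment_config={sdk_image}")
    return args


def build_options(args: List[str]):
    """Wrap the arguments in PipelineOptions (requires apache-beam)."""
    from apache_beam.options.pipeline_options import PipelineOptions
    return PipelineOptions(args)


if __name__ == "__main__":
    # Both values are assumptions for illustration only.
    for arg in portable_runner_args("localhost:8099",
                                    "registry.example.com/my-beam-sdk:latest"):
        print(arg)
```

The resulting list can be passed to build_options() and then to
beam.Pipeline(options=...) in the usual way.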

It is pretty complicated, especially to debug sometimes, but we had some
good success running some simple pipelines in production for around a year.
I was more wary about maintaining my own Flink cluster so eventually we
decided to shed the technical debt and pay for Dataflow. Runners already
rely on docker to support the portability framework
<https://beam.apache.org/roadmap/portability/> so I don't think that is
much of a concern.

On Thu, Feb 22, 2024 at 7:49 PM Jaehyeon Kim <do...@gmail.com> wrote:

> Hi Sam
>
> Thanks for the GitHub repo link. In your example, the environment type is
> set to DOCKER and it requires a Docker container running together with the
> task manager. Do you think it is acceptable in a production environment?
>
> Cheers,
> Jaehyeon

Re: Beam portable runner setup for Flink + Python on Kubernetes

Posted by Jaehyeon Kim <do...@gmail.com>.

Hi Sam

Thanks for the GitHub repo link. In your example, the environment type is
set to DOCKER and it requires a Docker container running together with the
task manager. Do you think it is acceptable in a production environment?

Cheers,
Jaehyeon

On Fri, 23 Feb 2024 at 13:57, Sam Bourne <sa...@gmail.com> wrote:

> I made this a few years ago to help people like yourself.
>
> https://github.com/sambvfx/beam-flink-k8s
>
> Hopefully it's insightful and I'm happy to accept any MRs to update any
> outdated information or to flesh it out more.

Re: Beam portable runner setup for Flink + Python on Kubernetes

Posted by Sam Bourne <sa...@gmail.com>.

I made this a few years ago to help people like yourself.

https://github.com/sambvfx/beam-flink-k8s

Hopefully it's insightful and I'm happy to accept any MRs to update any
outdated information or to flesh it out more.

Re: Beam portable runner setup for Flink + Python on Kubernetes

Posted by Jan Lukavský <je...@seznam.cz>.

Hi,

I have set up such a configuration for a local environment (minikube); it
can be found at [1] and [2]. It is somewhat older, but it might serve as
inspiration. If you would like to write up your solution for the
documentation, that would be awesome, and I'd be happy to review it. :)

Best,
  Jan

[1] 
https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam/blob/main/env/manifests/flink.yaml

[2] 
https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam/blob/main/env/docker/flink/Dockerfile
