Posted to user@beam.apache.org by John Tipper <jo...@hotmail.com> on 2023/07/18 12:10:15 UTC
How to submit Beam Python job onto Kubernetes with Flink runner?
Hi all,
I want to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following this tutorial (https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb), but I'm not sure what the author means by the "flink master container". I don't understand how I am supposed to submit my Python code to the cluster when that code is itself packaged inside a container image.
The Kubernetes Flink cluster architecture looks like this:
* a single JobManager, which exposes the Flink web UI via a Service and Ingress
* multiple Task Managers, each running 2 containers:
  * a Flink task manager
  * a Beam worker pool, which exposes port 50000
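For reference, each of my TaskManager pods looks roughly like the sketch below. The names, image tags, and the `--worker_pool` argument are my reading of the tutorial rather than something I've verified, so treat it as illustrative:

```yaml
# Hypothetical TaskManager pod: the Beam worker pool runs as a sidecar
# next to the Flink task manager, so the two containers share localhost.
# Names and image versions are assumptions on my part.
apiVersion: v1
kind: Pod
metadata:
  name: flink-taskmanager
spec:
  containers:
    - name: taskmanager
      image: flink:1.10
      args: ["taskmanager"]
    - name: beam-worker-pool
      image: apache/beam_python3.7_sdk:2.25.0
      args: ["--worker_pool"]     # starts the external worker pool service
      ports:
        - containerPort: 50000    # address given to --environment_config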
The Python code in the example tutorial has Beam configuration which looks like this:
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=localhost:8081",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000",
    ])
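My guess is that for an in-cluster submission the options would change roughly as follows, with `--flink_master` pointing at the JobManager's Service instead of localhost (the Service name `flink-jobmanager` is my own placeholder, not from the tutorial):

```python
# Hypothetical in-cluster pipeline options. "flink-jobmanager" is an assumed
# Kubernetes Service DNS name; "localhost:50000" would be the Beam worker
# pool sidecar as seen from inside each TaskManager pod.
pipeline_args = [
    "--runner=FlinkRunner",
    "--flink_version=1.10",
    "--flink_master=flink-jobmanager:8081",   # Service name, not localhost
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",   # sidecar worker pool
]

# With apache_beam installed, these would be passed as:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(pipeline_args)

# Sanity check: each flag parses cleanly into a key/value pair.
parsed = dict(arg.lstrip("-").split("=", 1) for arg in pipeline_args)
```

I'm unsure whether that is how the flags are meant to be wired up, which is really the heart of my question below.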
It's clear that when you run this locally as per the tutorial, it speaks to the Beam worker pool to launch the application. However, if I have a Docker image containing my application code and I want to start this application within Kubernetes, where do I deploy this image in my Kubernetes cluster? Is it as a container within each Task Manager pod (and therefore using localhost:50000 to communicate to Beam)? Or do I create a single pod containing my application code and point that pod at port 50000 of my Task Managers - if so, is the fact that I have multiple Task Managers a problem?
Any pointers to documentation or examples would be really helpful.
Many thanks,
John