Posted to user@beam.apache.org by John Tipper <jo...@hotmail.com> on 2023/07/18 12:10:15 UTC

How to submit Beam Python job onto Kubernetes with Flink runner?

Hi all,

I want to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following the tutorial here (https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb), but I'm not sure what the author is referring to when he talks about the "flink master container". I don't understand how I'm supposed to submit my Python code to the cluster when that code is itself packaged inside a container image.

The Kubernetes Flink cluster architecture looks like this:

  *   a single JobManager, which exposes the Flink web UI via a Service and an Ingress
  *   multiple Task Managers, each running 2 containers (a rough pod sketch follows this list):
     *   a Flink task manager
     *   a Beam worker pool, which exposes port 50000
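
In case that description is ambiguous, here is roughly how I understand each Task Manager pod to be laid out, sketched with the kubernetes Python client; the image names and tags below are placeholders rather than what I actually run:

    # Rough sketch of my Task Manager pod layout (kubernetes Python client).
    # Image names/tags are placeholders, not my real manifests.
    from kubernetes import client

    taskmanager_pod = client.V1Pod(
        metadata=client.V1ObjectMeta(
            name="flink-taskmanager",
            labels={"app": "flink", "component": "taskmanager"},
        ),
        spec=client.V1PodSpec(
            containers=[
                # container 1: the Flink task manager itself
                client.V1Container(
                    name="taskmanager",
                    image="flink:1.10",                    # placeholder Flink image
                    args=["taskmanager"],
                ),
                # container 2: the Beam worker pool sidecar, listening on 50000
                client.V1Container(
                    name="beam-worker-pool",
                    image="apache/beam_python3.7_sdk",     # placeholder Beam SDK image
                    args=["--worker_pool"],
                    ports=[client.V1ContainerPort(container_port=50000)],
                ),
            ]
        ),
    )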

The Python code in the example tutorial configures its Beam pipeline options like this:

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=localhost:8081",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000"
    ])
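
In case it helps, this is roughly how I'm wiring those options into a pipeline. It's a toy bounded example rather than my real streaming job, and it reuses the options object from the snippet above:

    # Toy pipeline using the options above, just to show how they are wired in;
    # my real job is a continuous streaming pipeline rather than a bounded Create.
    import apache_beam as beam

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(["hello", "world"])
            | "Upper" >> beam.Map(str.upper)
            | "Print" >> beam.Map(print)
        )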


It's clear that when you run this locally, as per the tutorial, the pipeline talks to the Beam worker pool to launch the application. However, if I have a Docker image containing my application code and I want to start this application within Kubernetes, where do I deploy this image in my cluster? Is it as a container within each Task Manager pod (and therefore using localhost:50000 to communicate with the Beam worker pool)? Or do I create a single pod containing my application code and point that pod at port 50000 of my Task Managers - and if so, is the fact that I have multiple Task Managers a problem?
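
For concreteness, if the second option is the right one, I imagine the pipeline options inside that application pod would look something like the following, where "flink-jobmanager" is just my guess at the JobManager Service name and the environment_config address is the part I'm unsure about:

    # Hypothetical variant of the options if my application code runs in its own
    # pod inside the cluster. "flink-jobmanager" is a guessed Service name for the
    # JobManager; I don't know what environment_config should point at in this case.
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=flink-jobmanager:8081",   # Flink JobManager Service (guessed name)
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000"    # worker pool sidecar on each Task Manager?
    ])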

Any pointers to documentation or examples would be really helpful.

Many thanks,

John