Posted to user@beam.apache.org by Lydian <ly...@gmail.com> on 2022/06/07 05:48:42 UTC

How to configure external service for Kafka IO to run the flink job in k8s

Hi Folks,

I am trying to set up the Beam environment to run our Python pipeline,
which reads data from Kafka. According to a previous thread
<https://lists.apache.org/thread/kz47y88t6zr9k4z043mx3wnb9mz5dqpq>, the
Java SDK harness doesn't work with the PROCESS environment_type, so I can
only use either DOCKER or EXTERNAL. Since I need to deploy the job to K8s,
and security concerns prevent me from using the DinD approach, it seems my
best option is to run a sidecar container in the Flink Task Manager that
starts the Java expansion service. However, I am not sure which command
starts the Java expansion service for this approach.

It looks like in the Docker environment, the runner invokes:
```
/opt/apache/beam/boot --id=1-2 --provision_endpoint=localhost:33025
```
But this boot script requires `id` and `provision_endpoint` arguments, and
I am not sure what to pass when setting up an external service. Could
someone help me with this?

For context: I am using Beam 2.38.0 with Flink 1.13, and the job is
deployed to k8s using lyft/flinkk8soperator
<https://github.com/lyft/flinkk8soperator>

Thanks!

Re: How to configure external service for Kafka IO to run the flink job in k8s

Posted by Chamikara Jayalath via user <us...@beam.apache.org>.
The command for starting a custom expansion service is available here:
https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/#choose-an-expansion-service
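
To make this concrete, here is a minimal sketch of the sidecar approach,
assuming the prebuilt Kafka expansion service jar from that page. Every
address, port, topic, and service name below is a placeholder for
illustration, not something taken from your setup. The sidecar would run
something like `java -jar beam-sdks-java-io-expansion-service-2.38.0.jar
8097`, and the Python pipeline then points ReadFromKafka at it:
```
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder addresses/names throughout; adjust for your cluster.
options = PipelineOptions([
    '--runner=FlinkRunner',
    '--flink_master=flink-jobmanager:8081',
    '--environment_type=EXTERNAL',
    # Address of a Python SDK worker pool sidecar (not the expansion service).
    '--environment_config=localhost:50000',
])

with beam.Pipeline(options=options) as p:
    _ = (
        p
        | ReadFromKafka(
            consumer_config={'bootstrap.servers': 'kafka:9092'},
            topics=['my-topic'],
            # Point at the expansion service sidecar started above, instead
            # of letting Beam launch one in a Docker container.
            expansion_service='localhost:8097',
        )
        | beam.Map(print)
    )
```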

Also, you can customize the environment that this expansion service sets
for the expanded transforms using the environmentType and environmentConfig
PipelineOptions available here:
https://github.com/apache/beam/blob/fdccad20f2af4f4af84b55529acae4b9d0004a01/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L54
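
For example, if I remember the expansion service entry point correctly,
arguments after the port are parsed as pipeline options, so an (unverified)
sketch would be `java -jar beam-sdks-java-io-expansion-service-2.38.0.jar
8097 --defaultEnvironmentType=EXTERNAL
--defaultEnvironmentConfig=localhost:50000`; please verify the exact option
names against the file linked above for your Beam version.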

But the end-to-end execution also depends on whether the runner you are
using supports the environment type/config you specify in the expansion
service.
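
For example, a DOCKER environment requires the Flink task managers to be
able to launch containers, while an EXTERNAL environment requires a matching
SDK worker pool sidecar listening at the configured address (for the Python
side, this is typically the Beam Python SDK container started with
--worker_pool, as described in the Flink runner documentation).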

Thanks,
Cham




On Mon, Jul 18, 2022 at 2:54 PM Ahmet Altay <al...@google.com> wrote:

> Adding a few relevant folks who could help answer this question: @John
> Casey <jo...@google.com> @Chamikara Jayalath <ch...@google.com> @Robert
> Bradshaw <ro...@google.com>
>
> Lydian, if you have any other information please share an update.
>
> Ahmet
>
> On Tue, Jun 7, 2022 at 12:49 AM Lydian <ly...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I am trying to set up the Beam environment to run our Python pipeline,
>> which reads data from Kafka. According to a previous thread
>> <https://lists.apache.org/thread/kz47y88t6zr9k4z043mx3wnb9mz5dqpq>, the
>> Java SDK harness doesn't work with the PROCESS environment_type, so I
>> can only use either DOCKER or EXTERNAL. Since I need to deploy the job
>> to K8s, and security concerns prevent me from using the DinD approach,
>> it seems my best option is to run a sidecar container in the Flink Task
>> Manager that starts the Java expansion service. However, I am not sure
>> which command starts the Java expansion service for this approach.
>>
>> It looks like in the Docker environment, the runner invokes:
>> ```
>> /opt/apache/beam/boot --id=1-2 --provision_endpoint=localhost:33025
>> ```
>> But this boot script requires `id` and `provision_endpoint` arguments,
>> and I am not sure what to pass when setting up an external service.
>> Could someone help me with this?
>>
>> For context: I am using Beam 2.38.0 with Flink 1.13, and the job is
>> deployed to k8s using lyft/flinkk8soperator
>> <https://github.com/lyft/flinkk8soperator>
>>
>> Thanks!
>>
>

Re: How to configure external service for Kafka IO to run the flink job in k8s

Posted by Ahmet Altay via user <us...@beam.apache.org>.
Adding a few relevant folks who could help answer this question: @John Casey
<jo...@google.com> @Chamikara Jayalath <ch...@google.com> @Robert
Bradshaw <ro...@google.com>

Lydian, if you have any other information please share an update.

Ahmet

On Tue, Jun 7, 2022 at 12:49 AM Lydian <ly...@gmail.com> wrote:

> Hi Folks,
>
> I am trying to set up the Beam environment to run our Python pipeline,
> which reads data from Kafka. According to a previous thread
> <https://lists.apache.org/thread/kz47y88t6zr9k4z043mx3wnb9mz5dqpq>, the
> Java SDK harness doesn't work with the PROCESS environment_type, so I can
> only use either DOCKER or EXTERNAL. Since I need to deploy the job to K8s,
> and security concerns prevent me from using the DinD approach, it seems my
> best option is to run a sidecar container in the Flink Task Manager that
> starts the Java expansion service. However, I am not sure which command
> starts the Java expansion service for this approach.
>
> It looks like in the Docker environment, the runner invokes:
> ```
> /opt/apache/beam/boot --id=1-2 --provision_endpoint=localhost:33025
> ```
> But this boot script requires `id` and `provision_endpoint` arguments, and
> I am not sure what to pass when setting up an external service. Could
> someone help me with this?
>
> For context: I am using Beam 2.38.0 with Flink 1.13, and the job is
> deployed to k8s using lyft/flinkk8soperator
> <https://github.com/lyft/flinkk8soperator>
>
> Thanks!
>