You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Xiao Ma <xi...@geotab.com> on 2022/10/08 01:24:47 UTC

Java + Python Xlang pipeline

Hello,

I would like to run a pipeline with Java as the main language and python
transformation embedded. The beam pipeline is running on the flink cluster.
Currently, I can run it with a taskmanager + java worker pool and a python
worker pool. Could I ask if there is a way to run the java code on the task
manager directly and keep the python worker pool?

Current: taskmanager + java worker pool + python worker pool
Desired: taskmanager + python worker pool

Thank you very much.

*Mark Ma*

Re: Java + Python Xlang pipeline

Posted by Xiao Ma <xi...@geotab.com>.
Thank you very much for the option provided. Just checked the stackoverflow
settings and tried at our end. It can be an option for us.

Thank you.
Best,
*Xiao Ma*

On Tue, Oct 11, 2022 at 7:41 PM Lydian <ly...@gmail.com> wrote:

> You should be able to provide extra_args to the expansion service:
> ```
>
> --defaultEnvironmentType=PROCESS
> --defaultEnvironmentConfig={"command":"/opt/apache/beam_java/boot"}
>
> ```
>
> I am also running the Xlang pipeline in flink k8s cluster. After setting
> the defaultEnvironmentType to PROCESS, I don't need to use DinD or DooD at
> all.
> You can also find my full settings in:
> https://stackoverflow.com/a/74035035/19259600
>
>
>
> Sincerely,
> Lydian Lee
>
>
>
> On Tue, Oct 11, 2022 at 11:34 AM Xiao Ma <xi...@geotab.com> wrote:
>
>> The worker pool means `starting a java or python sdk`, to accept the java
>> or python pipeline running. For example, to execute python pipeline, we
>> have to start a python worker pool with `--worker_pool` arguments. For the
>> Java code, besides the docker mode (default one), do we have other better
>> ways to start a java worker pool?
>>
>> For now, our flink cluster is running on the k8s. If we choose the
>> default sdk harness mode (docker), we will have the docker (java sdk
>> harness) in docker (flink-taskmanager). So, what we are doing is to call
>> org.apache.beam.fn.harness.ExternalWorkerService class with pipeline
>> options as environment variables and fixed two small issues in the
>> FnHarness class to make sure the java sdk harness can run smoothly.
>>
>> Thank you.
>> *Mark Ma*
>>
>>
>> On Tue, Oct 11, 2022 at 12:46 PM Alexey Romanenko <
>> aromanenko.dev@gmail.com> wrote:
>>
>>> I’m not sure that I get it correctly. What do you mean by “worker pool”
>>> in your case?
>>>
>>> —
>>> Alexey
>>>
>>> On 8 Oct 2022, at 03:24, Xiao Ma <xi...@geotab.com> wrote:
>>>
>>> Hello,
>>>
>>> I would like to run a pipeline with Java as the main language and python
>>> transformation embedded. The beam pipeline is running on the flink cluster.
>>> Currently, I can run it with a taskmanager + java worker pool and a python
>>> worker pool. Could I ask if there is a way to run the java code on the task
>>> manager directly and keep the python worker pool?
>>>
>>> Current: taskmanager + java worker pool + python worker pool
>>> Desired: taskmanager + python worker pool
>>>
>>> Thank you very much.
>>>
>>> *Mark Ma*
>>>
>>>
>>>

Re: Java + Python Xlang pipeline

Posted by Lydian <ly...@gmail.com>.
You should be able to provide extra_args to the expansion service:
```

--defaultEnvironmentType=PROCESS
--defaultEnvironmentConfig={"command":"/opt/apache/beam_java/boot"}

```

I am also running the Xlang pipeline in flink k8s cluster. After setting
the defaultEnvironmentType to PROCESS, I don't need to use DinD or DooD at
all.
You can also find my full settings in:
https://stackoverflow.com/a/74035035/19259600



Sincerely,
Lydian Lee



On Tue, Oct 11, 2022 at 11:34 AM Xiao Ma <xi...@geotab.com> wrote:

> The worker pool means `starting a java or python sdk`, to accept the java
> or python pipeline running. For example, to execute python pipeline, we
> have to start a python worker pool with `--worker_pool` arguments. For the
> Java code, besides the docker mode (default one), do we have other better
> ways to start a java worker pool?
>
> For now, our flink cluster is running on the k8s. If we choose the default
> sdk harness mode (docker), we will have the docker (java sdk harness) in
> docker (flink-taskmanager). So, what we are doing is to call
> org.apache.beam.fn.harness.ExternalWorkerService class with pipeline
> options as environment variables and fixed two small issues in the
> FnHarness class to make sure the java sdk harness can run smoothly.
>
> Thank you.
> *Mark Ma*
>
>
> On Tue, Oct 11, 2022 at 12:46 PM Alexey Romanenko <
> aromanenko.dev@gmail.com> wrote:
>
>> I’m not sure that I get it correctly. What do you mean by “worker pool”
>> in your case?
>>
>> —
>> Alexey
>>
>> On 8 Oct 2022, at 03:24, Xiao Ma <xi...@geotab.com> wrote:
>>
>> Hello,
>>
>> I would like to run a pipeline with Java as the main language and python
>> transformation embedded. The beam pipeline is running on the flink cluster.
>> Currently, I can run it with a taskmanager + java worker pool and a python
>> worker pool. Could I ask if there is a way to run the java code on the task
>> manager directly and keep the python worker pool?
>>
>> Current: taskmanager + java worker pool + python worker pool
>> Desired: taskmanager + python worker pool
>>
>> Thank you very much.
>>
>> *Mark Ma*
>>
>>
>>

Re: Java + Python Xlang pipeline

Posted by Xiao Ma <xi...@geotab.com>.
The worker pool means `starting a java or python sdk`, to accept the java
or python pipeline running. For example, to execute python pipeline, we
have to start a python worker pool with `--worker_pool` arguments. For the
Java code, besides the docker mode (default one), do we have other better
ways to start a java worker pool?

For now, our flink cluster is running on the k8s. If we choose the default
sdk harness mode (docker), we will have the docker (java sdk harness) in
docker (flink-taskmanager). So, what we are doing is to call
org.apache.beam.fn.harness.ExternalWorkerService class with pipeline
options as environment variables and fixed two small issues in the
FnHarness class to make sure the java sdk harness can run smoothly.

Thank you.
*Mark Ma*


On Tue, Oct 11, 2022 at 12:46 PM Alexey Romanenko <ar...@gmail.com>
wrote:

> I’m not sure that I get it correctly. What do you mean by “worker pool” in
> your case?
>
> —
> Alexey
>
> On 8 Oct 2022, at 03:24, Xiao Ma <xi...@geotab.com> wrote:
>
> Hello,
>
> I would like to run a pipeline with Java as the main language and python
> transformation embedded. The beam pipeline is running on the flink cluster.
> Currently, I can run it with a taskmanager + java worker pool and a python
> worker pool. Could I ask if there is a way to run the java code on the task
> manager directly and keep the python worker pool?
>
> Current: taskmanager + java worker pool + python worker pool
> Desired: taskmanager + python worker pool
>
> Thank you very much.
>
> *Mark Ma*
>
>
>

Re: Java + Python Xlang pipeline

Posted by Alexey Romanenko <ar...@gmail.com>.
I’m not sure that I get it correctly. What do you mean by “worker pool” in your case?

—
Alexey

> On 8 Oct 2022, at 03:24, Xiao Ma <xi...@geotab.com> wrote:
> 
> Hello, 
> 
> I would like to run a pipeline with Java as the main language and python transformation embedded. The beam pipeline is running on the flink cluster. Currently, I can run it with a taskmanager + java worker pool and a python worker pool. Could I ask if there is a way to run the java code on the task manager directly and keep the python worker pool?
> 
> Current: taskmanager + java worker pool + python worker pool
> Desired: taskmanager + python worker pool
> 
> Thank you very much.
> 
> Mark Ma
>