You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Xiao Ma <xi...@geotab.com> on 2022/10/08 01:28:33 UTC

Java + Python Xlang pipeline

Hello,

I would like to run a pipeline with Java as the main language and python
transformation embedded. The beam pipeline is running on the flink cluster.
Currently, I can run it with a taskmanager + java worker pool and a python
worker pool. Could I ask if there is a way to run the java code on the task
manager directly and keep the python worker pool?

Current: taskmanager + java worker pool + python worker pool
Desired: taskmanager + python worker pool

Thank you very much.

*Mark Ma*

Re: Java + Python Xlang pipeline

Posted by Xiao Ma <xi...@geotab.com>.
Hello,

Thank you for the reply.  Could I know how the java pipelines is running
for most of the time? Most of people run the java + python x
language pipeline with docker in docker mode (if the flink is running on
the K8s)?

Thank you very much.

Best,
*Xiao Ma*

On Mon, Oct 10, 2022 at 12:14 AM Chamikara Jayalath <ch...@google.com>
wrote:

> By default, it will use Docker. You can try to change the default
> environment type using the option [1] but I'm not sure if other environment
> types will work for Flink Java x-lang pipelines.
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/blob/b94cff209cc8d1ae61cc916ff6b0b68561dc34c8/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L52
>
> On Fri, Oct 7, 2022 at 10:26 PM Xiao Ma <xi...@geotab.com> wrote:
>
>> Thank you  very muchfor the reply and  explaination. For the Java beam
>> sdk, can it start as a worker pool, like the Python worker pool with
>> --worker_pool option? Or the Java sdk doesn't have the external environment
>> type, it has to be as docker started?
>>
>> Thank you.
>>
>> Matk
>>
>> On Sat, Oct 8, 2022 at 12:08 AM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>>
>>>
>>> On Fri, Oct 7, 2022 at 6:29 PM Xiao Ma <xi...@geotab.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to run a pipeline with Java as the main language and
>>>> python transformation embedded. The beam pipeline is running on the flink
>>>> cluster. Currently, I can run it with a taskmanager + java worker pool and
>>>> a python worker pool. Could I ask if there is a way to run the java code on
>>>> the task manager directly and keep the python worker pool?
>>>>
>>>> Current: taskmanager + java worker pool + python worker pool
>>>> Desired: taskmanager + python worker pool
>>>>
>>>
>>> Generally this is not possible. If the transform has to be executed on
>>> the SDK side, the runner usually sets up an environment (for example, a
>>> Docker container) with the corresponding SDK and executes the bundles with
>>> the transform using the Beam Fn API.  Runners can choose to override this
>>> by executing the transform within the runner itself, but you'll have to
>>> modify the Flink runner to do this.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>>
>>>> Thank you very much.
>>>>
>>>> *Mark Ma*
>>>>
>>>> --
>> Xiao Ma
>> Geotab
>> Software Developer, Data Engineering | B.Sc, M.Sc
>> Direct     +1 (416) 836 - 3541 <(416)%20836-3541>
>> Toll-free  +1 (877) 436 - 8221 <(877)%20436-8221>
>> Visit       www.geotab.com
>> Twitter | Facebook | YouTube | LinkedIn
>>
>

Re: Java + Python Xlang pipeline

Posted by Chamikara Jayalath via dev <de...@beam.apache.org>.
By default, it will use Docker. You can try to change the default
environment type using the option [1] but I'm not sure if other environment
types will work for Flink Java x-lang pipelines.

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/b94cff209cc8d1ae61cc916ff6b0b68561dc34c8/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L52

On Fri, Oct 7, 2022 at 10:26 PM Xiao Ma <xi...@geotab.com> wrote:

> Thank you  very muchfor the reply and  explaination. For the Java beam
> sdk, can it start as a worker pool, like the Python worker pool with
> --worker_pool option? Or the Java sdk doesn't have the external environment
> type, it has to be as docker started?
>
> Thank you.
>
> Matk
>
> On Sat, Oct 8, 2022 at 12:08 AM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>>
>>
>> On Fri, Oct 7, 2022 at 6:29 PM Xiao Ma <xi...@geotab.com> wrote:
>>
>>> Hello,
>>>
>>> I would like to run a pipeline with Java as the main language and python
>>> transformation embedded. The beam pipeline is running on the flink cluster.
>>> Currently, I can run it with a taskmanager + java worker pool and a python
>>> worker pool. Could I ask if there is a way to run the java code on the task
>>> manager directly and keep the python worker pool?
>>>
>>> Current: taskmanager + java worker pool + python worker pool
>>> Desired: taskmanager + python worker pool
>>>
>>
>> Generally this is not possible. If the transform has to be executed on
>> the SDK side, the runner usually sets up an environment (for example, a
>> Docker container) with the corresponding SDK and executes the bundles with
>> the transform using the Beam Fn API.  Runners can choose to override this
>> by executing the transform within the runner itself, but you'll have to
>> modify the Flink runner to do this.
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> Thank you very much.
>>>
>>> *Mark Ma*
>>>
>>> --
> Xiao Ma
> Geotab
> Software Developer, Data Engineering | B.Sc, M.Sc
> Direct     +1 (416) 836 - 3541 <(416)%20836-3541>
> Toll-free  +1 (877) 436 - 8221 <(877)%20436-8221>
> Visit       www.geotab.com
> Twitter | Facebook | YouTube | LinkedIn
>

Re: Java + Python Xlang pipeline

Posted by Xiao Ma <xi...@geotab.com>.
Thank you  very muchfor the reply and  explaination. For the Java beam sdk,
can it start as a worker pool, like the Python worker pool with
--worker_pool option? Or the Java sdk doesn't have the external environment
type, it has to be as docker started?

Thank you.

Matk

On Sat, Oct 8, 2022 at 12:08 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

>
>
> On Fri, Oct 7, 2022 at 6:29 PM Xiao Ma <xi...@geotab.com> wrote:
>
>> Hello,
>>
>> I would like to run a pipeline with Java as the main language and python
>> transformation embedded. The beam pipeline is running on the flink cluster.
>> Currently, I can run it with a taskmanager + java worker pool and a python
>> worker pool. Could I ask if there is a way to run the java code on the task
>> manager directly and keep the python worker pool?
>>
>> Current: taskmanager + java worker pool + python worker pool
>> Desired: taskmanager + python worker pool
>>
>
> Generally this is not possible. If the transform has to be executed on the
> SDK side, the runner usually sets up an environment (for example, a Docker
> container) with the corresponding SDK and executes the bundles with the
> transform using the Beam Fn API.  Runners can choose to override this by
> executing the transform within the runner itself, but you'll have to modify
> the Flink runner to do this.
>
> Thanks,
> Cham
>
>
>>
>> Thank you very much.
>>
>> *Mark Ma*
>>
>> --
Xiao Ma
Geotab
Software Developer, Data Engineering | B.Sc, M.Sc
Direct     +1 (416) 836 - 3541
Toll-free  +1 (877) 436 - 8221
Visit       www.geotab.com
Twitter | Facebook | YouTube | LinkedIn

Re: Java + Python Xlang pipeline

Posted by Chamikara Jayalath via dev <de...@beam.apache.org>.
On Fri, Oct 7, 2022 at 6:29 PM Xiao Ma <xi...@geotab.com> wrote:

> Hello,
>
> I would like to run a pipeline with Java as the main language and python
> transformation embedded. The beam pipeline is running on the flink cluster.
> Currently, I can run it with a taskmanager + java worker pool and a python
> worker pool. Could I ask if there is a way to run the java code on the task
> manager directly and keep the python worker pool?
>
> Current: taskmanager + java worker pool + python worker pool
> Desired: taskmanager + python worker pool
>

Generally this is not possible. If the transform has to be executed on the
SDK side, the runner usually sets up an environment (for example, a Docker
container) with the corresponding SDK and executes the bundles with the
transform using the Beam Fn API.  Runners can choose to override this by
executing the transform within the runner itself, but you'll have to modify
the Flink runner to do this.

Thanks,
Cham


>
> Thank you very much.
>
> *Mark Ma*
>
>