Posted to user@beam.apache.org by Chamikara Jayalath <ch...@google.com> on 2020/11/21 02:36:23 UTC

Re: Documentation for Cross-Language Transforms

PR went in and documentation is live now:
https://beam.apache.org/documentation/programming-guide/#mulit-language-pipelines

Thanks,
Cham

On Wed, Nov 18, 2020 at 10:05 AM Chamikara Jayalath <ch...@google.com>
wrote:

> This was mentioned in a separate thread but thought it would be good to
> highlight here in case more folks wish to take a look before the PR is
> merged.
>
> PR is https://github.com/apache/beam/pull/13317
>
> Thanks,
> Cham
>
> On Thu, Nov 12, 2020 at 1:17 PM Chamikara Jayalath <ch...@google.com>
> wrote:
>
>> Seems like a good place to promote this PR that adds documentation for
>> cross-language transforms :)
>> https://github.com/apache/beam/pull/13317
>>
>> This covers the following for both Java and Python SDKs.
>> * Creating new cross-language transforms - primary audience will be
>> transform authors who wish to make existing Java/Python transforms
>> available to other SDKs.
>> * Using cross-language transforms - primary audience will be pipeline
>> authors who wish to use existing cross-language transforms with or without
>> language-specific wrappers.
>>
>> Also this introduces the term "Multi-Language Pipelines" to denote
>> pipelines that use cross-language transforms (and hence utilize more than
>> one SDK language).
>>
>> Thanks +Dave Wrede <dw...@google.com> for working on this.
>>
>> - Cham
>>
>> On Thu, Nov 12, 2020 at 4:56 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>>> I was not aware of these examples Brian, thanks for sharing. Maybe we
>>> should make these examples more discoverable on the website or as part
>>> of Beam's programming guide.
>>>
>>> It would be nice to have an example of the opposite too, calling a Python
>>> transform from Java.
>>>
>>> Additionally Java users who want to integrate Python might be lost
>>> because External is NOT part of Beam's Java SDK (the transform is hidden
>>> inside of a different module, core-construction-java), so it does not
>>> even appear in the website SDK javadoc.
>>> https://issues.apache.org/jira/browse/BEAM-8546
>>>
>>>
>>> On Wed, Nov 11, 2020 at 8:41 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>> >
>>> > Hi Ke,
>>> >
>>> > A cross-language pipeline looks a lot like a pipeline written natively
>>> > in one of the Beam SDKs; the difference is that some of the transforms
>>> > in the pipeline may be "external transforms" that are actually
>>> > implemented in a different language. There are a few examples in the
>>> > Beam repo that use Java transforms from Python pipelines:
>>> > - kafkataxi [1]: Uses Java's KafkaIO from Python
>>> > - wordcount_xlang_sql [2] and sql_taxi [3]: Use Java's SqlTransform
>>> >   from Python
>>> >
>>> > To create your own cross-language pipeline, you'll need to decide
>>> > which SDK you want to use primarily, and then create an expansion
>>> > service to expose the transforms you want to use from the other SDK
>>> > (if one doesn't exist already).
>>> >
>>> > [1] https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/kafkataxi
>>> > [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang_sql.py
>>> > [3] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py
>>> >
>>> > On Wed, Nov 11, 2020 at 11:07 AM Ke Wu <ke...@gmail.com> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> Is there an example demonstrating what a cross-language pipeline
>>> >> looks like, e.g. a pipeline composed of Java and Python
>>> >> code/transforms?
>>> >>
>>> >> Best,
>>> >> Ke
>>>
>>
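
A minimal sketch of what Brian describes in the quoted message above: a Python pipeline in which one step is Java's SqlTransform, used as an external transform. It loosely follows the shape of the wordcount_xlang_sql example linked in the thread, but the query, the input words, and the helper name build_pipeline are made up for illustration, and nobody in this thread has tested it on a portable runner. It assumes apache_beam is installed and a Java runtime is available, since (as far as I know) the Python wrapper starts the Java expansion service itself.

```python
# Word-count query executed by Java's SqlTransform.
# PCOLLECTION is the name Beam SQL gives to the single input collection.
QUERY = "SELECT word, COUNT(*) AS word_count FROM PCOLLECTION GROUP BY word"


def build_pipeline(words, options=None):
    """Return an unrun Python pipeline containing one Java (external) step.

    Hypothetical helper for illustration; not part of the Beam SDK.
    """
    # Imported lazily so this module can be loaded without Beam installed.
    import apache_beam as beam
    from apache_beam.transforms.sql import SqlTransform

    pipeline = beam.Pipeline(options=options)
    _ = (
        pipeline
        | "CreateWords" >> beam.Create(words)
        # SqlTransform needs an input with a schema, hence beam.Row.
        | "ToRows" >> beam.Map(lambda w: beam.Row(word=str(w)))
        # External transform: expanded by the Java expansion service.
        | "CountWithJavaSql" >> SqlTransform(QUERY)
        | "Format" >> beam.Map(lambda row: f"{row.word}: {row.word_count}")
        | "Print" >> beam.Map(print)
    )
    return pipeline


# Example use (requires Beam and Java, so it is left commented out):
# build_pipeline(["beam", "java", "beam"]).run().wait_until_finish()
```

Everything except the CountWithJavaSql step is ordinary Python SDK code, which is the point Brian makes: only the external transform crosses the language boundary.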

Re: Documentation for Cross-Language Transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Wed, Nov 25, 2020 at 11:09 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> Great job, it should be very helpful for users!
>
> Just a minor note - it would be great to add an example of how to actually
> run a cross-language pipeline with the Portable Runner since, iirc, some
> additional arguments are supposed to be passed, like
> "--experiments=beam_fn_api".
>

+1. I haven't had time to fully test the current examples (SQL
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py>,
Kafka
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py>)
on portable runners, but feel free to update the documentation if you have
the relevant commands at hand.

Thanks,
Cham



Re: Documentation for Cross-Language Transforms

Posted by Alexey Romanenko <ar...@gmail.com>.
Great job, it should be very helpful for users! 

Just a minor note - it would be great to add an example of how to actually run a cross-language pipeline with the Portable Runner since, iirc, some additional arguments are supposed to be passed, like “--experiments=beam_fn_api”.
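
As a possible starting point for such an example, here is a hedged sketch that assembles the arguments for a portable-runner run. --runner=PortableRunner and --job_endpoint are standard Python SDK pipeline options, and --experiments=beam_fn_api is the flag recalled above; whether that flag is still required, the endpoint localhost:8099, and the helper name portable_runner_args are all assumptions that have not been verified in this thread.

```python
def portable_runner_args(job_endpoint="localhost:8099",
                         experiments=("beam_fn_api",)):
    """Build command-line flags for running a pipeline on a portable runner.

    Hypothetical helper; the default endpoint and experiment are assumptions.
    """
    args = [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
    ]
    # One --experiments flag per experiment name.
    args.extend(f"--experiments={name}" for name in experiments)
    return args
```

The resulting list could be passed to PipelineOptions in the Python SDK, or its items appended to the command line of one of the example scripts.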
