Posted to dev@beam.apache.org by Ismaël Mejía <ie...@gmail.com> on 2018/03/23 11:03:05 UTC

[PROPOSAL] Scripting extension based on Java JSR-223

This is a really simple proposal to add an extension with transforms
that package the Java Scripting API (JSR-223) [1] to allow users to
specialize some transforms via a scripting language. This work was
initially created by Romain [2]; I took it with his authorization and
refined it so that it passes all the Beam validations and style
checks. I also added ValueProviders so that users can now template
scripts in Dataflow as well.
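
As a rough illustration of the underlying mechanism (this is not the
code in the PR), the sketch below uses only the standard javax.script
API to evaluate a script against a single element. The class name, the
"element" binding name, and the fallback behaviour are assumptions made
for this example. Which engines are available depends on the JDK and
the classpath, so the sketch returns a fallback value when none is
registered under the requested name.

```java
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper, for illustration only; not the ScriptingParDo from the PR.
final class Jsr223Sketch {

    // Evaluates `script` with the JSR-223 engine registered under `language`,
    // exposing the input as a variable named "element" (an assumed name).
    // Returns `fallback` when no engine with that name is available.
    static Object evalElement(String language, String script, Object element, Object fallback) {
        // A real transform would look the engine up once and reuse it;
        // it is created per call here only to keep the sketch short.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(language);
        if (engine == null) {
            return fallback;
        }
        Bindings bindings = engine.createBindings();
        bindings.put("element", element);
        try {
            return engine.eval(script, bindings);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // The result depends on whether a "javascript" engine is installed.
        Object out = evalElement("javascript", "element.toUpperCase()", "beam", "<no engine>");
        System.out.println(out);
    }
}
```

A transform built on this idea would presumably call something like
evalElement once per input element and emit the result.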

Notice that Dataflow recently added something similar to create really
simple data movement pipelines [3], so maybe the rest of the community
can benefit from a similar extension (and eventually Dataflow may
converge to this implementation).

I hope there is interest in this extension. So far we have a
ScriptingParDo transform to show the idea; hopefully we can expand
this to other transforms.

For those interested in more details you can check the Jira issue [4]
and the PR [5].

[1] https://www.jcp.org/en/jsr/detail?id=223
[2] https://github.com/rmannibucau/beam-jsr223
[3] https://cloud.google.com/blog/big-data/2018/03/pre-built-cloud-dataflow-templates-kiss-for-data-movement
[4] https://issues.apache.org/jira/browse/BEAM-3921
[5] https://github.com/apache/beam/pull/4944

Re: [PROPOSAL] Scripting extension based on Java JSR-223

Posted by Ismaël Mejía <ie...@gmail.com>.
Nice, it is great to see a good amount of support and enthusiasm for
this. I just want to remind everyone that the whole idea and the code
donation come from Romain Manni-Bucau. I just did some formatting and
cleanup plus the ValueProviders. All credit to Romain!

Eugene, thanks a lot for the feedback. I would like to get this initial
version in quickly and then iterate incrementally on the design and
implementation of the features you propose. There are a lot of good
ideas, thanks for sharing those. I will create a doc so we can iterate
on the design of the different points.

All feature requests and ideas are welcome, so to the other people in
the community: please feel free to add those here for discussion.
Then we can bootstrap a better design document.

Re: [PROPOSAL] Scripting extension based on Java JSR-223

Posted by Thomas Weise <th...@apache.org>.
+1, nice!

Re: [PROPOSAL] Scripting extension based on Java JSR-223

Posted by Eugene Kirpichov <ki...@google.com>.
Ismaël, thanks. Adding scripting language support to Beam is an awesome
idea and we should absolutely do it.

However, I think the current proposal can be made significantly more
general, and it would benefit from a formal design discussion. E.g. here
are a couple of points I can think of that seem very important but
currently aren't covered by the PR:
- Having the script return multiple values per element
- Scripting arbitrary user-code callbacks rather than a whole PTransform,
e.g. writing the various lambdas of FileIO.writeDynamic() in a scripting
language
- Integration with Beam SQL
- Specifying dependencies (does this require anything special?)

And less critical but also important or potentially very useful points:
- Support for side inputs and for multiple output tags
- Supporting asynchronous API calls from the script
- Supporting batching multiple elements together
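
To make the first point above (a script returning multiple values per
element) concrete, here is a minimal sketch of how a script's return
value could be normalized into zero or more outputs. It is not taken
from the PR: the class and method names are hypothetical, and the
convention used (null means no output, an Iterable or array means one
output per item, anything else means a single output) is just one
possible design.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ScriptResultFlattener {

    // Normalizes whatever a script returned into zero or more output values.
    static List<Object> flatten(Object result) {
        if (result == null) {
            return Collections.emptyList(); // no output for this element
        }
        List<Object> out = new ArrayList<>();
        if (result instanceof Iterable) {
            for (Object item : (Iterable<?>) result) {
                out.add(item);
            }
        } else if (result.getClass().isArray()) {
            // Reflection handles primitive arrays as well as Object[].
            int length = Array.getLength(result);
            for (int i = 0; i < length; i++) {
                out.add(Array.get(result, i));
            }
        } else {
            out.add(result); // a single value
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(flatten(Arrays.asList("a", "b"))); // prints [a, b]
        System.out.println(flatten(new int[] {1, 2, 3}));     // prints [1, 2, 3]
        System.out.println(flatten("single"));                // prints [single]
        System.out.println(flatten(null));                    // prints []
    }
}
```

A scripting transform could then emit each item of the returned list as
a separate output element.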

Re: [PROPOSAL] Scripting extension based on Java JSR-223

Posted by Tyler Akidau <ta...@google.com>.
+1, I like it. Thanks!

Re: [PROPOSAL] Scripting extension based on Java JSR-223

Posted by Ahmet Altay <al...@google.com>.
Thank you Ismaël, this looks really cool.

Re: [PROPOSAL] Scripting extension based on Java JSR-223

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

It sounds like a very good extension mechanism for PTransform.

+1

Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com