Posted to dev@beam.apache.org by David Morávek <da...@gmail.com> on 2017/12/17 18:00:19 UTC

Euphoria Java 8 DSL - proposal

Hello,


First of all, thanks for the amazing work the Apache Beam community is
doing!


In 2014, we started developing a runtime-independent Java 8 API that
helps us create unified big-data processing flows. It has been used as a
core building block of the Seznam.cz web crawler data infrastructure ever
since. Its design principles and execution model are very similar to
Apache Beam's.


This API was open sourced in 2016, under the name Euphoria API:

https://github.com/seznam/euphoria


As it is very similar to Apache Beam, we feel that it is not worth
duplicating effort on developing new runtimes and fine-tuning the
current ones.


The main blocker for us to switch to Apache Beam is the lack of a Java 8
API. We propose integrating the Euphoria API into Apache Beam as a Java 8
DSL, in order to share our effort with the community.


A simple example of Euphoria API usage can be found here:

https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount


If you feel that the Beam community could benefit from our work, we would
love to start working on integrating Euphoria into Apache Beam (we already
have a working POC with a few basic operators implemented).


I look forward to hearing from you,

David

Re: Euphoria Java 8 DSL - proposal

Posted by Kenneth Knowles <kl...@google.com>.
+1 here. I already liked Euphoria, and I like the merger even more :-)

Kenn

On Tue, Jan 2, 2018 at 8:45 AM, Tyler Akidau <ta...@google.com> wrote:

> +1, I'm supportive of seeing this move forward. What remaining concrete
> concerns are there?
>
> -Tyler
>
>
> On Tue, Jan 2, 2018 at 8:35 AM David Morávek <da...@gmail.com>
> wrote:
>
>> Hello JB,
>>
>> can we help in any way to move things forward?
>>
>> Thanks,
>> D.
>>
>> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>>> Thanks Jan,
>>>
>>> It makes sense.
>>>
>>> Let me take a look at the code to understand the "interaction".
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>>
>>>> Hi JB,
>>>>
>>>> basically you are not wrong. The project started about three or four
>>>> years ago with the goal of unifying batch and stream processing in a
>>>> single portable, executor-independent API. Because of that, it is
>>>> currently "close" to Beam in this sense. But we don't see much added
>>>> value in keeping this as a separate project when one of the key
>>>> differences is the API (not the model itself), so we would like to focus
>>>> on translation from the Euphoria API to Beam's SDK. That's why we would
>>>> like to see it as a DSL, so that it would be possible to use the
>>>> Euphoria API with Beam's runners as natively as possible.
>>>>
>>>> I hope I didn't make the subject even more unclear, if so, I'll be
>>>> happy to explain anything in more detail. :-)
>>>>
>>>>    Jan
>>>>
>>>>
>>>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>>>
>>>>> Hi Jan,
>>>>>
>>>>> Thanks for your answers.
>>>>>
>>>>> However, they confused me ;)
>>>>>
>>>>> Regarding what you replied, Euphoria seems like a programming
>>>>> model/SDK "close" to Beam more than a DSL on top of an existing Beam SDK.
>>>>>
>>>>> Am I wrong ?
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>>>
>>>>>> Hi Ismael,
>>>>>>
>>>>>> basically we adopted Beam's design regarding partitioning (
>>>>>> https://github.com/seznam/euphoria/issues/160) and implemented the
>>>>>> sorting manually (https://github.com/seznam/euphoria/issues/158).
>>>>>> I'm not aware of any time model differences (Euphoria supports ingestion
>>>>>> and event time, we don't support processing time by decision). Regarding
>>>>>> other differences (looking at the Beam capability matrix, I'd say that):
>>>>>>
>>>>>>   - we don't support stateful FlatMap (i.e. ParDo) for now (
>>>>>> https://github.com/seznam/euphoria/issues/192)
>>>>>>
>>>>>>   - we don't support side inputs (by decision now, but might be
>>>>>> reconsidered) and outputs (
>>>>>> https://github.com/seznam/euphoria/issues/124)
>>>>>>
>>>>>>   - we support complete event-time windows (non-merging, merging,
>>>>>> aligned, unaligned) and time control
>>>>>>
>>>>>>   - we don't support processing time by decision (might be
>>>>>> reconsidered if a valid use-case is found)
>>>>>>
>>>>>>   - we support window triggering based on both time and data,
>>>>>> including discarding and accumulating (without accumulating & retracting)
>>>>>>
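To illustrate the last bullet above: a minimal sketch, in plain Java, of how
discarding and accumulating window firings differ. PaneBuffer and its methods
are hypothetical names made up for this illustration; they are not part of
the Euphoria or Beam APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy per-window buffer (hypothetical, not Euphoria or Beam API). */
final class PaneBuffer {
    enum Mode { DISCARDING, ACCUMULATING }

    private final Mode mode;
    private final List<Integer> buffered = new ArrayList<>();

    PaneBuffer(Mode mode) {
        this.mode = mode;
    }

    /** Adds one element that arrived for this window. */
    void add(int element) {
        buffered.add(element);
    }

    /** Called when a trigger fires; returns the pane emitted downstream. */
    List<Integer> fire() {
        List<Integer> pane = new ArrayList<>(buffered);
        if (mode == Mode.DISCARDING) {
            // Discarding: forget everything that was already emitted.
            buffered.clear();
        }
        // Accumulating: keep the buffer, so the next pane repeats old elements.
        return pane;
    }
}

final class PaneDemo {
    public static void main(String[] args) {
        for (PaneBuffer.Mode mode : PaneBuffer.Mode.values()) {
            PaneBuffer window = new PaneBuffer(mode);
            window.add(1);
            window.add(2);
            List<Integer> first = window.fire();
            window.add(3);
            List<Integer> second = window.fire();
            System.out.println(mode + ": " + first + " then " + second);
        }
    }
}
```

With elements 1 and 2 arriving before the first firing and 3 before the
second, the discarding buffer emits [1, 2] then [3], while the accumulating
one emits [1, 2] then [1, 2, 3].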
>>>>>> All our executors (runners) - Flink, Spark and Local - implement the
>>>>>> complete model, which we enforce using an "operator test kit" that all
>>>>>> executors must pass. The Spark executor supports bounded sources only (for
>>>>>> now). As David said, we currently don't have a serialization abstraction,
>>>>>> so there is some work to be done in that regard.
>>>>>>
>>>>>> Our intention is to completely supersede Euphoria. We would like to
>>>>>> consider the possibility of using executors that would not rely on Beam,
>>>>>> but that is optional now and should be straightforward.
>>>>>>
>>>>>> We'd be happy to answer any more questions you might have and thanks
>>>>>> a lot!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>   Jan
>>>>>>
>>>>>>
>>>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It is great to see that you guys have reached the maturity to
>>>>>>> propose this. Congratulations on your work and on the idea to
>>>>>>> contribute it to Beam.
>>>>>>>
>>>>>>> I remember a previous discussion with Jan about the model
>>>>>>> mismatch between Euphoria and Beam, caused by some design decisions
>>>>>>> of both projects. I remember you guys had some issues with the way
>>>>>>> Beam's sources do partitioning, as well as Beam's lack of sorted data
>>>>>>> (on shuffle a la hadoop). Also, if I remember well, the 'time' model of
>>>>>>> Euphoria was simpler than Beam's. I bring all of this up because I
>>>>>>> am curious about what parts of the Euphoria model you guys had to
>>>>>>> sacrifice to support Beam, and what parts of Beam's model should
>>>>>>> still be integrated into Euphoria (and if there is a straightforward
>>>>>>> path to do it).
>>>>>>>
>>>>>>> If I understand correctly, if this gets merged into Apache, it means
>>>>>>> that Euphoria's current implementation would be superseded by this
>>>>>>> DSL? I am curious because I would like to understand your level of
>>>>>>> investment in supporting the future of this DSL.
>>>>>>>
>>>>>>> Thanks and congrats again !
>>>>>>> Ismaël
>>>>>>>
>>>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <
>>>>>>> jb@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Depending on the donation, you would need an ICLA for each
>>>>>>>> contributor, and a CCLA in addition to the SGA.
>>>>>>>>
>>>>>>>> You can sync with Davor and me on the legal stuff.
>>>>>>>> However, I would wait a little bit to get feedback from the whole
>>>>>>>> team and then start a formal vote.
>>>>>>>>
>>>>>>>> I would be happy to start the formal vote.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Thanks for the awesome feedback!
>>>>>>>>>
>>>>>>>>> Romain:
>>>>>>>>>
>>>>>>>>> We already use the Java Stream API in all operators where it makes
>>>>>>>>> sense (e.g. ReduceByKey). We are still not sure it was a good choice,
>>>>>>>>> but it can easily be converted to an iterator anyway.
>>>>>>>>>
>>>>>>>>> Side outputs support is coming soon; we have already done some
>>>>>>>>> initial work on this.
>>>>>>>>>
>>>>>>>>> Side inputs are not supported in the way you are used to from Beam,
>>>>>>>>> because they can be replaced by a Join operator on the same key (if
>>>>>>>>> annotated with broadcastHashJoin, it will be turned into a map-side
>>>>>>>>> join).
>>>>>>>>>
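A minimal sketch of the map-side (broadcast hash) join idea mentioned above,
in plain Java. It assumes the smaller side of the join fits in memory as a
hash map, so every record of the larger side can be joined by a lookup
instead of a shuffle. MapSideJoin and the sample data are hypothetical and
are not Euphoria's Join operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inner join against an in-memory lookup table (hypothetical helper). */
final class MapSideJoin {
    private MapSideJoin() {}

    /**
     * Joins {key, value} records from the large side with the small side,
     * which is held entirely in memory ("broadcast" to every worker).
     * Records without a match on the small side are dropped (inner join).
     */
    static List<String> join(List<String[]> large, Map<String, String> smallSide) {
        List<String> out = new ArrayList<>();
        for (String[] record : large) {
            String match = smallSide.get(record[0]);
            if (match != null) {
                out.add(record[0] + ":" + record[1] + ":" + match);
            }
        }
        return out;
    }
}

final class MapSideJoinDemo {
    public static void main(String[] args) {
        List<String[]> clicks = Arrays.asList(
            new String[] {"u1", "home"},
            new String[] {"u2", "cart"},
            new String[] {"u3", "home"});

        Map<String, String> countryByUser = new HashMap<>();
        countryByUser.put("u1", "CZ");
        countryByUser.put("u2", "FR");

        // u3 has no entry on the small side, so it is not in the result.
        System.out.println(MapSideJoin.join(clicks, countryByUser));
    }
}
```

This plays the role a side input would play in Beam: the small dataset is
available to every record of the main input without a separate mechanism.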
>>>>>>>>> The only significant difference from Beam is that we decided not to
>>>>>>>>> abstract serialization, so we need to add support for type hints
>>>>>>>>> because of type erasure.
>>>>>>>>>
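A minimal sketch of why type erasure calls for type hints, using only the
JDK. At runtime a List<String> and a List<Integer> have the same class, so
the element type has to be supplied separately. TypeHint here is a
hypothetical "super type token" written for this illustration; it is not the
Euphoria or Beam class of a similar name.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

/**
 * Captures a generic type argument by being subclassed anonymously:
 * the type argument of a superclass survives erasure in class metadata.
 * Must be instantiated with an explicit type argument, e.g.
 * new TypeHint<List<String>>() {}.
 */
abstract class TypeHint<T> {
    private final Type type;

    protected TypeHint() {
        ParameterizedType superType =
            (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

final class TypeHintDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();

        // Erasure: both lists share one runtime class, element types are gone.
        System.out.println(strings.getClass() == ints.getClass());

        // The hint keeps the full generic type available at runtime.
        Type hint = new TypeHint<List<String>>() {}.getType();
        System.out.println(hint.getTypeName());
    }
}
```

The first line printed is true; the second shows the full parameterized type
recovered from the hint.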
>>>>>>>>> Fluent API:
>>>>>>>>>
>>>>>>>>> The API is fluent within one operator. It is designed to "lead the
>>>>>>>>> programmer", which means that they will only be offered methods that
>>>>>>>>> make sense after the last method they used (e.g. in ReduceByKey, we
>>>>>>>>> know that after keyBy either reduceBy method should come). It is
>>>>>>>>> implemented as a series of builders.
>>>>>>>>>
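A minimal sketch of the "series of builders" idea described above, in plain
Java. Each step returns a different type, so the compiler (and IDE
completion) only offers the methods that are valid next. ToyReduceByKey and
its step classes are hypothetical names for this illustration and only
loosely mirror the keyBy / reduceBy order mentioned in the message; this is
not Euphoria's actual API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Toy in-memory operator built as a chain of step builders (hypothetical). */
final class ToyReduceByKey {
    private ToyReduceByKey() {}

    /** Entry point: the only thing you can do with the result is keyBy(). */
    static <T> KeyByStep<T> of(List<T> input) {
        return new KeyByStep<>(input);
    }

    static final class KeyByStep<T> {
        private final List<T> input;

        private KeyByStep(List<T> input) {
            this.input = input;
        }

        /** After keyBy(), only reduceBy() is offered. */
        <K> ReduceByStep<T, K> keyBy(Function<T, K> keyFn) {
            return new ReduceByStep<>(input, keyFn);
        }
    }

    static final class ReduceByStep<T, K> {
        private final List<T> input;
        private final Function<T, K> keyFn;

        private ReduceByStep(List<T> input, Function<T, K> keyFn) {
            this.input = input;
            this.keyFn = keyFn;
        }

        /** Combines all elements that share a key; only output() is left. */
        OutputStep<K, T> reduceBy(BinaryOperator<T> reducer) {
            Map<K, T> result = new LinkedHashMap<>();
            for (T element : input) {
                result.merge(keyFn.apply(element), element, reducer);
            }
            return new OutputStep<>(result);
        }
    }

    static final class OutputStep<K, T> {
        private final Map<K, T> result;

        private OutputStep(Map<K, T> result) {
            this.result = result;
        }

        Map<K, T> output() {
            return result;
        }
    }
}

final class BuilderDemo {
    public static void main(String[] args) {
        Map<Integer, String> byLength = ToyReduceByKey
            .of(Arrays.asList("a", "bb", "cc", "d"))
            .keyBy(String::length)
            .reduceBy((x, y) -> x + "," + y)
            .output();
        System.out.println(byLength);
    }
}
```

Calling reduceBy() before keyBy(), or output() before reduceBy(), does not
compile, because the intermediate step types simply do not have those methods.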
>>>>>>>>> Davor:
>>>>>>>>>
>>>>>>>>> Thanks, I'll contact you and start the process of getting all the
>>>>>>>>> necessary paperwork signed on our side, so we can get things moving.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>>>>>>>> <rmannibucau@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>      Hi guys
>>>>>>>>>
>>>>>>>>>      A DSL would be very welcome, in particular if fluent.
>>>>>>>>>
>>>>>>>>>      Open question: did you look into implementing the Stream API
>>>>>>>>>      (surely extending it to have a BeamStream and a few more
>>>>>>>>>      features like sides etc.)? It would be very natural, easy to
>>>>>>>>>      integrate anywhere, and would avoid having to discover a new API.
>>>>>>>>>
>>>>>>>>>      Hazelcast Jet did it, so I don't see why Beam couldn't.
>>>>>>>>>
>>>>>>>>>      On Dec 18, 2017 at 07:26, "Davor Bonaci" <davor@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>          Hi David,
>>>>>>>>>          As JB noted, merging these two projects is a great idea. In
>>>>>>>>>          fact, some of us have had those discussions in the past.
>>>>>>>>>
>>>>>>>>>          Legally, nothing in particular is strictly necessary, as the
>>>>>>>>>          code seems to already be Apache 2.0 licensed. We don't,
>>>>>>>>>          however, want to be perceived as making hostile forks, so it
>>>>>>>>>          would be great to file a Software Grant Agreement with the ASF
>>>>>>>>>          Secretary. I can help with the process, as necessary.
>>>>>>>>>
>>>>>>>>>          Project alignment-wise, there aren't any particular blockers
>>>>>>>>>          that I am aware of. We welcome DSLs.
>>>>>>>>>
>>>>>>>>>          Technically, the code would start in a feature branch. During
>>>>>>>>>          this stage, we'd need to validate a few things, including
>>>>>>>>>          confirming that the code and dependencies match the ASF policy,
>>>>>>>>>          automating testing in Beam's tooling, etc. At that point, we'd
>>>>>>>>>          take a community vote to accept the component into master, and
>>>>>>>>>          consider author(s) for committership in the overall project.
>>>>>>>>>
>>>>>>>>>          Welcome to the ASF and Beam -- we are thrilled to have you!
>>>>>>>>>          Hope this helps, and please reach out if anybody on our end
>>>>>>>>>          can help, including JB or myself.
>>>>>>>>>
>>>>>>>>>          Davor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>          On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>>>>          <jb@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>              Hi David,
>>>>>>>>>
>>>>>>>>>              Generally speaking, having different fluent DSLs on top of
>>>>>>>>>              the Beam SDK is great.
>>>>>>>>>
>>>>>>>>>              I would like to take a look at your wordcount examples to
>>>>>>>>>              give you complete feedback. I like the idea, and a fluent
>>>>>>>>>              Java DSL is valuable.
>>>>>>>>>
>>>>>>>>>              Let's wait for feedback from others. If we have a
>>>>>>>>>              consensus, then I would be more than happy to help you with
>>>>>>>>>              the donation (I worked on the Camel Java DSL a while ago, so
>>>>>>>>>              I have some experience here).
>>>>>>>>>
>>>>>>>>>              Thanks !
>>>>>>>>>              Regards
>>>>>>>>>              JB
>>>>>>>>>
>>>>>>>>>              --
>>>>>>>>>              Jean-Baptiste Onofré
>>>>>>>>>              jbonofre@apache.org
>>>>>>>>>              http://blog.nanthrax.net
>>>>>>>>>              Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> David Morávek
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbonofre@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>>
>>
>> --
>> Best regards
>>
>> David Morávek
>>
>

Re: Euphoria Java 8 DSL - proposal

Posted by Tyler Akidau <ta...@google.com>.
+1, I'm supportive of seeing this move forward. What remaining concrete
concerns are there?

-Tyler



Re: Euphoria Java 8 DSL - proposal

Posted by Davor Bonaci <da...@apache.org>.
(Sounds good, thanks! We'll follow-up there.)

On Tue, Feb 27, 2018 at 10:49 AM, David Morávek <da...@gmail.com>
wrote:

> Hi Davor,
>
> sorry for the delay, we were blocked by our legal department. I've sent
> both SGA and CCLA to private@apache.beam.org, please let me know if you
> need anything else.
>
> Regards,
> David
>
> On Mon, Feb 19, 2018 at 6:13 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Hi Davor,
>>
>> We still have some discussion/paperwork on Euphoria side (SGA, ...).
>>
>> So, it's on track but it takes a little more time than expected.
>>
>> Regards
>> JB
>>
>> On 02/19/2018 05:40 AM, Davor Bonaci wrote:
>> > I may have missed things, but any update on the progress of this donation?
>> >
>> > On Tue, Jan 2, 2018 at 10:52 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>> > wrote:
>> >
>> >     Great !
>> >
>> >     Thanks !
>> >     Regards
>> >     JB
>> >
>> >     On 01/03/2018 07:29 AM, David Morávek wrote:
>> >
>> >         Hello JB,
>> >
>> >         Perfect! I'm already on the Beam Slack workspace, I'll contact you
>> >         once I get to the office.
>> >
>> >         Thanks!
>> >         D.
>> >
>> >         On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré
>> >         <jb@nanthrax.net> wrote:
>> >
>> >             Hi David,
>> >
>> >             absolutely !! Let's move forward on the preparation steps.
>> >
>> >             Are you on Slack and/or hangout to plan this ?
>> >
>> >             Thanks,
>> >             Regards
>> >             JB
>> >
>> >             On 01/02/2018 05:35 PM, David Morávek wrote:
>> >
>> >                 Hello JB,
>> >
>> >                 can we help in any way to move things forward?
>> >
>> >                 Thanks,
>> >                 D.
>> >
>> >                 On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré
>> >                 <jb@nanthrax.net> wrote:
>> >
>> >                      Thanks Jan,
>> >
>> >                      It makes sense.
>> >
>> >                      Let me take a look at the code to understand the "interaction".
>> >
>> >                      Regards
>> >                      JB
>> >
>> >
>> >                      On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>> >
>> >                          Hi JB,
>> >
>> >                          basically you are not wrong. The project started about three or
>> >                          four years ago with a goal to unify batch and streaming processing
>> >                          into a single portable, executor-independent API. Because of that,
>> >                          it is currently "close" to Beam in this sense. But we don't see
>> >                          much added value in keeping this as a separate project, with one
>> >                          of the key differences being the API (not the model itself), so we
>> >                          would like to focus on translation from the Euphoria API to Beam's
>> >                          SDK. That's why we would like to see it as a DSL, so that it would
>> >                          be possible to use the Euphoria API with Beam's runners as
>> >                          natively as possible.
>> >
>> >                          I hope I didn't make the subject even more unclear. If so, I'll be
>> >                          happy to explain anything in more detail. :-)
>> >
>> >                              Jan
>> >
>> >
>> >                          On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>> >
>> >                              Hi Jan,
>> >
>> >                              Thanks for your answers.
>> >
>> >                              However, they confused me ;)
>> >
>> >                              Regarding what you replied, Euphoria seems like a programming
>> >                              model/SDK "close" to Beam more than a DSL on top of an
>> >                              existing Beam SDK.
>> >
>> >                              Am I wrong ?
>> >
>> >                              Regards
>> >                              JB
>> >
>> >                              On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>> >
>> >                                  Hi Ismael,
>> >
>> >                                  basically we adopted the Beam's design regarding partitioning
>> >                                  (https://github.com/seznam/euphoria/issues/160) and implemented
>> >                                  the sorting manually
>> >                                  (https://github.com/seznam/euphoria/issues/158). I'm not aware
>> >                                  of the time model differences (Euphoria supports ingestion and
>> >                                  event time, we don't support processing time by decision).
>> >                                  Regarding other differences (looking into Beam capability
>> >                                  matrix, I'd say that):
>> >
>> >                                     - we don't support stateful FlatMap (i.e. ParDo) for now
>> >                                       (https://github.com/seznam/euphoria/issues/192)
>> >
>> >                                     - we don't support side inputs (by decision now, but might
>> >                                       be reconsidered) and outputs
>> >                                       (https://github.com/seznam/euphoria/issues/124)
>> >
>> >                                     - we support complete event-time windows (non-merging,
>> >                                       merging, aligned, unaligned) and time control
>> >
>> >                                     - we don't support processing time by decision (might be
>> >                                       reconsidered if a valid use-case is found)
>> >
>> >                                     - we support window triggering based on both time and data,
>> >                                       including discarding and accumulating (without
>> >                                       accumulating & retracting)
>> >
>> >                                  All our executors (runners) - Flink, Spark and Local -
>> >                                  implement the complete model, which we enforce using an
>> >                                  "operator test kit" that all executors must pass. The Spark
>> >                                  executor supports bounded sources only (for now). As David
>> >                                  said, we currently don't have a serialization abstraction, so
>> >                                  there is some work to be done in that regard.
>> >
>> >                                  Our intention is to completely supersede Euphoria. We would
>> >                                  like to consider the possibility of using executors that would
>> >                                  not rely on Beam, but that is optional now and should be
>> >                                  straightforward.
>> >
>> >                                  We'd be happy to answer any more questions you might have and
>> >                                  thanks a lot!
>> >
>> >                                  Best,
>> >
>> >                                     Jan
>> >
>> >
>> >                                  On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>> >
>> >                                      Hi,
>> >
>> >                                      It is great to see that you guys have reached a point of
>> >                                      maturity to propose this. Congratulations on your work and
>> >                                      the idea to contribute it to Beam.
>> >
>> >                                      I remember from a previous discussion with Jan the model
>> >                                      mismatch between Euphoria and Beam, because of some design
>> >                                      decisions of both projects. I remember you guys had some
>> >                                      issues with the way Beam's sources do partitioning, as well
>> >                                      as Beam's lack of sorted data (on shuffle a la hadoop). Also,
>> >                                      if I remember well, the 'time' model of Euphoria was simpler
>> >                                      than Beam's. I talk about all of this because I am curious
>> >                                      about what parts of the Euphoria model you guys had to
>> >                                      sacrifice to support Beam, and what parts of Beam's model
>> >                                      should still be integrated into Euphoria (and if there is a
>> >                                      straightforward path to do it).
>> >
>> >                                      If I understand well, if this gets merged into Apache, this
>> >                                      means that Euphoria's current implementation would be
>> >                                      superseded by this DSL? I am curious because I would like to
>> >                                      understand your level of investment in supporting the future
>> >                                      of this DSL.
>> >
>> >                                      Thanks and congrats again !
>> >                                      Ismaël
>> >
>> >                                      On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>> >
>> >                                          Depending on the donation, you would need an ICLA for
>> >                                          each contributor, and a CCLA in addition to the SGA.
>> >
>> >                                          We can sync with Davor and me for the legal stuff.
>> >                                          However, I would wait a little bit just to have feedback
>> >                                          from the whole team and start a formal vote.
>> >
>> >                                          I would be happy to start the formal vote.
>> >
>> >                                          Regards
>> >                                          JB
>> >
>> >                                          On 12/18/2017 10:03 AM, David Morávek wrote:
>> >
>> >                                              Hello,
>> >
>> >                                              Thanks for the awesome feedback!
>> >
>> >                                              Romain:
>> >
>> >                                              We already use the Java Stream API in all operators
>> >                                              where it makes sense (e.g. ReduceByKey). Still not
>> >                                              sure if it was a good choice, but it can be easily
>> >                                              converted to an iterator anyway.
>> >
>> >                                              Side outputs support is coming soon, we have already
>> >                                              done some initial work on this.
>> >
>> >                                              Side inputs are not supported in the way you are used
>> >                                              to from Beam, because they can be replaced by the Join
>> >                                              operator on the same key (if annotated with
>> >                                              broadcastHashJoin, it will be turned into a map-side
>> >                                              join).
>> >
>> >                                              The only significant difference from Beam is that we
>> >                                              decided not to abstract serialization, so we need to
>> >                                              add support for Type Hints, because of type erasure.
>> >
>> >                                              Fluent API:
>> >
>> >                                              The API is fluent within one operator. It is designed
>> >                                              to "lead the programmer", which means that the
>> >                                              programmer is only offered methods that make sense
>> >                                              after the last method used (e.g. in ReduceByKey, we
>> >                                              know that after keyBy one of the reduceBy methods
>> >                                              should come). It is implemented as a series of
>> >                                              builders.
>> >
>> >                                              Davor:
>> >
>> >                                              Thanks, I'll contact you, and will start the process
>> >                                              of having all the necessary paperwork signed on our
>> >                                              side, so we can get things moving.
>> >
>> >
>> >                                              On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibucau@gmail.com> wrote:
>> >
>> >                                                    Hi guys
>> >
>> >                                                    A DSL would be very welcome, in particular if
>> >                                                    fluent.
>> >
>> >                                                    Open question: did you consider implementing
>> >                                                    the Stream API (surely extending it to have a
>> >                                                    BeamStream and a few more features like sides
>> >                                                    etc)? It would be very natural, easy to
>> >                                                    integrate anywhere, and would avoid having to
>> >                                                    discover a new API.
>> >
>> >                                                    Hazelcast Jet did it, so I don't see why Beam
>> >                                                    couldn't.
>> >
>> >                                                    On Dec 18, 2017 at 07:26, "Davor Bonaci" <davor@apache.org> wrote:
>> >
>> >                                                        Hi David,
>> >                                                        As JB noted, merging of these two projects is
>> >                                                        a great idea. In fact, some of us have had
>> >                                                        those discussions in the past.
>> >
>> >                                                        Legally, nothing particular is strictly
>> >                                                        necessary as the code seems to already be
>> >                                                        Apache 2.0 licensed. We don't, however, want
>> >                                                        to be perceived as making hostile forks, so it
>> >                                                        would be great to file a Software Grant
>> >                                                        Agreement with the ASF Secretary. I can help
>> >                                                        with the process, as necessary.
>> >
>> >                                                        Project alignment-wise, there aren't any
>> >                                                        particular blockers that I am aware of. We
>> >                                                        welcome DSLs.
>> >
>> >                                                        Technically, the code would start in a feature
>> >                                                        branch. During this stage, we'd need to
>> >                                                        validate a few things, including confirmation
>> >                                                        the code and dependencies match the ASF
>> >                                                        policy, automate testing in Beam's tooling,
>> >                                                        etc. At that point, we'd take a community vote
>> >                                                        to accept the component into master, and
>> >                                                        consider author(s) for committership in the
>> >                                                        overall project.
>> >
>> >                                                        Welcome to the ASF and Beam -- we are thrilled
>> >                                                        to have you! Hope this helps, and please reach
>> >                                                        out if anybody on our end can help, including
>> >                                                        JB or myself.
>> >
>> >                                                        Davor
>> >
>> >
>> >                                                        On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>> >
>> >                                                            Hi David,
>> >
>> >                                                            Generally speaking, having different fluent
>> >                                                            DSLs on top of the Beam SDK is great.
>> >
>> >                                                            I would like to take a look at your
>> >                                                            wordcount examples to give you complete
>> >                                                            feedback. I like the idea and a fluent
>> >                                                            Java DSL is valuable.
>> >
>> >                                                            Let's wait for feedback from others. If we
>> >                                                            have a consensus, then I would be more than
>> >                                                            happy to help you with the donation (I
>> >                                                            worked on the Camel Java DSL a while ago,
>> >                                                            so I have some experience here).
>> >
>> >                                                            Thanks !
>> >                                                            Regards
>> >                                                            JB
>> >
>> >                                                            On 12/17/2017 07:00 PM, David Morávek wrote:
>> >
>> >                                                                Hello,
>> >
>> >
>> >                                                                First of all, thanks for the amazing
>> >                                                                work the Apache Beam community is
>> >                                                                doing!
>> >
>> >
>> >                                                                In 2014, we started development of a
>> >                                                                runtime-independent Java 8 API that
>> >                                                                helps us create unified big-data
>> >                                                                processing flows. It has been used as
>> >                                                                a core building block of the Seznam.cz
>> >                                                                web crawler data infrastructure ever
>> >                                                                since. Its design principles and
>> >                                                                execution model are very similar to
>> >                                                                Apache Beam.
>> >
>> >
>> >                                                                This API was open sourced in 2016,
>> >                                                                under the name Euphoria API:
>> >
>> >                                                                https://github.com/seznam/euphoria
>> >
>> >
>> >                                                                As it is very similar to Apache Beam,
>> >                                                                we feel that it is not worth
>> >                                                                duplicating effort in terms of
>> >                                                                developing new runtimes and
>> >                                                                fine-tuning current ones.
>> >
>> >
>> >                                                                The main blocker for us to switch to
>> >                                                                Apache Beam is the lack of a Java 8
>> >                                                                API. We propose the integration of the
>> >                                                                Euphoria API into Apache Beam as a
>> >                                                                Java 8 DSL, in order to share our
>> >                                                                effort with the community.
>> >
>> >
>> >                                                                A simple example of Euphoria API usage
>> >                                                                can be found here:
>> >
>> >         https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>> >
>> >
>> >
>> >                                                                If you feel that the Beam community could benefit from our work,
>> >                                                                we would love to start working on Euphoria integration into
>> >                                                                Apache Beam (we already have a working POC, with a few basic
>> >                                                                operators implemented).
>> >
>> >
>> >                                                                I look forward to hearing from you,
>> >
>> >                                                                David
>> >
>> >
>> >
>> >                                                            --
>> >                                                            Jean-Baptiste Onofré
>> >                                                            jbonofre@apache.org
>> >                                                            http://blog.nanthrax.net
>> >                                                            Talend - http://www.talend.com
>> >
>> >
>> >
>> >
>> >
>> >                                              --
>> >                                              Regards,
>> >
>> >                                              David Morávek
>> >
>> >
>> >                                          --
>> >                                          Jean-Baptiste Onofré
>> >                                          jbonofre@apache.org
>> >                                          http://blog.nanthrax.net
>> >                                          Talend - http://www.talend.com
>> >
>> >
>> >
>> >
>> >
>> >                      --
>> >                      Jean-Baptiste Onofré
>> >                      jbonofre@apache.org
>> >                      http://blog.nanthrax.net
>> >                      Talend - http://www.talend.com
>> >
>> >
>> >
>> >
>> >                 --
>> >                 Regards,
>> >
>> >                 David Morávek
>> >
>> >
>> >             --
>> >             Jean-Baptiste Onofré
>> >             jbonofre@apache.org
>> >             http://blog.nanthrax.net
>> >             Talend - http://www.talend.com
>> >
>> >
>> >
>> >     --
>> >     Jean-Baptiste Onofré
>> >     jbonofre@apache.org
>> >     http://blog.nanthrax.net
>> >     Talend - http://www.talend.com
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>

Re: Euphoria Java 8 DSL - proposal

Posted by David Morávek <da...@gmail.com>.
Hi Davor,

sorry for the delay, we were blocked by our legal department. I've sent
both the SGA and the CCLA to private@apache.beam.org; please let me know if
you need anything else.

Regards,
David

On Mon, Feb 19, 2018 at 6:13 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Davor,
>
> We still have some discussion/paperwork on Euphoria side (SGA, ...).
>
> So, it's on track but it takes a little more time than expected.
>
> Regards
> JB
>
> On 02/19/2018 05:40 AM, Davor Bonaci wrote:
> > I may have missed things, but any update on the progress of this donation?
> >
> > On Tue, Jan 2, 2018 at 10:52 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> >
> >     Great !
> >
> >     Thanks !
> >     Regards
> >     JB
> >
> >     On 01/03/2018 07:29 AM, David Morávek wrote:
> >
> >         Hello JB,
> >
> >         Perfect! I'm already on the Beam Slack workspace, I'll contact you
> >         once I get to the office.
> >
> >         Thanks!
> >         D.
> >
> >         On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> >
> >             Hi David,
> >
> >             absolutely !! Let's move forward on the preparation steps.
> >
> >             Are you on Slack and/or hangout to plan this ?
> >
> >             Thanks,
> >             Regards
> >             JB
> >
> >             On 01/02/2018 05:35 PM, David Morávek wrote:
> >
> >                 Hello JB,
> >
> >                 can we help in any way to move things forward?
> >
> >                 Thanks,
> >                 D.
> >
> >                 On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> >
> >                      Thanks Jan,
> >
> >                      It makes sense.
> >
> >                      Let me take a look at the code to understand the "interaction".
> >
> >                      Regards
> >                      JB
> >
> >
> >                      On 12/18/2017 04:26 PM, Jan Lukavský wrote:
> >
> >                          Hi JB,
> >
> >                          Basically, you are not wrong. The project started about three
> >                          or four years ago with the goal of unifying batch and stream
> >                          processing in a single portable, executor-independent API.
> >                          Because of that, it is currently "close" to Beam in this sense.
> >                          But we don't see much added value in keeping this as a separate
> >                          project when one of the key differences is the API (not the
> >                          model itself), so we would like to focus on translating the
> >                          Euphoria API to Beam's SDK. That's why we would like to see it
> >                          as a DSL, so that the Euphoria API can be used with Beam's
> >                          runners as natively as possible.
> >
> >                          I hope I didn't make the subject even more unclear; if so, I'll
> >                          be happy to explain anything in more detail. :-)
> >
> >                              Jan
> >
> >
> >                          On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
> >
> >                              Hi Jan,
> >
> >                              Thanks for your answers.
> >
> >                              However, they confused me ;)
> >
> >                              Judging by what you replied, Euphoria seems more like a
> >                              programming model/SDK "close" to Beam than a DSL on top of
> >                              an existing Beam SDK.
> >
> >                              Am I wrong ?
> >
> >                              Regards
> >                              JB
> >
> >                              On 12/18/2017 03:44 PM, Jan Lukavský wrote:
> >
> >                                  Hi Ismael,
> >
> >                                  Basically, we adopted Beam's design regarding partitioning
> >                                  (https://github.com/seznam/euphoria/issues/160) and
> >                                  implemented the sorting manually
> >                                  (https://github.com/seznam/euphoria/issues/158). I'm not
> >                                  aware of any time model differences (Euphoria supports
> >                                  ingestion and event time; we deliberately don't support
> >                                  processing time). Regarding other differences, looking at
> >                                  the Beam capability matrix, I'd say that:
> >
> >                                     - we don't support stateful FlatMap (i.e. ParDo) for now
> >                                       (https://github.com/seznam/euphoria/issues/192)
> >
> >                                     - we don't support side inputs (by decision now, but
> >                                       might be reconsidered) and outputs
> >                                       (https://github.com/seznam/euphoria/issues/124)
> >
> >                                     - we support complete event-time windows (non-merging,
> >                                       merging, aligned, unaligned) and time control
> >
> >                                     - we don't support processing time by decision (might be
> >                                       reconsidered if a valid use-case is found)
> >
> >                                     - we support window triggering based on both time and
> >                                       data, including discarding and accumulating (without
> >                                       accumulating & retracting)
> >
> >                                  All our executors (runners) - Flink, Spark and Local -
> >                                  implement the complete model, which we enforce using an
> >                                  "operator test kit" that all executors must pass. The Spark
> >                                  executor supports bounded sources only (for now). As David
> >                                  said, we currently don't have a serialization abstraction,
> >                                  so there is some work to be done in that regard.
> >
> >                                  Our intention is to completely supersede Euphoria. We would
> >                                  like to consider the possibility of using executors that do
> >                                  not rely on Beam, but that is optional for now and should
> >                                  be straightforward.
> >
> >                                  We'd be happy to answer any more questions you might have,
> >                                  and thanks a lot!
> >
> >                                  Best,
> >
> >                                     Jan
> >
> >
> >                                  On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
> >
> >                                      Hi,
> >
> >                                      It is great to see that you guys have reached the
> >                                      maturity to propose this. Congratulations on your work
> >                                      and on the idea of contributing it to Beam.
> >
> >                                      I remember a previous discussion with Jan about the
> >                                      model mismatch between Euphoria and Beam caused by some
> >                                      design decisions in both projects. I remember you guys
> >                                      had some issues with the way Beam's sources do
> >                                      partitioning, as well as with Beam's lack of sorted data
> >                                      (on shuffle, à la Hadoop). Also, if I remember well, the
> >                                      'time' model of Euphoria was simpler than Beam's. I
> >                                      mention all of this because I am curious about what
> >                                      parts of the Euphoria model you guys had to sacrifice to
> >                                      support Beam, and what parts of Beam's model should
> >                                      still be integrated into Euphoria (and whether there is
> >                                      a straightforward path to do it).
> >
> >                                      If I understand well, if this gets merged into Apache,
> >                                      does it mean that Euphoria's current implementation
> >                                      would be superseded by this DSL? I am curious because I
> >                                      would like to understand your level of investment in
> >                                      supporting the future of this DSL.
> >
> >                                      Thanks and congrats again !
> >                                      Ismaël
> >
> >                                      On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
> >                                      <jb@nanthrax.net> wrote:
> >
> >                                          Depending on the donation, you would need an ICLA
> >                                          for each contributor, and a CCLA in addition to the
> >                                          SGA.
> >
> >                                          We can sync with Davor and me on the legal stuff.
> >                                          However, I would wait a little bit just to have
> >                                          feedback from the whole team and start a formal
> >                                          vote.
> >
> >                                          I would be happy to start the formal vote.
> >
> >                                          Regards
> >                                          JB
> >
> >                                          On 12/18/2017 10:03 AM, David Morávek wrote:
> >
> >                                              Hello,
> >
> >                                              Thanks for the awesome feedback!
> >
> >                                              Romain:
> >
> >                                              We already use the Java Stream API in all
> >                                              operators where it makes sense (e.g.
> >                                              ReduceByKey). We are still not sure whether it
> >                                              was a good choice, but it can easily be
> >                                              converted to an iterator anyway.
> >
> >                                              Side output support is coming soon; we have
> >                                              already done some initial work on this.
> >
> >                                              Side inputs are not supported in the way you
> >                                              are used to from Beam, because they can be
> >                                              replaced by a Join operator on the same key (if
> >                                              annotated with broadcastHashJoin, it will be
> >                                              turned into a map-side join).
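
The side-input replacement described above can be illustrated with a map-side (broadcast hash) join. The following is a minimal sketch over plain Java collections, not Euphoria's or Beam's API; the class `MapSideJoinSketch` and its `join` method are hypothetical names introduced only for illustration. It assumes the small side fits in memory: it is indexed by key once, and each element of the large side is then matched by a hash lookup rather than by a shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of Euphoria or Beam.
final class MapSideJoinSketch {
    private MapSideJoinSketch() {}

    // Inner join of a large input with a small one that fits in memory.
    static <L, R, K, OUT> List<OUT> join(
            List<L> large,
            List<R> small,
            Function<L, K> leftKey,
            Function<R, K> rightKey,
            BiFunction<L, R, OUT> joiner) {
        // "Broadcast" step: index the small side by key, once.
        Map<K, List<R>> index = new HashMap<>();
        for (R right : small) {
            index.computeIfAbsent(rightKey.apply(right), k -> new ArrayList<>()).add(right);
        }
        // "Map" step: stream the large side and look up matches by key.
        List<OUT> out = new ArrayList<>();
        for (L left : large) {
            List<R> matches = index.getOrDefault(leftKey.apply(left), Collections.emptyList());
            for (R right : matches) {
                out.add(joiner.apply(left, right));
            }
        }
        return out;
    }
}
```

In a distributed runner the index would be shipped to every worker, which is roughly the role a side input plays in Beam; elements of the large side without a match are dropped here because the sketch implements an inner join.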
> >
> >                                              The only significant difference from Beam is
> >                                              that we decided not to abstract serialization,
> >                                              so we need to add support for type hints
> >                                              because of type erasure.
> >
> >                                              Fluent API:
> >
> >                                              The API is fluent within one operator. It is
> >                                              designed to "lead the programmer", which means
> >                                              that they will only be offered methods that make
> >                                              sense after the last method they used (e.g. in
> >                                              ReduceByKey, we know that after keyBy either
> >                                              reduceBy method should come). It is implemented
> >                                              as a series of builders.
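
A "series of builders" of this kind can be sketched as follows. This is a simplified illustration and not Euphoria's actual implementation: `ReduceByKeySketch`, `KeyByStage` and `ReduceByStage` are hypothetical names, and the operator runs eagerly over an in-memory list. Each step returns an interface that exposes only the methods valid at that point, so the compiler rejects calls made out of order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical staged builder for illustration; not Euphoria's API.
final class ReduceByKeySketch {
    private ReduceByKeySketch() {}

    // Stage 1: the only method offered is keyBy.
    interface KeyByStage<IN> {
        <K> ReduceByStage<IN, K> keyBy(Function<IN, K> keyFn);
    }

    // Stage 2: the only method offered is reduceBy.
    interface ReduceByStage<IN, K> {
        Map<K, IN> reduceBy(BinaryOperator<IN> reducer);
    }

    // Entry point: fixes the input and hands out the first stage.
    static <IN> KeyByStage<IN> of(List<IN> input) {
        return new KeyByStage<IN>() {
            @Override
            public <K> ReduceByStage<IN, K> keyBy(Function<IN, K> keyFn) {
                // The returned stage closes over the input and the key function.
                return reducer -> {
                    Map<K, IN> out = new LinkedHashMap<>();
                    for (IN element : input) {
                        // Combine elements that share a key using the reducer.
                        out.merge(keyFn.apply(element), element, reducer);
                    }
                    return out;
                };
            }
        };
    }
}
```

A call such as `ReduceByKeySketch.of(numbers).keyBy(n -> n % 2).reduceBy(Integer::sum)` compiles, while calling `reduceBy` directly on the result of `of(...)` does not, because `KeyByStage` declares no such method.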
> >
> >                                              Davor:
> >
> >                                              Thanks, I'll contact you, and will start the
> >                                              process of having all the necessary paperwork
> >                                              signed on our side, so we can get things moving.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >                                              On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
> >                                              <rmannibucau@gmail.com> wrote:
> >
> >                                                    Hi guys
> >
> >                                                    A DSL would be very welcome, in particular
> >                                                    if fluent.
> >
> >                                                    Open question: did you consider implementing
> >                                                    the Stream API (surely extending it to have a
> >                                                    BeamStream and a few more features like sides
> >                                                    etc.)? It would be very natural, easy to
> >                                                    integrate anywhere, and would avoid having to
> >                                                    discover a new API.
> >
> >                                                    Hazelcast Jet did it, so I don't see why Beam
> >                                                    couldn't.
> >
> >                                                    On Dec 18, 2017 at 07:26, "Davor Bonaci"
> >                                                    <davor@apache.org> wrote:
> >
> >                                                        Hi David,
> >                                                        As JB noted, merging these two projects is
> >                                                        a great idea. In fact, some of us have had
> >                                                        those discussions in the past.
> >
> >                                                        Legally, nothing particular is strictly
> >                                                        necessary, as the code seems to already be
> >                                                        Apache 2.0 licensed. We don't, however,
> >                                                        want to be perceived as making hostile
> >                                                        forks, so it would be great to file a
> >                                                        Software Grant Agreement with the ASF
> >                                                        Secretary. I can help with the process, as
> >                                                        necessary.
> >
> >                                                        Project alignment-wise, there aren't any
> >                                                        particular blockers that I am aware of. We
> >                                                        welcome DSLs.
> >
> >                                                        Technically, the code would start in a
> >                                                        feature branch. During this stage, we'd
> >                                                        need to validate a few things, including
> >                                                        confirming that the code and dependencies
> >                                                        match the ASF policy, automating testing
> >                                                        in Beam's tooling, etc. At that point,
> >                                                        we'd take a community vote to accept the
> >                                                        component into master, and consider
> >                                                        author(s) for committership in the overall
> >                                                        project.
> >
> >                                                        Welcome to the ASF and Beam -- we are
> >                                                        thrilled to have you! Hope this helps, and
> >                                                        please reach out if anybody on our end can
> >                                                        help, including JB or myself.
> >
> >                                                        Davor
> >
> >
> >                                                        On Sun, Dec 17, 2017 at 10:13 AM,
> >                                                        Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> >
> >                                                            Hi David,
> >
> >                                                            Generally speaking, having different
> >                                                            fluent DSLs on top of the Beam SDK is
> >                                                            great.
> >
> >                                                            I would like to take a look at your
> >                                                            wordcount examples to give you complete
> >                                                            feedback. I like the idea, and a fluent
> >                                                            Java DSL is valuable.
> >
> >                                                            Let's wait for feedback from others. If
> >                                                            we have a consensus, then I would be
> >                                                            more than happy to help you with the
> >                                                            donation (I worked on the Camel Java DSL
> >                                                            a while ago, so I have some experience
> >                                                            here).
> >
> >                                                            Thanks !
> >                                                            Regards
> >                                                            JB
> >
> >                                                            On 12/17/2017 07:00 PM, David Morávek wrote:
> >
> >                                                                Hello,
> >
> >
> >                                                                First of all, thanks for the amazing
> >                                                                work the Apache Beam community is doing!
> >
> >
> >                                                                In 2014, we started development of a
> >                                                                runtime-independent Java 8 API that
> >                                                                helps us create unified big-data
> >                                                                processing flows. It has been used as
> >                                                                a core building block of the Seznam.cz
> >                                                                web crawler data infrastructure ever
> >                                                                since. Its design principles and
> >                                                                execution model are very similar to
> >                                                                Apache Beam.
> >
> >
> >                                                                This API
> was open
> >         sourced
> >                 in 2016,
> >                                              under the name Euphoria
> >                                              API:
> >
> >                 https://github.com/seznam/euphoria
> >         <https://github.com/seznam/euphoria> <https://github.com/seznam/
> euphoria
> >         <https://github.com/seznam/euphoria>>
> >                                              <https://github.com/seznam/
> euphoria
> >         <https://github.com/seznam/euphoria>
> >                 <https://github.com/seznam/euphoria
> >         <https://github.com/seznam/euphoria>>>
> >                                              <https://github.com/seznam/
> euphoria
> >         <https://github.com/seznam/euphoria>
> >                 <https://github.com/seznam/euphoria
> >         <https://github.com/seznam/euphoria>>
> >                                              <https://github.com/seznam/
> euphoria
> >         <https://github.com/seznam/euphoria>
> >                 <https://github.com/seznam/euphoria
> >         <https://github.com/seznam/euphoria>>>>
> >
> >
> >                                                                As it is
> very
> >         similar to
> >                 Apache
> >                                              Beam, we feel, that it is
> >                                              not
> >                                                                worth of
> duplicating
> >                 effort in
> >                                              terms of development of new
> >                                                                runtimes
> and
> >         fine-tuning of
> >                                              current ones.
> >
> >
> >                                                                The main
> blocker
> >         for us
> >                 to switch
> >                                              to Apache Beam is lack
> >                                              of the
> >                                                                Java 8
> API. *W*e
> >         propose the
> >                                              integration of Euphoria API
> >                                              into
> >                                                                Apache
> Beam as a
> >         Java 8
> >                 DSL, in
> >                                              order to share our effort
> >                                              with
> >                                                                the
> community.
> >
> >
> >                                                                Simple
> example of the
> >                 Euphoria API
> >                                              usage, can be found
> >                                              here:
> >
> >
> >
> >         https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>>
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>>>
> >
> >
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>>
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
> >
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> >         <https://github.com/seznam/euphoria/tree/master/euphoria-
> examples/src/main/java/cz/seznam/euphoria/examples/wordcount>>>>
> >
> >
> >
> >                                                                If you
> feel, that
> >         Beam
> >                 community
> >                                              could leverage from our
> >                                              work,
> >                                                                we would
> love to
> >         start
> >                 working on
> >                                              Euphoria integration
> >                                              into
> >                                                                Apache
> Beam (we
> >         already
> >                 have a
> >                                              working POC, with few basic
> >                                                                operators
> >         implemented).
> >
> >
> >                                                                I look
> forward to
> >         hearing
> >                 from you,
> >
> >                                                                David
> >
> >
> >                                                            --
> >                                                            Jean-Baptiste Onofré
> >                                                            jbonofre@apache.org
> >                                                            http://blog.nanthrax.net
> >                                                            Talend - http://www.talend.com
> >
> >
> >
> >
> >
> >                                              --
> >                                              Best regards,
> >
> >                                              David Morávek
> >
> >
> >                                          --
> >                                          Jean-Baptiste Onofré
> >                                          jbonofre@apache.org
> >                                          http://blog.nanthrax.net
> >                                          Talend - http://www.talend.com
> >
> >
> >
> >
> >
> >                      --
> >                      Jean-Baptiste Onofré
> >                      jbonofre@apache.org
> >                      http://blog.nanthrax.net
> >                      Talend - http://www.talend.com
> >
> >
> >
> >
> >                 --
> >                 Best regards,
> >
> >                 David Morávek
> >
> >
> >             --
> >             Jean-Baptiste Onofré
> >             jbonofre@apache.org
> >             http://blog.nanthrax.net
> >             Talend - http://www.talend.com
> >
> >
> >
> >     --
> >     Jean-Baptiste Onofré
> >     jbonofre@apache.org
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>

Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Davor,

We still have some discussion and paperwork to complete on the Euphoria side (SGA, ...).

So it's on track, but it is taking a little more time than expected.

Regards
JB

On 02/19/2018 05:40 AM, Davor Bonaci wrote:
> I may have missed things, but any update on the progress of this donation?
> 
> On Tue, Jan 2, 2018 at 10:52 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>     Great !
> 
>     Thanks !
>     Regards
>     JB
> 
>     On 01/03/2018 07:29 AM, David Morávek wrote:
> 
>         Hello JB,
> 
>         Perfect! I'm already on the Beam Slack workspace, I'll contact you once
>         I get to the office.
> 
>         Thanks!
>         D.
> 
>         On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>             Hi David,
> 
>             absolutely !! Let's move forward on the preparation steps.
> 
>             Are you on Slack and/or hangout to plan this ?
> 
>             Thanks,
>             Regards
>             JB
> 
>             On 01/02/2018 05:35 PM, David Morávek wrote:
> 
>                 Hello JB,
> 
>                 can we help in any way to move things forward?
> 
>                 Thanks,
>                 D.
> 
>                 On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>                      Thanks Jan,
> 
>                      It makes sense.
> 
>                      Let me take a look at the code to understand the "interaction".
> 
>                      Regards
>                      JB
> 
> 
>                      On 12/18/2017 04:26 PM, Jan Lukavský wrote:
> 
>                          Hi JB,
> 
>                          basically you are not wrong. The project started about three or
>                          four years ago with the goal of unifying batch and streaming
>                          processing into a single portable, executor-independent API.
>                          Because of that, it is currently "close" to Beam in this sense.
>                          But we don't see much added value in keeping this as a separate
>                          project when one of the key differences is the API (not the
>                          model itself), so we would like to focus on translation from
>                          the Euphoria API to Beam's SDK. That's why we would like to see
>                          it as a DSL, so that it would be possible to use the Euphoria
>                          API with Beam's runners as natively as possible.
> 
>                          I hope I didn't make the subject even more unclear; if so, I'll
>                          be happy to explain anything in more detail. :-)
> 
>                              Jan
> 
> 
>                          On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
> 
>                              Hi Jan,
> 
>                              Thanks for your answers.
> 
>                              However, they confused me ;)
> 
>                              Regarding what you replied, Euphoria seems more like a
>                              programming model/SDK "close" to Beam than a DSL on top of an
>                              existing Beam SDK.
> 
>                              Am I wrong ?
> 
>                              Regards
>                              JB
> 
>                              On 12/18/2017 03:44 PM, Jan Lukavský wrote:
> 
>                                  Hi Ismael,
> 
>                                  basically we adopted Beam's design regarding partitioning
>                                  (https://github.com/seznam/euphoria/issues/160) and
>                                  implemented the sorting manually
>                                  (https://github.com/seznam/euphoria/issues/158). I'm not aware
>                                  of the time model differences (Euphoria supports ingestion and
>                                  event time; we don't support processing time by decision).
>                                  Regarding other differences, looking at the Beam capability
>                                  matrix, I'd say that:
> 
>                                     - we don't support stateful FlatMap (i.e. ParDo) for now
>                                       (https://github.com/seznam/euphoria/issues/192)
> 
>                                     - we don't support side inputs (by decision now, but might
>                                       be reconsidered) and outputs
>                                       (https://github.com/seznam/euphoria/issues/124)
> 
>                                     - we support complete event-time windows (non-merging,
>                                       merging, aligned, unaligned) and time control
> 
>                                     - we don't support processing time by decision (might be
>                                       reconsidered if a valid use-case is found)
> 
>                                     - we support window triggering based on both time and data,
>                                       including discarding and accumulating (without
>                                       accumulating & retracting)
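
As an aside on the last bullet: in Beam's terminology, a discarding trigger emits only what arrived since the previous firing, while an accumulating trigger re-emits everything seen so far in the window. The sketch below illustrates that difference on a plain list, assuming Euphoria uses the terms the same way. The class AccumulationModeSketch, the firings method and the count-based (data-driven) trigger are made up for illustration and are not part of either API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of discarding vs. accumulating panes within one window. */
final class AccumulationModeSketch {

  /**
   * Simulates a data-driven trigger that fires after every `every` elements
   * and returns the sum emitted at each firing.
   */
  static List<Integer> firings(List<Integer> window, int every, boolean accumulating) {
    List<Integer> emitted = new ArrayList<>();
    int pane = 0; // running sum of the current pane
    int seen = 0; // number of elements observed so far
    for (int value : window) {
      pane += value;
      seen++;
      if (seen % every == 0) {
        emitted.add(pane);
        if (!accumulating) {
          pane = 0; // discarding: the next pane starts empty
        }
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<Integer> window = Arrays.asList(1, 2, 3, 4);
    System.out.println(firings(window, 2, false)); // prints [3, 7]
    System.out.println(firings(window, 2, true));  // prints [3, 10]
  }
}
```

With discarding, a downstream consumer has to add the panes up itself; with accumulating, each pane replaces the previous one.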
> 
>                                  All our executors (runners) - Flink, Spark and Local -
>                                  implement the complete model, which we enforce using an
>                                  "operator test kit" that all executors must pass. The Spark
>                                  executor supports bounded sources only (for now). As David
>                                  said, we currently don't have a serialization abstraction, so
>                                  there is some work to be done in that regard.
> 
>                                  Our intention is to completely supersede Euphoria. We would
>                                  like to consider the possibility of using executors that would
>                                  not rely on Beam, but that is optional for now and should be
>                                  straightforward.
> 
>                                  We'd be happy to answer any more questions you might have, and
>                                  thanks a lot!
> 
>                                  Best,
> 
>                                     Jan
> 
> 
>                                  On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
> 
>                                      Hi,
> 
>                                      It is great to see that you guys have reached a maturity
>                                      point to propose this. Congratulations on your work and the
>                                      idea to contribute it to Beam.
> 
>                                      I remember a previous discussion with Jan about the model
>                                      mismatch between Euphoria and Beam, caused by some design
>                                      decisions of both projects. I remember you guys had some
>                                      issues with the way Beam's sources do partitioning, as well as
>                                      Beam's lack of sorted data (on shuffle, à la Hadoop). Also, if
>                                      I remember well, the 'time' model of Euphoria was simpler than
>                                      Beam's. I mention all of this because I am curious about what
>                                      parts of the Euphoria model you guys had to sacrifice to
>                                      support Beam, and what parts of Beam's model should still be
>                                      integrated into Euphoria (and if there is a straightforward
>                                      path to do it).
> 
>                                      If I understand well, if this gets merged into Apache, this
>                                      means that Euphoria's current implementation would be
>                                      superseded by this DSL? I am curious because I would like to
>                                      understand your level of investment in supporting the future
>                                      of this DSL.
> 
>                                      Thanks and congrats again !
>                                      Ismaël
> 
>                                      On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>                                      <jb@nanthrax.net> wrote:
> 
>                                          Depending on the donation, you would need an ICLA for each
>                                          contributor, and a CCLA in addition to the SGA.
> 
>                                          We can sync with Davor and me on the legal stuff.
>                                          However, I would wait a little bit just to have feedback
>                                          from the whole team and start a formal vote.
> 
>                                          I would be happy to start the formal vote.
> 
>                                          Regards
>                                          JB
> 
>                                          On 12/18/2017 10:03 AM, David Morávek wrote:
> 
>                                              Hello,
> 
>                                              Thanks for the awesome feedback!
> 
>                                              Romain:
> 
>                                              We already use the Java Stream API in all operators
>                                              where it makes sense (e.g. ReduceByKey). Still not
>                                              sure if it was a good choice, but it can easily be
>                                              converted to an iterator anyway.
> 
>                                              Side outputs support is coming soon; we have already
>                                              done some initial work on this.
> 
>                                              Side inputs are not supported in the way you are used
>                                              to from Beam, because they can be replaced by a Join
>                                              operator on the same key (if annotated with
>                                              broadcastHashJoin, it will be turned into a map-side
>                                              join).
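
To make the map-side join idea concrete, here is a minimal sketch on plain Java collections. It is not Euphoria's or Beam's implementation: the class BroadcastJoinSketch and its join method are hypothetical, and the "broadcast" is simulated by copying the small side into an in-memory hash table. The assumption is that the small side fits in memory, which is what makes it usable in place of a side input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a map-side (broadcast hash) join. */
final class BroadcastJoinSketch {

  /**
   * Inner-joins the keys of the large side against the small side.
   * The small side is hashed once and probed per element, so the large
   * side never has to be shuffled or grouped by key.
   */
  static List<String> join(List<String> largeSideKeys, Map<String, String> smallSide) {
    // The copy stands in for shipping the small side to every worker.
    Map<String, String> broadcast = new HashMap<>(smallSide);
    List<String> joined = new ArrayList<>();
    for (String key : largeSideKeys) {
      String match = broadcast.get(key);
      if (match != null) { // inner join: drop elements without a match
        joined.add(key + "=" + match);
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    Map<String, String> countries = new HashMap<>();
    countries.put("cz", "Czechia");
    countries.put("fr", "France");
    List<String> events = Arrays.asList("cz", "de", "fr", "cz");
    System.out.println(join(events, countries)); // prints [cz=Czechia, fr=France, cz=Czechia]
  }
}
```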
> 
>                                              The only significant difference from Beam is that we
>                                              decided not to abstract serialization, so we need to
>                                              add support for Type Hints because of type erasure.
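
For readers unfamiliar with the problem, the following sketch shows what type erasure loses at runtime and one common way a type hint can recover it (the "super type token" idiom). This is an assumption about what such a hint could look like, not the design Euphoria ended up with; TypeErasureSketch and TypeHint are hypothetical names.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of type erasure and a type hint that works around it. */
final class TypeErasureSketch {

  /**
   * An anonymous subclass of TypeHint records its generic argument in class
   * metadata, where it survives erasure and can be read back reflectively.
   */
  abstract static class TypeHint<T> {
    final Type type;

    protected TypeHint() {
      ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
      this.type = superType.getActualTypeArguments()[0];
    }
  }

  public static void main(String[] args) {
    List<String> strings = new ArrayList<>();
    List<Integer> ints = new ArrayList<>();
    // After erasure both lists share one runtime class, so a runtime cannot
    // tell from the object alone which element serializer it needs.
    System.out.println(strings.getClass() == ints.getClass()); // prints true

    // The trailing {} creates the anonymous subclass that carries the type.
    Type hint = new TypeHint<List<String>>() {}.type;
    System.out.println(hint.getTypeName()); // prints java.util.List<java.lang.String>
  }
}
```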
> 
>                                              Fluent API:
> 
>                                              The API is fluent within one operator. It is designed
>                                              to "lead the programmer", which means that they will
>                                              be offered only methods that make sense after the
>                                              last method they used (e.g. in ReduceByKey, we know
>                                              that after keyBy either reduceBy method should come).
>                                              It is implemented as a series of builders.
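
The "series of builders" approach described here is often called a staged (or step) builder. The sketch below shows the idea in plain Java: each step returns an interface that exposes only the next valid method, so the compiler and IDE completion "lead the programmer". It is not Euphoria's actual code. StagedReduce and its stage interfaces are hypothetical, and only the keyBy-then-reduceBy ordering is taken from the message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical staged builder: every step exposes only the next valid step. */
final class StagedReduce {

  /** Stage 1: the only available method is keyBy. */
  interface KeyByStage<T> {
    <K> ReduceByStage<T, K> keyBy(Function<T, K> keyFn);
  }

  /** Stage 2: reachable only after keyBy; the only available method is reduceBy. */
  interface ReduceByStage<T, K> {
    OutputStage<T, K> reduceBy(BinaryOperator<T> reducer);
  }

  /** Stage 3: terminal step that computes the result. */
  interface OutputStage<T, K> {
    Map<K, T> output();
  }

  static <T> KeyByStage<T> of(List<T> input) {
    // KeyByStage has a generic method, so it needs an anonymous class;
    // the later stages are single-method interfaces and can be lambdas.
    return new KeyByStage<T>() {
      @Override
      public <K> ReduceByStage<T, K> keyBy(Function<T, K> keyFn) {
        return reducer -> () -> {
          Map<K, T> result = new LinkedHashMap<>();
          for (T element : input) {
            result.merge(keyFn.apply(element), element, reducer);
          }
          return result;
        };
      }
    };
  }

  public static void main(String[] args) {
    Map<Integer, String> byLength = StagedReduce.of(Arrays.asList("a", "bb", "cc", "d"))
        .keyBy(String::length)                   // only keyBy is offered here
        .reduceBy((left, right) -> left + right) // only reduceBy is offered here
        .output();
    System.out.println(byLength); // prints {1=ad, 2=bbcc}
  }
}
```

Calling reduceBy before keyBy, or output before reduceBy, does not compile, which is the point of splitting the builder into stages.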
> 
>                                              Davor:
> 
>                                              Thanks, I'll contact you and will start the process
>                                              of having all the necessary paperwork signed on our
>                                              side, so we can get things moving.
> 
> 
>                                              On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>                                              <rmannibucau@gmail.com> wrote:
> 
>                                                    Hi guys
> 
>                                                    A DSL would be very welcome, in particular if fluent.
> 
>                                                    Open question: did you consider implementing the
>                                                    Stream API (surely extending it to have a BeamStream
>                                                    and a few more features like sides etc.)? It would be
>                                                    very natural, easy to integrate anywhere, and would
>                                                    avoid having to discover a new API.
> 
>                                                    Hazelcast Jet did it, so I don't see why Beam
>                                                    couldn't.
> 
>                                                    On Dec 18, 2017 07:26, "Davor Bonaci"
>                                                    <davor@apache.org> wrote:
> 
>                                                        Hi David,
>                                                        As JB noted, merging these two projects is a great
>                                                        idea. In fact, some of us have had those
>                                                        discussions in the past.
> 
>                                                        Legally, nothing particular is strictly necessary
>                                                        as the code seems to already be Apache 2.0
>                                                        licensed. We don't, however, want to be perceived
>                                                        as making hostile forks, so it would be great to
>                                                        file a Software Grant Agreement with the ASF
>                                                        Secretary. I can help with the process, as
>                                                        necessary.
> 
>                                                        Project alignment-wise, there aren't any particular
>                                                        blockers that I am aware of. We welcome DSLs.
> 
>                                                        Technically, the code would start in a feature
>                                                        branch. During this stage, we'd need to validate a
>                                                        few things, including confirmation that the code and
>                                                        dependencies match the ASF policy, automate testing
>                                                        in Beam's tooling, etc. At that point, we'd take a
>                                                        community vote to accept the component into master,
>                                                        and consider author(s) for committership in the
>                                                        overall project.
> 
>                                                        Welcome to the ASF and Beam -- we are thrilled to
>                                                        have you! Hope this helps, and please reach out if
>                                                        anybody on our end can help, including JB or
>                                                        myself.
> 
>                                                        Davor
> 
> 
>                                                        On Sun, Dec 17, 2017 at 10:13 AM,
>                                                        Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>                                                            Hi David,
> 
>                                                            Generally speaking, having different fluent DSLs on
>                                                            top of the Beam SDK is great.
>
>                                                            I would like to take a look at your wordcount examples
>                                                            to give you complete feedback. I like the idea and a
>                                                            fluent Java DSL is valuable.
>
>                                                            Let's wait for feedback from others. If we have a
>                                                            consensus, then I would be more than happy to help you
>                                                            with the donation (I worked on the Camel Java DSL a
>                                                            while ago, so I have some experience here).
> 
>                                                            Thanks !
>                                                            Regards
>                                                            JB
> 
>                                                            On 12/17/2017 07:00 PM, David Morávek wrote:
> 
>                                                                Hello,
> 
> 
>                                                                First of all, thanks for the amazing work the Apache
>                                                                Beam community is doing!
>
>
>                                                                In 2014, we started development of a
>                                                                runtime-independent Java 8 API that helps us to create
>                                                                unified big-data processing flows. It has been used as
>                                                                a core building block of the Seznam.cz web crawler
>                                                                data infrastructure ever since. Its design principles
>                                                                and execution model are very similar to Apache Beam.
>
>
>                                                                This API was open sourced in 2016, under the name
>                                                                Euphoria API:
> 
>                                                                https://github.com/seznam/euphoria
> 
> 
>                                                                As it is very similar to Apache Beam, we feel that it
>                                                                is not worth duplicating effort in terms of
>                                                                development of new runtimes and fine-tuning of current
>                                                                ones.
>
>
>                                                                The main blocker for us to switch to Apache Beam is the
>                                                                lack of a Java 8 API. We propose the integration of
>                                                                Euphoria API into Apache Beam as a Java 8 DSL, in order
>                                                                to share our effort with the community.
>
>
>                                                                A simple example of the Euphoria API usage can be found
>                                                                here:
> 
> 
>                                                                https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> 
> 
> 
>                                                                If you feel that the Beam community could benefit from
>                                                                our work, we would love to start working on Euphoria
>                                                                integration into Apache Beam (we already have a working
>                                                                POC, with a few basic operators implemented).
>
>
>                                                                I look forward to hearing from you,
> 
>                                                                David
> 
> 
>                                                            --
>                                                            Jean-Baptiste Onofré
>                                                            jbonofre@apache.org
>                                                            http://blog.nanthrax.net
>                                                            Talend - http://www.talend.com
> 
> 
> 
> 
> 
>                                              --
>                                              Regards,
> 
>                                              David Morávek
> 
> 
>                                          --
>                                          Jean-Baptiste Onofré
>                                          jbonofre@apache.org
>                                          http://blog.nanthrax.net
>                                          Talend - http://www.talend.com
> 
> 
> 
> 
> 
>                      --
>                      Jean-Baptiste Onofré
>                      jbonofre@apache.org
>                      http://blog.nanthrax.net
>                      Talend - http://www.talend.com
> 
> 
> 
> 
>                 --
>                 Regards,
> 
>                 David Morávek
> 
> 
>             --
>             Jean-Baptiste Onofré
>             jbonofre@apache.org
>             http://blog.nanthrax.net
>             Talend - http://www.talend.com
> 
> 
> 
>     -- 
>     Jean-Baptiste Onofré
>     jbonofre@apache.org
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
> 
> 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Euphoria Java 8 DSL - proposal

Posted by Davor Bonaci <da...@apache.org>.
I may have missed things, but any update on the progress of this donation?

On Tue, Jan 2, 2018 at 10:52 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Great !
>
> Thanks !
> Regards
> JB
>
> On 01/03/2018 07:29 AM, David Morávek wrote:
>
>> Hello JB,
>>
>> Perfect! I'm already on the Beam Slack workspace, I'll contact you once I
>> get to the office.
>>
>> Thanks!
>> D.
>>
>> On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>>
>>     Hi David,
>>
>>     absolutely !! Let's move forward on the preparation steps.
>>
>>     Are you on Slack and/or hangout to plan this ?
>>
>>     Thanks,
>>     Regards
>>     JB
>>
>>     On 01/02/2018 05:35 PM, David Morávek wrote:
>>
>>         Hello JB,
>>
>>         can we help in any way to move things forward?
>>
>>         Thanks,
>>         D.
>>
>>         On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>>
>>              Thanks Jan,
>>
>>              It makes sense.
>>
>>              Let me take a look at the code to understand the "interaction".
>>
>>              Regards
>>              JB
>>
>>
>>              On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>
>>                  Hi JB,
>>
>>                  basically you are not wrong. The project started about three or four
>>                  years ago with a goal to unify batch and streaming processing into
>>                  single portable, executor independent API. Because of that, it is
>>                  currently "close" to Beam in this sense. But we don't see much added
>>                  value keeping this as a separate project, with one of the key
>>                  differences to be the API (not the model itself), so we would like to
>>                  focus on translation from Euphoria API to Beam's SDK. That's why we
>>                  would like to see it as a DSL, so that it would be possible to use
>>                  Euphoria API with Beam's runners as much natively as possible.
>>
>>                  I hope I didn't make the subject even more unclear, if so, I'll be happy
>>                  to explain anything in more detail. :-)
>>
>>                      Jan
>>
>>
>>                  On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>
>>                      Hi Jan,
>>
>>                      Thanks for your answers.
>>
>>                      However, they confused me ;)
>>
>>                      Regarding what you replied, Euphoria seems like a programming
>>                      model/SDK "close" to Beam more than a DSL on top of an existing Beam
>>                      SDK.
>>
>>                      Am I wrong ?
>>
>>                      Regards
>>                      JB
>>
>>                      On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>
>>                          Hi Ismael,
>>
>>                          basically we adopted the Beam's design regarding partitioning
>>                          (https://github.com/seznam/euphoria/issues/160) and implemented
>>                          the sorting manually
>>                          (https://github.com/seznam/euphoria/issues/158). I'm not aware
>>                          of the time model differences (Euphoria supports ingestion and
>>                          event time, we don't support processing time by decision).
>>                          Regarding other differences (looking into Beam capability
>>                          matrix, I'd say that):
>>
>>                             - we don't support stateful FlatMap (i.e. ParDo) for now
>>                               (https://github.com/seznam/euphoria/issues/192)
>>
>>                             - we don't support side inputs (by decision now, but might be
>>                               reconsidered) and outputs
>>                               (https://github.com/seznam/euphoria/issues/124)
>>
>>                             - we support complete event-time windows (non-merging,
>>                               merging, aligned, unaligned) and time control
>>
>>                             - we don't support processing time by decision (might be
>>                               reconsidered if a valid use-case is found)
>>
>>                             - we support window triggering based on both time and data,
>>                               including discarding and accumulating (without
>>                               accumulating & retracting)
>>
>>                          All our executors (runners) - Flink, Spark and Local - implement
>>                          the complete model, which we enforce using "operator test kit"
>>                          that all executors must pass. Spark executor supports bounded
>>                          sources only (for now). As David said, we currently don't have
>>                          serialization abstraction, so there is some work to be done in
>>                          that regard.
>>
>>                          Our intention is to completely supersede Euphoria, we would like
>>                          to consider possibility to use executors that would not rely on
>>                          Beam, but that is optional now and should be straightforward.
>>
>>                          We'd be happy to answer any more questions you might have and
>>                          thanks a lot!
>>
>>                          Best,
>>
>>                             Jan
>>
>>
>>                          On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>
>>                              Hi,
>>
>>                              It is great to see that you guys have achieved a maturity
>>                              point to propose this. Congratulations for your work and the
>>                              idea to contribute it into Beam.
>>
>>                              I remember from a previous discussion with Jan about the model
>>                              mismatch between Euphoria and Beam, because of some design
>>                              decisions of both projects. I remember you guys had some
>>                              issues with the way Beam's sources do partitioning, as well as
>>                              Beam's lack of sorted data (on shuffle a la hadoop). Also if I
>>                              remember well the 'time' model of Euphoria was simpler than
>>                              Beam's. I talk about all of this because I am curious about
>>                              what parts of the Euphoria model you guys had to sacrifice to
>>                              support Beam, and what parts of Beam's model should still be
>>                              integrated into Euphoria (and if there is a straightforward
>>                              path to do it).
>>
>>                              If I understand well if this gets merged into Apache this
>>                              means that Euphoria's current implementation would be
>>                              superseded by this DSL? I am curious because I would like to
>>                              understand your level of investment on supporting the future
>>                              of this DSL.
>>
>>                              Thanks and congrats again !
>>                              Ismaël
>>
>>                              On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>>                              <jb@nanthrax.net> wrote:
>>
>>                                  Depending on the donation, you would need an ICLA for each
>>                                  contributor, and a CCLA in addition to the SGA.
>>
>>                                  We can sync with Davor and me for the legal stuff.
>>                                  However, I would wait a little bit just to have feedback
>>                                  from the whole team and start a formal vote.
>>
>>                                  I would be happy to start the formal vote.
>>
>>                                  Regards
>>                                  JB
>>
>>                                  On 12/18/2017 10:03 AM, David Morávek wrote:
>>
>>                                      Hello,
>>
>>                                      Thanks for the awesome feedback!
>>
>>                                      Romain:
>>
>>                                      We already use Java Stream API in all operators where it
>>                                      makes sense (e.g. ReduceByKey). Still not sure if it was a
>>                                      good choice, but it can be easily converted to an iterator
>>                                      anyway.
>>
>>                                      Side outputs support is coming soon, we have already done
>>                                      some initial work on this.
>>
>>                                      Side inputs are not supported in the way you are used to
>>                                      from Beam, because they can be replaced by a Join operator
>>                                      on the same key (if annotated with broadcastHashJoin, it
>>                                      will be turned into a map-side join).
>>
>>                                      The only significant difference from Beam is that we
>>                                      decided not to abstract serialization, so we need to add
>>                                      support for Type Hints, because of type erasure.
>>
>>                                      Fluent API:
>>
>>                                      API is fluent within one operator. It is designed to "lead
>>                                      the programmer", which means that he will only be offered
>>                                      methods that make sense after the last method he used
>>                                      (e.g. in ReduceByKey, we know that after keyBy either
>>                                      reduceBy method should come). It is implemented as a
>>                                      series of builders.
>>
>>                                      Davor:
>>
>>                                      Thanks, I'll contact you, and will start the process of
>>                                      having all the necessary paperwork signed on our side, so
>>                                      we can get things moving.
>>
>>
>>                                      On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>                                      <rmannibucau@gmail.com> wrote:
>>
>>                                            Hi guys
>>
>>                                            A DSL would be very welcome, in particular if fluent.
>>
>>                                            Open question: did you study implementing the Stream
>>                                            API (surely extending it to have a BeamStream and a few
>>                                            more features like sides etc)? It would be very natural,
>>                                            easy to integrate anywhere, and would avoid a new API
>>                                            discovery.
>>
>>                                            Hazelcast Jet did it so I don't see why Beam couldn't.
>>
>>                                            On Dec 18, 2017 07:26, "Davor Bonaci"
>>                                            <davor@apache.org> wrote:
>>
>>                                                Hi David,
>>                                                As JB noted, merging of these two projects is a great
>>                                                idea. In fact, some of us have had those discussions
>>                                                in the past.
>>
>>                                                Legally, nothing particular is strictly necessary as
>>                                                the code seems to already be Apache 2.0 licensed. We
>>                                                don't, however, want to be perceived as making hostile
>>                                                forks, so it would be great to file a Software Grant
>>                                                Agreement with the ASF Secretary. I can help with the
>>                                                process, as necessary.
>>
>>                                                Project alignment-wise, there aren't any particular
>>                                                blockers that I am aware of. We welcome DSLs.
>>
>>                                                Technically, the code would start in a feature branch.
>>                                                During this stage, we'd need to validate a few things,
>>                                                including confirmation the code and dependencies match
>>                                                the ASF policy, automate testing in Beam's tooling,
>>                                                etc. At that point, we'd take a community vote to
>>                                                accept the component into master, and consider
>>                                                author(s) for committership in the overall project.
>>
>>                                                Welcome to the ASF and Beam -- we are thrilled to have
>>                                                you! Hope this helps, and please reach out if anybody
>>                                                on our end can help, including JB or myself.
>>
>>                                                Davor
>>
>>
>>                                                On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>                                                <jb@nanthrax.net> wrote:
>>
>>                                                    Hi David,
>>
>>                                                    Generally speaking, having different fluent DSLs on
>>                                                    top of the Beam SDK is great.
>>
>>                                                    I would like to take a look at your wordcount
>>                                                    examples to give you complete feedback. I like the
>>                                                    idea and a fluent Java DSL is valuable.
>>
>>                                                    Let's wait for feedback from others. If we have a
>>                                                    consensus, then I would be more than happy to help
>>                                                    you with the donation (I worked on the Camel Java DSL
>>                                                    a while ago, so I have some experience here).
>>
>>                                                    Thanks !
>>                                                    Regards
>>                                                    JB
>>
>>                                                    On 12/17/2017 07:00
>> PM, David
>>         Morávek
>>                                      wrote:
>>
>>                                                        Hello,
>>
>>
>>                                                        First of all,
>> thanks for the
>>                                      amazing work the Apache Beam
>>                                                        community is doing!
>>
>>
>>                                                        In 2014, we've
>> started
>>         development
>>                                      of the runtime
>>                                      independent
>>                                                        Java 8 API, that
>> helps us to
>>                                      create unified big-data
>>                                      processing
>>                                                        flows. It has been
>> used
>>         as a core
>>                                      building block of
>>                                      Seznam.cz
>>                                                        web crawler data
>>         infrastructure
>>                                      every since. Its design
>>                                                        principles and
>> execution
>>         model are
>>                                      very similar to Apache
>>                                      Beam.
>>
>>
>>                                                        This API was open
>> sourced
>>         in 2016,
>>                                      under the name Euphoria
>>                                      API:
>>
>>         https://github.com/seznam/euphoria <https://github.com/seznam/eup
>> horia>
>>                                      <https://github.com/seznam/euphoria
>>         <https://github.com/seznam/euphoria>>
>>                                      <https://github.com/seznam/euphoria
>>         <https://github.com/seznam/euphoria>
>>                                      <https://github.com/seznam/euphoria
>>         <https://github.com/seznam/euphoria>>>
>>
>>
>>                                                        As it is very
>> similar to
>>         Apache
>>                                      Beam, we feel, that it is
>>                                      not
>>                                                        worth of
>> duplicating
>>         effort in
>>                                      terms of development of new
>>                                                        runtimes and
>> fine-tuning of
>>                                      current ones.
>>
>>
>>                                                        The main blocker
>> for us
>>         to switch
>>                                      to Apache Beam is lack
>>                                      of the
>>                                                        Java 8 API. *W*e
>> propose the
>>                                      integration of Euphoria API
>>                                      into
>>                                                        Apache Beam as a
>> Java 8
>>         DSL, in
>>                                      order to share our effort
>>                                      with
>>                                                        the community.
>>
>>
>>                                                        Simple example of
>> the
>>         Euphoria API
>>                                      usage, can be found
>>                                      here:
>>
>>
>>         https://github.com/seznam/euphoria/tree/master/euphoria-exam
>> ples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>         <https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount>
>>                                             <
>> https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>         <https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount>>
>>
>>
>>                                             <
>> https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>         <https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount>
>>                                             <
>> https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>         <https://github.com/seznam/euphoria/tree/master/euphoria-exa
>> mples/src/main/java/cz/seznam/euphoria/examples/wordcount>>>
>>
>>
>>
>>                                                        If you feel, that
>> Beam
>>         community
>>                                      could leverage from our
>>                                      work,
>>                                                        we would love to
>> start
>>         working on
>>                                      Euphoria integration
>>                                      into
>>                                                        Apache Beam (we
>> already
>>         have a
>>                                      working POC, with few basic
>>                                                        operators
>> implemented).
>>
>>
>>                                                        I look forward to
>> hearing
>>         from you,
>>
>>                                                        David
>>
>>
>>                                                    --
>> Jean-Baptiste
>>         Onofré
>>         jbonofre@apache.org <ma...@apache.org>
>>         <mailto:jbonofre@apache.org <ma...@apache.org>>
>>                                      <mailto:jbonofre@apache.org
>>         <ma...@apache.org>
>>                                      <mailto:jbonofre@apache.org
>>         <ma...@apache.org>>>
>>         http://blog.nanthrax.net
>>                                                    Talend -
>> http://www.talend.com
>>
>>
>>
>>
>>
>>                                      --                             s
>> pozdravem
>>
>>                                      David Morávek
>>
>>
>>                                  --                         Jean-Baptiste
>> Onofré
>>         jbonofre@apache.org <ma...@apache.org>
>>         <mailto:jbonofre@apache.org <ma...@apache.org>>
>>         http://blog.nanthrax.net
>>                                  Talend - http://www.talend.com
>>
>>
>>
>>
>>
>>              --     Jean-Baptiste Onofré
>>         jbonofre@apache.org <ma...@apache.org>
>>         <mailto:jbonofre@apache.org <ma...@apache.org>>
>>         http://blog.nanthrax.net
>>              Talend - http://www.talend.com
>>
>>
>>
>>
>>         --         s pozdravem
>>
>>         David Morávek
>>
>>
>>     --     Jean-Baptiste Onofré
>>     jbonofre@apache.org <ma...@apache.org>
>>     http://blog.nanthrax.net
>>     Talend - http://www.talend.com
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Great !

Thanks !
Regards
JB

On 01/03/2018 07:29 AM, David Morávek wrote:
> Hello JB,
> 
> Perfect! I'm already on the Beam Slack workspace, I'll contact you once I get to 
> the office.
> 
> Thanks!
> D.
> 
> On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>     Hi David,
> 
>     absolutely !! Let's move forward on the preparation steps.
> 
>     Are you on Slack and/or hangout to plan this ?
> 
>     Thanks,
>     Regards
>     JB
> 
>     On 01/02/2018 05:35 PM, David Morávek wrote:
> 
>         Hello JB,
> 
>         can we help in any way to move things forward?
> 
>         Thanks,
>         D.
> 
>         On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>              Thanks Jan,
> 
>              It makes sense.
> 
>              Let me take a look at the code to understand the "interaction".
> 
>              Regards
>              JB
> 
> 
>              On 12/18/2017 04:26 PM, Jan Lukavský wrote:
> 
>                  Hi JB,
> 
>                  basically you are not wrong. The project started about three or four
>                  years ago with the goal of unifying batch and stream processing in a
>                  single portable, executor-independent API. Because of that, it is
>                  currently "close" to Beam in this sense. But we don't see much added
>                  value in keeping it as a separate project when one of the key
>                  differences is the API (not the model itself), so we would like to
>                  focus on translating the Euphoria API to Beam's SDK. That's why we
>                  would like to see it as a DSL, so that it would be possible to use the
>                  Euphoria API with Beam's runners as natively as possible.
> 
>                  I hope I didn't make the subject even more unclear; if so, I'll be happy
>                  to explain anything in more detail. :-)
> 
>                      Jan
> 
> 
>                  On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
> 
>                      Hi Jan,
> 
>                      Thanks for your answers.
> 
>                      However, they confused me ;)
> 
>                      Judging by your reply, Euphoria seems more like a programming
>                      model/SDK "close" to Beam than a DSL on top of an existing Beam
>                      SDK.
> 
>                      Am I wrong ?
> 
>                      Regards
>                      JB
> 
>                      On 12/18/2017 03:44 PM, Jan Lukavský wrote:
> 
>                          Hi Ismael,
> 
>                          basically we adopted Beam's design regarding partitioning
>                          (https://github.com/seznam/euphoria/issues/160) and implemented
>                          the sorting manually
>                          (https://github.com/seznam/euphoria/issues/158). I'm not aware
>                          of any time model differences (Euphoria supports ingestion and
>                          event time; we don't support processing time, by decision).
>                          Regarding other differences (looking at the Beam capability
>                          matrix), I'd say that:
> 
>                             - we don't support stateful FlatMap (i.e. ParDo) for now
>                          (https://github.com/seznam/euphoria/issues/192)
> 
>                             - we don't support side inputs (by decision now, but might be
>                          reconsidered) and outputs
>                          (https://github.com/seznam/euphoria/issues/124)
> 
>                             - we support complete event-time windows (non-merging,
>                          merging, aligned, unaligned) and time control
> 
>                             - we don't support processing time by decision (might be
>                          reconsidered if a valid use-case is found)
> 
>                             - we support window triggering based on both time and data,
>                          including discarding and accumulating (without accumulating &
>                          retracting)
> 
>                          All our executors (runners) - Flink, Spark and Local - implement
>                          the complete model, which we enforce using an "operator test kit"
>                          that all executors must pass. The Spark executor supports bounded
>                          sources only (for now). As David said, we currently don't have a
>                          serialization abstraction, so there is some work to be done in
>                          that regard.
> 
>                          Our intention is to completely supersede Euphoria. We would like
>                          to consider the possibility of using executors that do not rely on
>                          Beam, but that is optional for now and should be straightforward.
> 
>                          We'd be happy to answer any more questions you might have, and
>                          thanks a lot!
> 
>                          Best,
> 
>                             Jan
> 
> 
>                          On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
> 
>                              Hi,
> 
>                              It is great to see that you guys have reached a point of maturity
>                              where you can propose this. Congratulations on your work and on the
>                              idea of contributing it to Beam.
> 
>                              I remember from a previous discussion with Jan the model mismatch
>                              between Euphoria and Beam, caused by some design decisions of both
>                              projects. I remember you guys had some issues with the way Beam's
>                              sources do partitioning, as well as with Beam's lack of sorted data
>                              (on shuffle, à la Hadoop). Also, if I remember well, the 'time' model
>                              of Euphoria was simpler than Beam's. I mention all of this because I
>                              am curious about which parts of the Euphoria model you guys had to
>                              sacrifice to support Beam, and which parts of Beam's model should
>                              still be integrated into Euphoria (and whether there is a
>                              straightforward path to do it).
> 
>                              If I understand well, if this gets merged into Apache, this means that
>                              Euphoria's current implementation would be superseded by this DSL? I
>                              am curious because I would like to understand your level of investment
>                              in supporting the future of this DSL.
> 
>                              Thanks and congrats again !
>                              Ismaël
> 
>                              On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>                              <jb@nanthrax.net> wrote:
> 
>                                  Depending on the donation, you would need an ICLA for each
>                                  contributor, and a CCLA in addition to the SGA.
> 
>                                  We can sync with Davor and me for the legal stuff.
>                                  However, I would wait a little bit, just to get feedback
>                                  from the whole team, and then start a formal vote.
> 
>                                  I would be happy to start the formal vote.
> 
>                                  Regards
>                                  JB
> 
>                                  On 12/18/2017 10:03 AM, David Morávek wrote:
> 
>                                      Hello,
> 
>                                      Thanks for the awesome feedback!
> 
>                                      Romain:
> 
>                                      We already use the Java Stream API in all operators
>                                      where it makes sense (e.g. ReduceByKey). I'm still not sure if it
>                                      was a good choice, but it can easily be converted to an iterator
>                                      anyway.
> 
>                                      Side output support is coming soon; we have already done some
>                                      initial work on this.
> 
>                                      Side inputs are not supported in the way you are used
>                                      to from Beam, because they can be replaced by a Join operator on
>                                      the same key (if annotated with broadcastHashJoin, it will be
>                                      turned into a map-side join).
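
To make the map-side join idea concrete, here is a minimal sketch using plain Java collections. It is not the Euphoria or Beam API: `MapSideJoinSketch` and `broadcastHashJoin` are hypothetical names chosen for illustration, and the "broadcast" is only simulated by copying the small side into a local hash map that every lookup reads from.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MapSideJoinSketch {

    // Hypothetical helper: inner-joins a large list of (key, count) rows
    // against a small key -> label table that fits entirely in memory.
    static List<String> broadcastHashJoin(List<Map.Entry<String, Integer>> large,
                                          Map<String, String> small) {
        // "Broadcast" step: a distributed runner would ship this copy to every worker.
        Map<String, String> lookup = new HashMap<>(small);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, Integer> row : large) {
            String label = lookup.get(row.getKey());
            if (label != null) { // inner join: rows without a match are dropped
                joined.add(row.getKey() + ":" + row.getValue() + ":" + label);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> countries = new HashMap<>();
        countries.put("cz", "Czechia");
        countries.put("fr", "France");

        List<Map.Entry<String, Integer>> visits = Arrays.asList(
            new SimpleEntry<String, Integer>("cz", 3),
            new SimpleEntry<String, Integer>("fr", 5),
            new SimpleEntry<String, Integer>("xx", 1));

        System.out.println(broadcastHashJoin(visits, countries));
    }
}
```

The large side is never shuffled or grouped; each row is resolved with one hash lookup, which is why this only pays off when one side is small enough to be held in memory.
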
> 
>                                      The only significant difference from Beam is that we
>                                      decided not to abstract serialization, so we need to add support
>                                      for type hints because of type erasure.
> 
>                                      Fluent API:
> 
>                                      The API is fluent within one operator. It is designed to
>                                      "lead the programmer", which means that they will only be
>                                      offered methods that make sense after the last method they used
>                                      (e.g. in ReduceByKey, we know that after keyBy a reduceBy method
>                                      should come). It is implemented as a series of builders.
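
The "series of builders" idea can be sketched as follows. This shows the general staged-builder technique rather than Euphoria's actual classes: `StagedBuilderSketch`, `KeyByStage` and `ReduceByStage` are hypothetical names, and the reduction runs eagerly over an in-memory list only to keep the example runnable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class StagedBuilderSketch {

    // Entry point: the only way to obtain a builder is to name the input.
    static <T> KeyByStage<T> of(List<T> input) {
        return new KeyByStage<>(input);
    }

    // Stage 1: the only method offered here is keyBy.
    static final class KeyByStage<T> {
        private final List<T> input;

        private KeyByStage(List<T> input) {
            this.input = input;
        }

        <K> ReduceByStage<T, K> keyBy(Function<T, K> keyFn) {
            return new ReduceByStage<>(input, keyFn);
        }
    }

    // Stage 2: reachable only after keyBy; the only method offered is reduceBy.
    static final class ReduceByStage<T, K> {
        private final List<T> input;
        private final Function<T, K> keyFn;

        private ReduceByStage(List<T> input, Function<T, K> keyFn) {
            this.input = input;
            this.keyFn = keyFn;
        }

        // Terminal step: folds all elements sharing a key into one value.
        Map<K, T> reduceBy(BinaryOperator<T> reducer) {
            Map<K, T> out = new LinkedHashMap<>();
            for (T element : input) {
                out.merge(keyFn.apply(element), element, reducer);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> sums = StagedBuilderSketch.of(Arrays.asList(1, 2, 3, 4))
            .keyBy(n -> n % 2 == 0 ? "even" : "odd")
            .reduceBy(Integer::sum);
        System.out.println(sums);
    }
}
```

Because each stage is its own type, calling reduceBy before keyBy is a compile-time error, and IDE auto-completion shows only the next valid step.
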
> 
>                                      Davor:
> 
>                                      Thanks, I'll contact you and start the process
>                                      of getting all the necessary paperwork signed on our side, so
>                                      we can get things moving.
> 
> 
>                                      On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>                                      <rmannibucau@gmail.com> wrote:
> 
>                                            Hi guys
> 
>                                            A DSL would be very welcome, in particular if fluent.
> 
>                                            Open question: did you consider implementing the Stream API
>                                            (surely extending it to have a BeamStream and a few more
>                                            features like sides etc.)? It would be very natural, easy to
>                                            integrate anywhere, and would avoid having to discover a new API.
> 
>                                            Hazelcast Jet did it, so I don't see why Beam couldn't.
> 
>                                            On Dec 18, 2017, 07:26, "Davor Bonaci" <davor@apache.org> wrote:
> 
>                                                Hi David,
>                                                As JB noted, merging these two projects is a great idea. In
>                                                fact, some of us have had those discussions in the past.
> 
>                                                Legally, nothing in particular is strictly necessary, as the code
>                                                seems to already be Apache 2.0 licensed. We don't, however, want to
>                                                be perceived as making hostile forks, so it would be great to file a
>                                                Software Grant Agreement with the ASF Secretary. I can help with the
>                                                process, as necessary.
> 
>                                                Project alignment-wise, there aren't any particular blockers that
>                                                I am aware of. We welcome DSLs.
> 
>                                                Technically, the code would start in a feature branch. During this
>                                                stage, we'd need to validate a few things, including confirming that
>                                                the code and dependencies match ASF policy, automating testing in
>                                                Beam's tooling, etc. At that point, we'd take a community vote to
>                                                accept the component into master, and consider the author(s) for
>                                                committership in the overall project.
> 
>                                                Welcome to the ASF and Beam -- we are thrilled to have you! Hope
>                                                this helps, and please reach out if anybody on our end can help,
>                                                including JB or myself.
> 
>                                                Davor
> 
> 
>                                                On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>                                                <jb@nanthrax.net> wrote:
> 
>                                                    Hi David,
> 
>                                                    Generally speaking, having different fluent DSLs on top of the
>                                                    Beam SDK is great.
> 
>                                                    I would like to take a look at your wordcount examples to give
>                                                    you complete feedback. I like the idea, and a fluent Java DSL is
>                                                    valuable.
> 
>                                                    Let's wait for feedback from others. If we have a consensus, then
>                                                    I would be more than happy to help you with the donation (I
>                                                    worked on the Camel Java DSL a while ago, so I have some
>                                                    experience here).
> 
>                                                    Thanks !
>                                                    Regards
>                                                    JB
> 
>                                                    On 12/17/2017 07:00 PM, David Morávek wrote:
> 
>                                                        Hello,
> 
> 
>                                                        First of all, thanks for the amazing work the Apache Beam
>                                                        community is doing!
> 
> 
>                                                        In 2014, we started development of a runtime-independent
>                                                        Java 8 API that helps us create unified big-data processing
>                                                        flows. It has been used as a core building block of the Seznam.cz
>                                                        web crawler data infrastructure ever since. Its design
>                                                        principles and execution model are very similar to Apache Beam's.
> 
> 
>                                                        This API was open sourced in 2016, under the name Euphoria API:
> 
>                                                        https://github.com/seznam/euphoria
> 
> 
>                                                        As it is very similar to Apache Beam, we feel that it is not
>                                                        worth duplicating effort on developing new runtimes and
>                                                        fine-tuning current ones.
> 
> 
>                                                        The main blocker for us to switch to Apache Beam is the lack of a
>                                                        Java 8 API. We propose integrating the Euphoria API into
>                                                        Apache Beam as a Java 8 DSL, in order to share our effort with
>                                                        the community.
> 
> 
>                                                        A simple example of Euphoria API usage can be found here:
> 
>                                                        https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> 
> 
> 
>         If you feel, that Beam community could leverage from our work,
>         we would love to start working on Euphoria integration into
>         Apache Beam (we already have a working POC, with few basic
>         operators implemented).
> 
> 
>         I look forward to hearing from you,
> 
>         David
> 
> 
>         --
>         Jean-Baptiste Onofré
>         jbonofre@apache.org
>         http://blog.nanthrax.net
>         Talend - http://www.talend.com
> 
> 
> 
> 
> 
>         --
>         Regards
> 
>         David Morávek
> 
> 
>         --
>         Jean-Baptiste Onofré
>         jbonofre@apache.org
>         http://blog.nanthrax.net
>         Talend - http://www.talend.com
> 
> 
> 
> 
> 
>         --
>         Jean-Baptiste Onofré
>         jbonofre@apache.org
>         http://blog.nanthrax.net
>         Talend - http://www.talend.com
> 
> 
> 
> 
>         -- 
>         Regards
> 
>         David Morávek
> 
> 
>     -- 
>     Jean-Baptiste Onofré
>     jbonofre@apache.org
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
> 
> 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Euphoria Java 8 DSL - proposal

Posted by David Morávek <da...@gmail.com>.
Hello JB,

Perfect! I'm already on the Beam Slack workspace; I'll contact you once I
get to the office.

Thanks!
D.

On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi David,
>
> absolutely !! Let's move forward on the preparation steps.
>
> Are you on Slack and/or hangout to plan this ?
>
> Thanks,
> Regards
> JB
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi David,

Absolutely! Let's move forward on the preparation steps.

Are you on Slack and/or Hangouts to plan this?

Thanks,
Regards
JB

On 01/02/2018 05:35 PM, David Morávek wrote:
> Hello JB,
> 
> can we help in any way to move things forward?
> 
> Thanks,
> D.
> 
> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> 
>     Thanks Jan,
> 
>     It makes sense.
> 
>     Let me take a look at the code to understand the "interaction".
> 
>     Regards
>     JB
> 
> 
>     On 12/18/2017 04:26 PM, Jan Lukavský wrote:
> 
>         Hi JB,
> 
>         basically you are not wrong. The project started about three or four
>         years ago with the goal of unifying batch and streaming processing in
>         a single portable, executor-independent API. Because of that, it is
>         currently "close" to Beam in this sense. But we don't see much added
>         value in keeping this as a separate project when one of the key
>         differences is the API (not the model itself), so we would like to
>         focus on translation from the Euphoria API to Beam's SDK. That's why
>         we would like to see it as a DSL, so that it would be possible to use
>         the Euphoria API with Beam's runners as natively as possible.
> 
>         I hope I didn't make the subject even more unclear; if so, I'll be
>         happy to explain anything in more detail. :-)
> 
>             Jan
> 
> 
>         On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
> 
>             Hi Jan,
> 
>             Thanks for your answers.
> 
>             However, they confused me ;)
> 
>             Regarding what you replied, Euphoria seems like a programming
>             model/SDK "close" to Beam more than a DSL on top of an existing Beam
>             SDK.
> 
>             Am I wrong ?
> 
>             Regards
>             JB
> 
>             On 12/18/2017 03:44 PM, Jan Lukavský wrote:
> 
>                 Hi Ismael,
> 
>                 basically we adopted Beam's design regarding partitioning
>                 (https://github.com/seznam/euphoria/issues/160) and implemented
>                 the sorting manually
>                 (https://github.com/seznam/euphoria/issues/158). I'm not aware
>                 of any time model differences (Euphoria supports ingestion and
>                 event time; we don't support processing time, by decision).
>                 Regarding other differences (looking at the Beam capability
>                 matrix), I'd say that:
> 
>                    - we don't support stateful FlatMap (i.e. ParDo) for now
>                      (https://github.com/seznam/euphoria/issues/192)
> 
>                    - we don't support side inputs (by decision now, but might be
>                      reconsidered) and outputs
>                      (https://github.com/seznam/euphoria/issues/124)
> 
>                    - we support complete event-time windows (non-merging,
>                      merging, aligned, unaligned) and time control
> 
>                    - we don't support processing time by decision (might be
>                      reconsidered if a valid use-case is found)
> 
>                    - we support window triggering based on both time and data,
>                      including discarding and accumulating (without accumulating &
>                      retracting)
> 
>                 All our executors (runners) - Flink, Spark and Local - implement
>                 the complete model, which we enforce using an "operator test kit"
>                 that all executors must pass. The Spark executor supports bounded
>                 sources only (for now). As David said, we currently don't have a
>                 serialization abstraction, so there is some work to be done in
>                 that regard.
> 
>                 Our intention is to completely supersede Euphoria. We would like
>                 to consider the possibility of using executors that do not rely
>                 on Beam, but that is optional for now and should be
>                 straightforward.
> 
>                 We'd be happy to answer any more questions you might have and
>                 thanks a lot!
> 
>                 Best,
> 
>                    Jan
> 
> 
>                 On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
> 
>                     Hi,
> 
>                     It is great to see that you guys have achieved a maturity
>                     point to propose this. Congratulations for your work and the
>                     idea to contribute it into Beam.
> 
>                     I remember a previous discussion with Jan about the model
>                     mismatch between Euphoria and Beam, because of some design
>                     decisions of both projects. I remember you guys had some
>                     issues with the way Beam's sources do partitioning, as well
>                     as Beam's lack of sorted data (on shuffle, à la Hadoop).
>                     Also, if I remember well, the 'time' model of Euphoria was
>                     simpler than Beam's. I talk about all of this because I am
>                     curious about what parts of the Euphoria model you guys had
>                     to sacrifice to support Beam, and what parts of Beam's model
>                     should still be integrated into Euphoria (and if there is a
>                     straightforward path to do it).
> 
>                     If I understand well, if this gets merged into Apache, this
>                     means that Euphoria's current implementation would be
>                     superseded by this DSL? I am curious because I would like to
>                     understand your level of investment in supporting the future
>                     of this DSL.
> 
>                     Thanks and congrats again !
>                     Ismaël
> 
>                     On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>                     <jb@nanthrax.net> wrote:
> 
>                         Depending on the donation, you would need an ICLA for
>                         each contributor, and a CCLA in addition to the SGA.
> 
>                         We can sync with Davor and I for the legal stuff.
>                         However, I would wait a little bit just to have feedback
>                         from the whole team
>                         and start a formal vote.
> 
>                         I would be happy to start the formal vote.
> 
>                         Regards
>                         JB
> 
>                         On 12/18/2017 10:03 AM, David Morávek wrote:
> 
>                             Hello,
> 
>                             Thanks for the awesome feedback!
> 
>                             Romain:
> 
>                             We already use the Java Stream API in all operators
>                             where it makes sense (e.g. ReduceByKey). Still not
>                             sure if it was a good choice, but it can be easily
>                             converted to an iterator anyway.
> 
>                             Side outputs support is coming soon; we have already
>                             done some initial work on this.
> 
>                             Side inputs are not supported in the way you are used
>                             to from Beam, because they can be replaced by a Join
>                             operator on the same key (if annotated with
>                             broadcastHashJoin, it will be turned into a map-side
>                             join).
> 
>                             The only significant difference from Beam is that we
>                             decided not to abstract serialization, so we need to
>                             add support for type hints because of type erasure.
> 
>                             Fluent API:
> 
>                             API is fluent within one operator. It is designed to
>                             "lead the
>                             programmer", which means that he will only be
>                             offered methods that make
>                             sense after the last method he used (e.g., in
>                             ReduceByKey, we know that after
>                             keyBy either reduceBy method should come). It is
>                             implemented as a series of
>                             builders.
> 
>                             Davor:
> 
>                             Thanks, I'll contact you, and will start the process
>                             of having all the
>                             necessary paperwork signed on our side, so we can
>                             get things moving.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>                             On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>                             <rmannibucau@gmail.com <ma...@gmail.com>
>                             <mailto:rmannibucau@gmail.com
>                             <ma...@gmail.com>>> wrote:
> 
>                                   Hi guys
> 
>                                   A DSL would be very welcomed, in particular if
>                             fluent.
> 
>                                   Open question: did you study to implement
>                             Stream API (surely extending
>                             it to
>                                   have a BeamStream and a few more features like
>                             sides etc)? Would be
>                             very
>                                   natural and integrable easily anywhere and
>                             avoid a new API discovery.
> 
>                                   Hazelcast Jet did it, so I don't see why Beam
>                             couldn't.
> 
>                                   Le 18 déc. 2017 07:26, "Davor Bonaci"
>                             <davor@apache.org <ma...@apache.org>
>                                   <mailto:davor@apache.org
>                             <ma...@apache.org>>> a écrit :
> 
>                                       Hi David,
>                                       As JB noted, merging of these two projects
>                             is a great idea. In
>                             fact,
>                                       some of us have had those discussions in
>                             the past.
> 
>                                       Legally, nothing particular is strictly
>                             necessary as the code seems
>                             to
>                                       already be Apache 2.0 licensed. We don't,
>                             however, want to be
>                             perceived
>                                       as making hostile forks, so it would be
>                             great to file a Software
>                             Grant
>                                       Agreement with the ASF Secretary. I can
>                             help with the process, as
>                             necessary.
> 
>                                       Project alignment-wise, there aren't any
>                             particular blockers that
>                             I am
>                                       aware of. We welcome DSLs.
> 
>                                       Technically, the code would start in a
>                             feature branch. During this
>                                       stage, we'd need to validate a few things,
>                             including confirmation
>                             the
>                                       code and dependencies match the ASF
>                             policy, automate testing in
>                             Beam's
>                                       tooling, etc. At that point, we'd take a
>                             community vote to accept
>                             the
>                                       component into master, and consider
>                             author(s) for committership in
>                             the
>                                       overall project.
> 
>                                       Welcome to the ASF and Beam -- we are
>                             thrilled to have you! Hope
>                             this
>                                       helps, and please reach out if anybody on
>                             our end can help,
>                             including JB
>                                       or myself.
> 
>                                       Davor
> 
> 
>                                       On Sun, Dec 17, 2017 at 10:13 AM,
>                             Jean-Baptiste Onofré
>                             <jb@nanthrax.net <ma...@nanthrax.net>
>                                       <mailto:jb@nanthrax.net
>                             <ma...@nanthrax.net>>> wrote:
> 
>                                           Hi David,
> 
>                                           Generally speaking, having different
>                             fluent DSL on top of the
>                             Beam
>                                           SDK is great.
> 
>                                           I would like to take a look on your
>                             wordcount examples to give
>                             you a
>                                           complete feedback. I like the idea and
>                             a fluent Java DSL is
>                             valuable.
> 
>                                           Let's wait feedback from others. If we
>                             have a consensus, then
>                             I
>                                           would be more than happy to help you
>                             for the donation (I
>                             worked on
>                                           the Camel Java DSL while ago, so I
>                             have some experience here).
> 
>                                           Thanks !
>                                           Regards
>                                           JB
> 
>                                           On 12/17/2017 07:00 PM, David Morávek
>                             wrote:
> 
>                                               Hello,
> 
> 
>                                               First of all, thanks for the
>                             amazing work the Apache Beam
>                                               community is doing!
> 
> 
>                                               In 2014, we've started development
>                             of the runtime
>                             independent
>                                               Java 8 API, that helps us to
>                             create unified big-data
>                             processing
>                                               flows. It has been used as a core
>                             building block of
>                             Seznam.cz
>                                               web crawler data infrastructure
>                             ever since. Its design
>                                               principles and execution model are
>                             very similar to Apache
>                             Beam.
> 
> 
>                                               This API was open sourced in 2016,
>                             under the name Euphoria
>                             API:
> 
>                             https://github.com/seznam/euphoria
> 
> 
>                                               As it is very similar to Apache
>                             Beam, we feel, that it is
>                             not
>                                               worth of duplicating effort in
>                             terms of development of new
>                                               runtimes and fine-tuning of
>                             current ones.
> 
> 
>                                               The main blocker for us to switch
>                             to Apache Beam is lack
>                             of the
>                                               Java 8 API. *W*e propose the
>                             integration of Euphoria API
>                             into
>                                               Apache Beam as a Java 8 DSL, in
>                             order to share our effort
>                             with
>                                               the community.
> 
> 
>                                               Simple example of the Euphoria API
>                             usage, can be found
>                             here:
> 
> 
>                             https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
> 
> 
> 
>                                               If you feel, that Beam community
>                             could leverage from our
>                             work,
>                                               we would love to start working on
>                             Euphoria integration
>                             into
>                                               Apache Beam (we already have a
>                             working POC, with few basic
>                                               operators implemented).
> 
> 
>                                               I look forward to hearing from you,
> 
>                                               David
> 
> 
>                                           --             Jean-Baptiste Onofré
>                             jbonofre@apache.org <ma...@apache.org>
>                             <mailto:jbonofre@apache.org
>                             <ma...@apache.org>>
>                             http://blog.nanthrax.net
>                                           Talend - http://www.talend.com
> 
> 
> 
> 
> 
>                             -- 
>                             s pozdravem
> 
>                             David Morávek
> 
> 
>                         -- 
>                         Jean-Baptiste Onofré
>                         jbonofre@apache.org <ma...@apache.org>
>                         http://blog.nanthrax.net
>                         Talend - http://www.talend.com
> 
> 
> 
> 
> 
>     -- 
>     Jean-Baptiste Onofré
>     jbonofre@apache.org <ma...@apache.org>
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
> 
> 
> 
> 
> -- 
> s pozdravem
> 
> David Morávek

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
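
The quoted reply above says that Euphoria does not abstract serialization and therefore needs type hints because of type erasure. A minimal plain-Java sketch of that problem, under stated assumptions: it does not use the Euphoria or Beam APIs, and `TypeHintSketch`, `Hinted`, `outputType`, and `erasedClassesMatch` are hypothetical names chosen only for illustration. The generic parameters of a collection or a lambda are not available at runtime, so the output class is passed explicitly alongside the function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration; these names are not part of Euphoria or Beam. */
final class TypeHintSketch {

  /** Pairs a function with an explicit hint about its output type. */
  static final class Hinted<I, O> implements Function<I, O> {
    private final Function<I, O> delegate;
    private final Class<O> outputType;

    Hinted(Function<I, O> delegate, Class<O> outputType) {
      this.delegate = delegate;
      this.outputType = outputType;
    }

    @Override
    public O apply(I input) {
      return delegate.apply(input);
    }

    /** The element type that a serializer could be looked up for. */
    Class<O> outputType() {
      return outputType;
    }
  }

  /** Shows erasure: differently parameterized lists share one runtime class. */
  static boolean erasedClassesMatch() {
    List<String> strings = new ArrayList<>();
    List<Integer> numbers = new ArrayList<>();
    return strings.getClass() == numbers.getClass();
  }

  public static void main(String[] args) {
    // Both lists are plain ArrayList at runtime, so this prints "true".
    System.out.println(erasedClassesMatch());

    // The hint keeps the output type available where erasure would lose it.
    Hinted<String, Integer> length = new Hinted<>(String::length, Integer.class);
    System.out.println(length.apply("beam") + " " + length.outputType().getSimpleName());
  }
}
```

A real type hint would also have to describe nested generic types (for example a list of pairs), which a bare `Class` object cannot express; the sketch only covers the simplest case.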

Re: Euphoria Java 8 DSL - proposal

Posted by David Morávek <da...@gmail.com>.
Hello JB,

can we help in any way to move things forward?

Thanks,
D.

On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Thanks Jan,
>
> It makes sense.
>
> Let me take a look at the code to understand the "interaction".
>
> Regards
> JB
>
>
> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>
>> Hi JB,
>>
>> Basically, you are not wrong. The project started about three or four
>> years ago with the goal of unifying batch and streaming processing in a single
>> portable, executor-independent API. Because of that, it is currently
>> "close" to Beam in this sense. But we don't see much added value in keeping
>> this as a separate project when one of the key differences is the API
>> (not the model itself), so we would like to focus on translation from
>> the Euphoria API to Beam's SDK. That's why we would like to see it as a DSL, so
>> that it would be possible to use the Euphoria API with Beam's runners as
>> natively as possible.
>>
>> I hope I didn't make the subject even more unclear, if so, I'll be happy
>> to explain anything in more detail. :-)
>>
>>    Jan
>>
>>
>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>
>>> Hi Jan,
>>>
>>> Thanks for your answers.
>>>
>>> However, they confused me ;)
>>>
>>> Regarding what you replied, Euphoria seems like a programming model/SDK
>>> "close" to Beam more than a DSL on top of an existing Beam SDK.
>>>
>>> Am I wrong ?
>>>
>>> Regards
>>> JB
>>>
>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>
>>>> Hi Ismael,
>>>>
>>>> basically we adopted the Beam's design regarding partitioning (
>>>> https://github.com/seznam/euphoria/issues/160) and implemented the
>>>> sorting manually (https://github.com/seznam/euphoria/issues/158). I'm
>>>> not aware of the time model differences (Euphoria supports ingestion and
>>>> event time, we don't support processing time by decision). Regarding other
>>>> differences (looking into Beam capability matrix, I'd say that):
>>>>
>>>>   - we don't support stateful FlatMap (i.e. ParDo) for now (
>>>> https://github.com/seznam/euphoria/issues/192)
>>>>
>>>>   - we don't support side inputs (by decision now, but might be
>>>> reconsidered) and outputs
>>>> (https://github.com/seznam/euphoria/issues/124)
>>>>
>>>>   - we support complete event-time windows (non-merging, merging,
>>>> aligned, unaligned) and time control
>>>>
>>>>   - we don't support processing time by decision (might be reconsidered
>>>> if a valid use-case is found)
>>>>
>>>>   - we support window triggering based on both time and data, including
>>>> discarding and accumulating (without accumulating & retracting)
>>>>
>>>> All our executors (runners) - Flink, Spark and Local - implement the
>>>> complete model, which we enforce using "operator test kit" that all
>>>> executors must pass. Spark executor supports bounded sources only (for
>>>> now). As David said, we currently don't have serialization abstraction, so
>>>> there is some work to be done in that regard.
>>>>
>>>> Our intention is to completely supersede Euphoria, we would like to
>>>> consider possibility to use executors that would not rely on Beam, but that
>>>> is optional now and should be straightforward.
>>>>
>>>> We'd be happy to answer any more questions you might have and thanks a
>>>> lot!
>>>>
>>>> Best,
>>>>
>>>>   Jan
>>>>
>>>>
>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It is great to see that you guys have achieved a maturity point to
>>>>> propose this. Congratulations for your work and the idea to contribute
>>>>> it into Beam.
>>>>>
>>>>> I remember from a previous discussion with Jan about the model
>>>>> mismatch between Euphoria and Beam, because of some design decisions
>>>>> of both projects. I remember you guys had some issues with the way
>>>>> Beam's sources do partitioning, as well as Beam's lack of sorted data
>>>>> (on shuffle a la hadoop). Also if I remember well the 'time' model of
>>>>> Euphoria was simpler than Beam's. I talk about all of this because I
>>>>> am curious about what parts of the Euphoria model you guys had to
>>>>> sacrifice to support Beam, and what parts of Beam's model should still
>>>>> be integrated into Euphoria (and if there is a straightforward path to
>>>>> do it).
>>>>>
>>>>> If I understand well if this gets merged into Apache this means that
>>>>> Euphoria's current implementation would be superseded by this DSL? I
>>>>> am curious because I would like to understand your level of investment
>>>>> on supporting the future of this DSL.
>>>>>
>>>>> Thanks and congrats again !
>>>>> Ismaël
>>>>>
>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <
>>>>> jb@nanthrax.net> wrote:
>>>>>
>>>>>> Depending on the donation, you would need an ICLA for each contributor,
>>>>>> and
>>>>>> a CCLA in addition to the SGA.
>>>>>>
>>>>>> We can sync with Davor and I for the legal stuff.
>>>>>> However, I would wait a little bit just to have feedback from the
>>>>>> whole team
>>>>>> and start a formal vote.
>>>>>>
>>>>>> I would be happy to start the formal vote.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Thanks for the awesome feedback!
>>>>>>>
>>>>>>> Romain:
>>>>>>>
>>>>>>> We already use Java Stream API in all operators where it makes sense
>>>>>>> (eg.:
>>>>>>> ReduceByKey). Still not sure if it was a good choice, but it can be
>>>>>>> easily
>>>>>>> converted to an iterator anyway.
>>>>>>>
>>>>>>> Side outputs support is coming soon, we already made an initial work
>>>>>>> on
>>>>>>> this.
>>>>>>>
>>>>>>> Side inputs are not supported in a way you are used to from beam,
>>>>>>> because
>>>>>>> it can be replaced by Join operator on the same key (if annotated
>>>>>>> with
>>>>>>> broadcastHashJoin, it will be turned into map side join).
>>>>>>>
>>>>>>> Only significant difference from Beam is, that we decided not to
>>>>>>> abstract
>>>>>>> serialization, so we need to add support for Type Hints, because of
>>>>>>> type
>>>>>>> erasure.
>>>>>>>
>>>>>>> Fluent API:
>>>>>>>
>>>>>>> API is fluent within one operator. It is designed to "lead the
>>>>>>> programmer", which means that he will only be offered methods that
>>>>>>> make
>>>>>>> sense after the last method he used (eg.: in ReduceByKey, we know
>>>>>>> that after
>>>>>>> keyBy either reduceBy method should come). It is implemented as a
>>>>>>> series of
>>>>>>> builders.
>>>>>>>
>>>>>>> Davor:
>>>>>>>
>>>>>>> Thanks, I'll contact you, and will start the process of having all
>>>>>>> the
>>>>>>> necessary paperwork signed on our side, so we can get things moving.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <
>>>>>>> rmannibucau@gmail.com
>>>>>>> <ma...@gmail.com>> wrote:
>>>>>>>
>>>>>>>      Hi guys
>>>>>>>
>>>>>>>      A DSL would be very welcomed, in particular if fluent.
>>>>>>>
>>>>>>>      Open question: did you study to implement Stream API (surely
>>>>>>> extending
>>>>>>> it to
>>>>>>>      have a BeamStream and a few more features like sides etc)?
>>>>>>> Would be
>>>>>>> very
>>>>>>>      natural and integrable easily anywhere and avoid a new API
>>>>>>> discovery.
>>>>>>>
>>>>>>>      Hazelcast Jet did it, so I don't see why Beam couldn't.
>>>>>>>
>>>>>>>      Le 18 déc. 2017 07:26, "Davor Bonaci" <davor@apache.org
>>>>>>>      <ma...@apache.org>> a écrit :
>>>>>>>
>>>>>>>          Hi David,
>>>>>>>          As JB noted, merging of these two projects is a great idea.
>>>>>>> In
>>>>>>> fact,
>>>>>>>          some of us have had those discussions in the past.
>>>>>>>
>>>>>>>          Legally, nothing particular is strictly necessary as the
>>>>>>> code seems
>>>>>>> to
>>>>>>>          already be Apache 2.0 licensed. We don't, however, want to
>>>>>>> be
>>>>>>> perceived
>>>>>>>          as making hostile forks, so it would be great to file a
>>>>>>> Software
>>>>>>> Grant
>>>>>>>          Agreement with the ASF Secretary. I can help with the
>>>>>>> process, as
>>>>>>> necessary.
>>>>>>>
>>>>>>>          Project alignment-wise, there aren't any particular
>>>>>>> blockers that
>>>>>>> I am
>>>>>>>          aware of. We welcome DSLs.
>>>>>>>
>>>>>>>          Technically, the code would start in a feature branch.
>>>>>>> During this
>>>>>>>          stage, we'd need to validate a few things, including
>>>>>>> confirmation
>>>>>>> the
>>>>>>>          code and dependencies match the ASF policy, automate
>>>>>>> testing in
>>>>>>> Beam's
>>>>>>>          tooling, etc. At that point, we'd take a community vote to
>>>>>>> accept
>>>>>>> the
>>>>>>>          component into master, and consider author(s) for
>>>>>>> committership in
>>>>>>> the
>>>>>>>          overall project.
>>>>>>>
>>>>>>>          Welcome to the ASF and Beam -- we are thrilled to have you!
>>>>>>> Hope
>>>>>>> this
>>>>>>>          helps, and please reach out if anybody on our end can help,
>>>>>>> including JB
>>>>>>>          or myself.
>>>>>>>
>>>>>>>          Davor
>>>>>>>
>>>>>>>
>>>>>>>          On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>> <jb@nanthrax.net
>>>>>>>          <ma...@nanthrax.net>> wrote:
>>>>>>>
>>>>>>>              Hi David,
>>>>>>>
>>>>>>>              Generally speaking, having different fluent DSL on top
>>>>>>> of the
>>>>>>> Beam
>>>>>>>              SDK is great.
>>>>>>>
>>>>>>>              I would like to take a look on your wordcount examples
>>>>>>> to give
>>>>>>> you a
>>>>>>>              complete feedback. I like the idea and a fluent Java
>>>>>>> DSL is
>>>>>>> valuable.
>>>>>>>
>>>>>>>              Let's wait feedback from others. If we have a
>>>>>>> consensus, then
>>>>>>> I
>>>>>>>              would be more than happy to help you for the donation (I
>>>>>>> worked on
>>>>>>>              the Camel Java DSL while ago, so I have some experience
>>>>>>> here).
>>>>>>>
>>>>>>>              Thanks !
>>>>>>>              Regards
>>>>>>>              JB
>>>>>>>
>>>>>>>              On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>>
>>>>>>>                  Hello,
>>>>>>>
>>>>>>>
>>>>>>>                  First of all, thanks for the amazing work the
>>>>>>> Apache Beam
>>>>>>>                  community is doing!
>>>>>>>
>>>>>>>
>>>>>>>                  In 2014, we've started development of the runtime
>>>>>>> independent
>>>>>>>                  Java 8 API, that helps us to create unified big-data
>>>>>>> processing
>>>>>>>                  flows. It has been used as a core building block of
>>>>>>> Seznam.cz
>>>>>>>                  web crawler data infrastructure ever since. Its
>>>>>>> design
>>>>>>>                  principles and execution model are very similar to
>>>>>>> Apache
>>>>>>> Beam.
>>>>>>>
>>>>>>>
>>>>>>>                  This API was open sourced in 2016, under the name
>>>>>>> Euphoria
>>>>>>> API:
>>>>>>>
>>>>>>>                  https://github.com/seznam/euphoria
>>>>>>>
>>>>>>>
>>>>>>>                  As it is very similar to Apache Beam, we feel, that
>>>>>>> it is
>>>>>>> not
>>>>>>>                  worth of duplicating effort in terms of development
>>>>>>> of new
>>>>>>>                  runtimes and fine-tuning of current ones.
>>>>>>>
>>>>>>>
>>>>>>>                  The main blocker for us to switch to Apache Beam is
>>>>>>> lack
>>>>>>> of the
>>>>>>>                  Java 8 API. *W*e propose the integration of
>>>>>>> Euphoria API
>>>>>>> into
>>>>>>>                  Apache Beam as a Java 8 DSL, in order to share our
>>>>>>> effort
>>>>>>> with
>>>>>>>                  the community.
>>>>>>>
>>>>>>>
>>>>>>>                  Simple example of the Euphoria API usage, can be
>>>>>>> found
>>>>>>> here:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>>>
>>>>>>>
>>>>>>>                  If you feel, that Beam community could leverage
>>>>>>> from our
>>>>>>> work,
>>>>>>>                  we would love to start working on Euphoria
>>>>>>> integration
>>>>>>> into
>>>>>>>                  Apache Beam (we already have a working POC, with
>>>>>>> few basic
>>>>>>>                  operators implemented).
>>>>>>>
>>>>>>>
>>>>>>>                  I look forward to hearing from you,
>>>>>>>
>>>>>>>                  David
>>>>>>>
>>>>>>>
>>>>>>>              --             Jean-Baptiste Onofré
>>>>>>>              jbonofre@apache.org <ma...@apache.org>
>>>>>>>              http://blog.nanthrax.net
>>>>>>>              Talend - http://www.talend.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> s pozdravem
>>>>>>>
>>>>>>> David Morávek
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> jbonofre@apache.org
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
>>>>>>
>>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
s pozdravem

David Morávek
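
The "Fluent API" paragraph quoted above describes builders that "lead the programmer" by offering only the methods that make sense next. That idea can be sketched in plain Java as a staged builder, assuming nothing about the real Euphoria classes: `StagedReduceSketch`, `KeyStage`, `ReduceStage`, and `of` are hypothetical names, and the in-memory reduction merely stands in for a real operator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical staged builder; none of these names come from Euphoria. */
final class StagedReduceSketch {

  /** Stage 1: the only thing the caller can do is choose a key. */
  interface KeyStage<T> {
    <K> ReduceStage<T, K> keyBy(Function<T, K> keyFn);
  }

  /** Stage 2: reachable only through keyBy, so reduceBy cannot come first. */
  interface ReduceStage<T, K> {
    Map<K, T> reduceBy(BinaryOperator<T> reducer);
  }

  /** Entry point: returns the first stage for the given input. */
  static <T> KeyStage<T> of(List<T> input) {
    return new KeyStage<T>() {
      @Override
      public <K> ReduceStage<T, K> keyBy(Function<T, K> keyFn) {
        // The second stage folds all elements that share a key.
        return reducer -> {
          Map<K, T> out = new LinkedHashMap<>();
          for (T element : input) {
            out.merge(keyFn.apply(element), element, reducer);
          }
          return out;
        };
      }
    };
  }

  public static void main(String[] args) {
    Map<Integer, String> byLength =
        StagedReduceSketch.of(Arrays.asList("a", "bb", "cc", "d"))
            .keyBy(String::length)
            .reduceBy((left, right) -> left + right);
    System.out.println(byLength); // prints {1=ad, 2=bbcc}
  }
}
```

Because each stage is a separate interface, writing `of(...).reduceBy(...)` without `keyBy` is a compile-time error, and IDE completion lists only the methods that are valid at that point.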

Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Thanks Jan,

It makes sense.

Let me take a look at the code to understand the "interaction".

Regards
JB

On 12/18/2017 04:26 PM, Jan Lukavský wrote:
> Hi JB,
> 
> Basically, you are not wrong. The project started about three or four years ago
> with the goal of unifying batch and streaming processing in a single portable,
> executor-independent API. Because of that, it is currently "close" to Beam in
> this sense. But we don't see much added value in keeping this as a separate
> project when one of the key differences is the API (not the model itself),
> so we would like to focus on translation from the Euphoria API to Beam's SDK. That's
> why we would like to see it as a DSL, so that it would be possible to use the
> Euphoria API with Beam's runners as natively as possible.
> 
> I hope I didn't make the subject even more unclear, if so, I'll be happy to 
> explain anything in more detail. :-)
> 
>    Jan
> 
> 
> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>> Hi Jan,
>>
>> Thanks for your answers.
>>
>> However, they confused me ;)
>>
>> Regarding what you replied, Euphoria seems like a programming model/SDK 
>> "close" to Beam more than a DSL on top of an existing Beam SDK.
>>
>> Am I wrong ?
>>
>> Regards
>> JB
>>
>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>> Hi Ismael,
>>>
>>> basically we adopted the Beam's design regarding partitioning 
>>> (https://github.com/seznam/euphoria/issues/160) and implemented the sorting 
>>> manually (https://github.com/seznam/euphoria/issues/158). I'm not aware of 
>>> the time model differences (Euphoria supports ingestion and event time, we 
>>> don't support processing time by decision). Regarding other differences 
>>> (looking into Beam capability matrix, I'd say that):
>>>
>>>   - we don't support stateful FlatMap (i.e. ParDo) for now 
>>> (https://github.com/seznam/euphoria/issues/192)
>>>
>>>   - we don't support side inputs (by decision now, but might be reconsidered) 
>>> and outputs (https://github.com/seznam/euphoria/issues/124)
>>>
>>>   - we support complete event-time windows (non-merging, merging, aligned, 
>>> unaligned) and time control
>>>
>>>   - we don't support processing time by decision (might be reconsidered if a 
>>> valid use-case is found)
>>>
>>>   - we support window triggering based on both time and data, including 
>>> discarding and accumulating (without accumulating & retracting)
>>>
>>> All our executors (runners) - Flink, Spark and Local - implement the complete 
>>> model, which we enforce using "operator test kit" that all executors must 
>>> pass. Spark executor supports bounded sources only (for now). As David said, 
>>> we currently don't have serialization abstraction, so there is some work to 
>>> be done in that regard.
>>>
>>> Our intention is to completely supersede Euphoria, we would like to consider 
>>> possibility to use executors that would not rely on Beam, but that is 
>>> optional now and should be straightforward.
>>>
>>> We'd be happy to answer any more questions you might have and thanks a lot!
>>>
>>> Best,
>>>
>>>   Jan
>>>
>>>
>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>> Hi,
>>>>
>>>> It is great to see that you guys have achieved a maturity point to
>>>> propose this. Congratulations for your work and the idea to contribute
>>>> it into Beam.
>>>>
>>>> I remember from a previous discussion with Jan about the model
>>>> mismatch between Euphoria and Beam, because of some design decisions
>>>> of both projects. I remember you guys had some issues with the way
>>>> Beam's sources do partitioning, as well as Beam's lack of sorted data
>>>> (on shuffle a la hadoop). Also if I remember well the 'time' model of
>>>> Euphoria was simpler than Beam's. I talk about all of this because I
>>>> am curious about what parts of the Euphoria model you guys had to
>>>> sacrifice to support Beam, and what parts of Beam's model should still
>>>> be integrated into Euphoria (and if there is a straightforward path to
>>>> do it).
>>>>
>>>> If I understand well if this gets merged into Apache this means that
>>>> Euphoria's current implementation would be superseded by this DSL? I
>>>> am curious because I would like to understand your level of investment
>>>> on supporting the future of this DSL.
>>>>
>>>> Thanks and congrats again !
>>>> Ismaël
>>>>
>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>>>>> Depending on the donation, you would need an ICLA for each contributor, and
>>>>> a CCLA in addition to the SGA.
>>>>>
>>>>> We can sync with Davor and I for the legal stuff.
>>>>> However, I would wait a little bit just to have feedback from the whole team
>>>>> and start a formal vote.
>>>>>
>>>>> I would be happy to start the formal vote.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Thanks for the awesome feedback!
>>>>>>
>>>>>> Romain:
>>>>>>
>>>>>> We already use Java Stream API in all operators where it makes sense (eg.:
>>>>>> ReduceByKey). Still not sure if it was a good choice, but it can be easily
>>>>>> converted to an iterator anyway.
>>>>>>
>>>>>> Side outputs support is coming soon, we already made an initial work on
>>>>>> this.
>>>>>>
>>>>>> Side inputs are not supported in a way you are used to from beam, because
>>>>>> it can be replaced by Join operator on the same key (if annotated with
>>>>>> broadcastHashJoin, it will be turned into map side join).
>>>>>>
>>>>>> Only significant difference from Beam is, that we decided not to abstract
>>>>>> serialization, so we need to add support for Type Hints, because of type
>>>>>> erasure.
>>>>>>
>>>>>> Fluent API:
>>>>>>
>>>>>> API is fluent within one operator. It is designed to "lead the
>>>>>> programmer", which means that he will only be offered methods that make
>>>>>> sense after the last method he used (eg.: in ReduceByKey, we know that after
>>>>>> keyBy either reduceBy method should come). It is implemented as a series of
>>>>>> builders.
>>>>>>
>>>>>> Davor:
>>>>>>
>>>>>> Thanks, I'll contact you, and will start the process of having all the
>>>>>> necessary paperwork signed on our side, so we can get things moving.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibucau@gmail.com
>>>>>> <ma...@gmail.com>> wrote:
>>>>>>
>>>>>>      Hi guys
>>>>>>
>>>>>>      A DSL would be very welcomed, in particular if fluent.
>>>>>>
>>>>>>      Open question: did you study to implement Stream API (surely extending
>>>>>> it to
>>>>>>      have a BeamStream and a few more features like sides etc)? Would be
>>>>>> very
>>>>>>      natural and integrable easily anywhere and avoid a new API discovery.
>>>>>>
>>>>>>      Hazelcast jet did it so I dont see why Beam couldnt.
>>>>>>
>>>>>>      Le 18 déc. 2017 07:26, "Davor Bonaci" <davor@apache.org
>>>>>>      <ma...@apache.org>> a écrit :
>>>>>>
>>>>>>          Hi David,
>>>>>>          As JB noted, merging of these two projects is a great idea. If
>>>>>> fact,
>>>>>>          some of us have had those discussions in the past.
>>>>>>
>>>>>>          Legally, nothing particular is strictly necessary as the code seem
>>>>>> to
>>>>>>          already be Apache 2.0 licensed. We don't, however, want to be
>>>>>> perceived
>>>>>>          as making hostile forks, so it would be great to file a Software
>>>>>> Grant
>>>>>>          Agreement with the ASF Secretary. I can help with the process, as
>>>>>> necessary.
>>>>>>
>>>>>>          Project alignment-wise, there aren't any particular blockers that
>>>>>> I am
>>>>>>          aware of. We welcome DSLs.
>>>>>>
>>>>>>          Technically, the code would start in a feature branch. During this
>>>>>>          stage, we'd need to validate a few things, including confirmation
>>>>>> the
>>>>>>          code and dependencies match the ASF policy, automate testing in
>>>>>> Beam's
>>>>>>          tooling, etc. At that point, we'd take a community vote to accept
>>>>>> the
>>>>>>          component into master, and consider author(s) for committership in
>>>>>> the
>>>>>>          overall project.
>>>>>>
>>>>>>          Welcome to the ASF and Beam -- we are thrilled to have you! Hope
>>>>>> this
>>>>>>          helps, and please reach out if anybody on our end can help,
>>>>>> including JB
>>>>>>          or myself.
>>>>>>
>>>>>>          Davor
>>>>>>
>>>>>>
>>>>>>          On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>> <jb@nanthrax.net
>>>>>>          <ma...@nanthrax.net>> wrote:
>>>>>>
>>>>>>              Hi David,
>>>>>>
>>>>>>              Generally speaking, having different fluent DSL on top of the
>>>>>> Beam
>>>>>>              SDK is great.
>>>>>>
>>>>>>              I would like to take a look on your wordcount examples to give
>>>>>> you a
>>>>>>              complete feedback. I like the idea and a fluent Java DSL is
>>>>>> valuable.
>>>>>>
>>>>>>              Let's wait feedback from others. If we have a consensus, then
>>>>>> I
>>>>>>              would be more than happy to help you for the donation (I
>>>>>> worked on
>>>>>>              the Camel Java DSL while ago, so I have some experience here).
>>>>>>
>>>>>>              Thanks !
>>>>>>              Regards
>>>>>>              JB
>>>>>>
>>>>>>              On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>
>>>>>>                  Hello,
>>>>>>
>>>>>>
>>>>>>                  First of all, thanks for the amazing work the Apache Beam
>>>>>>                  community is doing!
>>>>>>
>>>>>>
>>>>>>                  In 2014, we've started development of the runtime
>>>>>> independent
>>>>>>                  Java 8 API, that helps us to create unified big-data
>>>>>> processing
>>>>>>                  flows. It has been used as a core building block of
>>>>>> Seznam.cz
>>>>>>                  web crawler data infrastructure every since. Its design
>>>>>>                  principles and execution model are very similar to Apache
>>>>>> Beam.
>>>>>>
>>>>>>
>>>>>>                  This API was open sourced in 2016, under the name Euphoria
>>>>>> API:
>>>>>>
>>>>>>                  https://github.com/seznam/euphoria
>>>>>> <https://github.com/seznam/euphoria>
>>>>>>
>>>>>>
>>>>>>                  As it is very similar to Apache Beam, we feel, that it is
>>>>>> not
>>>>>>                  worth of duplicating effort in terms of development of new
>>>>>>                  runtimes and fine-tuning of current ones.
>>>>>>
>>>>>>
>>>>>>                  The main blocker for us to switch to Apache Beam is lack
>>>>>> of the
>>>>>>                  Java 8 API. *W*e propose the integration of Euphoria API
>>>>>> into
>>>>>>                  Apache Beam as a Java 8 DSL, in order to share our effort
>>>>>> with
>>>>>>                  the community.
>>>>>>
>>>>>>
>>>>>>                  Simple example of the Euphoria API usage, can be found
>>>>>> here:
>>>>>>
>>>>>>
>>>>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount 
>>>>>>
>>>>>>
>>>>>> <https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount> 
>>>>>>
>>>>>>
>>>>>>
>>>>>>                  If you feel, that Beam community could leverage from our
>>>>>> work,
>>>>>>                  we would love to start working on Euphoria integration
>>>>>> into
>>>>>>                  Apache Beam (we already have a working POC, with few basic
>>>>>>                  operators implemented).
>>>>>>
>>>>>>
>>>>>>                  I look forward to hearing from you,
>>>>>>
>>>>>>                  David
>>>>>>
>>>>>>
>>>>>>              --             Jean-Baptiste Onofré
>>>>>>              jbonofre@apache.org <ma...@apache.org>
>>>>>>              http://blog.nanthrax.net
>>>>>>              Talend - http://www.talend.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> s pozdravem
>>>>>>
>>>>>> David Morávek
>>>>>
>>>>> -- 
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>
>>
> 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Euphoria Java 8 DSL - proposal

Posted by Jan Lukavský <je...@seznam.cz>.
Hi JB,

Basically, you are not wrong. The project started about three or four 
years ago with the goal of unifying batch and streaming processing in a 
single portable, executor-independent API. Because of that, it is 
currently "close" to Beam in this sense. But we don't see much added 
value in keeping this as a separate project when one of the key 
differences is the API (not the model itself), so we would like to 
focus on translation from the Euphoria API to Beam's SDK. That's why we 
would like to see it as a DSL, so that it would be possible to use 
the Euphoria API with Beam's runners as natively as possible.

I hope I didn't make the subject even more unclear; if so, I'll be happy 
to explain anything in more detail. :-)

   Jan




Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Jan,

Thanks for your answers.

However, they confused me ;)

Based on what you replied, Euphoria seems more like a programming model/SDK 
"close" to Beam than a DSL on top of an existing Beam SDK.

Am I wrong?

Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Euphoria Java 8 DSL - proposal

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Ismael,

Basically, we adopted Beam's design regarding partitioning 
(https://github.com/seznam/euphoria/issues/160) and implemented the 
sorting manually (https://github.com/seznam/euphoria/issues/158). I'm 
not aware of any time model differences (Euphoria supports ingestion and 
event time; we don't support processing time, by decision). Regarding 
other differences (looking at the Beam capability matrix), I'd say that:

  - we don't support stateful FlatMap (i.e. ParDo) for now 
(https://github.com/seznam/euphoria/issues/192)

  - we don't support side inputs (by decision for now, but this might be 
reconsidered) or side outputs (https://github.com/seznam/euphoria/issues/124)

  - we support complete event-time windows (non-merging, merging, 
aligned, unaligned) and time control

  - we don't support processing time, by decision (might be reconsidered 
if a valid use case is found)

  - we support window triggering based on both time and data, including 
discarding and accumulating (without accumulating & retracting)

All our executors (runners) - Flink, Spark and Local - implement the 
complete model, which we enforce using an "operator test kit" that all 
executors must pass. The Spark executor supports bounded sources only (for 
now). As David said, we currently don't have a serialization abstraction, 
so there is some work to be done in that regard.
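
To illustrate the "operator test kit" idea, here is a minimal sketch in 
Java. It is only an assumption of how such a kit could be shaped, not 
Euphoria's actual code: the names PipelineExecutor, StreamExecutor, 
LoopExecutor and OperatorTestKit are hypothetical, and a single 
"count by key" operation stands in for the full operator set. The point 
is that the expected results are defined once and every executor is 
checked against the same cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical contract for this sketch; not Euphoria's real executor interface.
interface PipelineExecutor {
    String name();

    // Counts how often each element occurs in a bounded input.
    Map<String, Long> countByKey(List<String> input);
}

// One possible executor, built on the Java Stream API.
final class StreamExecutor implements PipelineExecutor {
    public String name() { return "stream"; }

    public Map<String, Long> countByKey(List<String> input) {
        return input.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}

// A second executor, deliberately implemented differently (plain loop).
final class LoopExecutor implements PipelineExecutor {
    public String name() { return "loop"; }

    public Map<String, Long> countByKey(List<String> input) {
        Map<String, Long> counts = new HashMap<>();
        for (String element : input) {
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }
}

final class OperatorTestKit {
    // Runs every shared case against one executor and returns the failures.
    static List<String> verify(PipelineExecutor executor) {
        List<String> failures = new ArrayList<>();

        Map<String, Long> expected = new HashMap<>();
        expected.put("a", 2L);
        expected.put("b", 1L);
        check(executor, Arrays.asList("a", "b", "a"), expected, failures);

        // An empty input must produce an empty result.
        check(executor, Collections.<String>emptyList(),
              new HashMap<String, Long>(), failures);
        return failures;
    }

    private static void check(PipelineExecutor executor, List<String> input,
                              Map<String, Long> expected, List<String> failures) {
        Map<String, Long> actual = executor.countByKey(input);
        if (!expected.equals(actual)) {
            failures.add(executor.name() + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        List<PipelineExecutor> executors =
            Arrays.asList(new StreamExecutor(), new LoopExecutor());
        for (PipelineExecutor executor : executors) {
            List<String> failures = verify(executor);
            System.out.println(executor.name() + ": "
                + (failures.isEmpty() ? "PASS" : failures));
        }
    }
}
```

A new executor would then only need to implement the interface and be 
added to the list in main; the cases themselves stay untouched, which is 
what keeps the executors from drifting apart.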

Our intention is to completely supersede Euphoria. We would like to 
consider the possibility of using executors that do not rely on Beam, but 
that is optional for now and should be straightforward.

We'd be happy to answer any more questions you might have. Thanks a lot!

Best,

  Jan


On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
> Hi,
>
> It is great to see that you guys have reached a point of maturity to
> propose this. Congratulations on your work and on the idea to contribute
> it to Beam.
>
> I remember a previous discussion with Jan about the model
> mismatch between Euphoria and Beam, caused by some design decisions
> of both projects. I remember you guys had some issues with the way
> Beam's sources do partitioning, as well as Beam's lack of sorted data
> (on shuffle, a la Hadoop). Also, if I remember correctly, the 'time' model of
> Euphoria was simpler than Beam's. I mention all of this because I
> am curious about which parts of the Euphoria model you guys had to
> sacrifice to support Beam, and which parts of Beam's model should still
> be integrated into Euphoria (and whether there is a straightforward path to
> do it).
>
> If I understand correctly, if this gets merged into Apache, this means that
> Euphoria's current implementation would be superseded by this DSL? I
> am curious because I would like to understand your level of investment
> in supporting the future of this DSL.
>
> Thanks and congrats again!
> Ismaël
>
> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>> Depending on the donation, you would need an ICLA for each contributor, and
>> a CCLA in addition to the SGA.
>>
>> You can sync with Davor and me on the legal stuff.
>> However, I would wait a little bit to get feedback from the whole team
>> and then start a formal vote.
>>
>> I would be happy to start the formal vote.
>>
>> Regards
>> JB
>>
>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>> Hello,
>>>
>>> Thanks for the awesome feedback!
>>>
>>> Romain:
>>>
>>> We already use Java Stream API in all operators where it makes sense (eg.:
>>> ReduceByKey). Still not sure if it was a good choice, but i can be easily
>>> converted to iterator anyway.
>>>
>>> Side outputs support is coming soon, we already made an initial work on
>>> this.
>>>
>>> Side inputs are not supported in a way you are used to from beam, because
>>> it can be replaced by Join operator on the same key (if annotated with
>>> broadcastHashJoin, it will be turned into map side join).
>>>
>>> Only significant difference from Beam is, that we decided not to abstract
>>> serialization, so we need to add support for Type Hints, because of type
>>> erasure.
>>>
>>> Fluent API:
>>>
>>> API is fluent within one operator. It is designed to "lead the
>>> programmer", which means, that he we'll be only offered methods that makes
>>> sense after the last method he used (eg.: in ReduceByKey, we know that after
>>> keyBy either reduceBy method should come). It is implemented as a series of
>>> builders.
>>>
>>> Davor:
>>>
>>> Thanks, I'll contact you, and will start the process of having all the
>>> necessary paperwork signed on our side, so we can get things moving.
>>>
>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibucau@gmail.com
>>> <ma...@gmail.com>> wrote:
>>>
>>>      Hi guys
>>>
>>>      A DSL would be very welcomed, in particular if fluent.
>>>
>>>      Open question: did you study to implement Stream API (surely extending
>>> it to
>>>      have a BeamStream and a few more features like sides etc)? Would be
>>> very
>>>      natural and integrable easily anywhere and avoid a new API discovery.
>>>
>>>      Hazelcast jet did it so I dont see why Beam couldnt.
>>>
>>>      Le 18 déc. 2017 07:26, "Davor Bonaci" <davor@apache.org
>>>      <ma...@apache.org>> a écrit :
>>>
>>>          Hi David,
>>>          As JB noted, merging of these two projects is a great idea. If
>>> fact,
>>>          some of us have had those discussions in the past.
>>>
>>>          Legally, nothing particular is strictly necessary as the code seem
>>> to
>>>          already be Apache 2.0 licensed. We don't, however, want to be
>>> perceived
>>>          as making hostile forks, so it would be great to file a Software
>>> Grant
>>>          Agreement with the ASF Secretary. I can help with the process, as
>>> necessary.
>>>
>>>          Project alignment-wise, there aren't any particular blockers that
>>> I am
>>>          aware of. We welcome DSLs.
>>>
>>>          Technically, the code would start in a feature branch. During this
>>>          stage, we'd need to validate a few things, including confirmation
>>> the
>>>          code and dependencies match the ASF policy, automate testing in
>>> Beam's
>>>          tooling, etc. At that point, we'd take a community vote to accept
>>> the
>>>          component into master, and consider author(s) for committership in
>>> the
>>>          overall project.
>>>
>>>          Welcome to the ASF and Beam -- we are thrilled to have you! Hope
>>> this
>>>          helps, and please reach out if anybody on our end can help,
>>> including JB
>>>          or myself.
>>>
>>>          Davor
>>>
>>>
>>>          On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>> <jb@nanthrax.net
>>>          <ma...@nanthrax.net>> wrote:
>>>
>>>              Hi David,
>>>
>>>              Generally speaking, having different fluent DSL on top of the
>>> Beam
>>>              SDK is great.
>>>
>>>              I would like to take a look on your wordcount examples to give
>>> you a
>>>              complete feedback. I like the idea and a fluent Java DSL is
>>> valuable.
>>>
>>>              Let's wait feedback from others. If we have a consensus, then
>>> I
>>>              would be more than happy to help you for the donation (I
>>> worked on
>>>              the Camel Java DSL while ago, so I have some experience here).
>>>
>>>              Thanks !
>>>              Regards
>>>              JB
>>>
>>>              On 12/17/2017 07:00 PM, David Morávek wrote:
>>>
>>>                  Hello,
>>>
>>>
>>>                  First of all, thanks for the amazing work the Apache Beam
>>>                  community is doing!
>>>
>>>
>>>                  In 2014, we've started development of the runtime
>>> independent
>>>                  Java 8 API, that helps us to create unified big-data
>>> processing
>>>                  flows. It has been used as a core building block of
>>> Seznam.cz
>>>                  web crawler data infrastructure every since. Its design
>>>                  principles and execution model are very similar to Apache
>>> Beam.
>>>
>>>
>>>                  This API was open sourced in 2016, under the name Euphoria
>>> API:
>>>
>>>                  https://github.com/seznam/euphoria
>>>                  <https://github.com/seznam/euphoria>
>>>
>>>
>>>                  As it is very similar to Apache Beam, we feel, that it is
>>> not
>>>                  worth of duplicating effort in terms of development of new
>>>                  runtimes and fine-tuning of current ones.
>>>
>>>
>>>                  The main blocker for us to switch to Apache Beam is lack
>>> of the
>>>                  Java 8 API. *W*e propose the integration of Euphoria API
>>> into
>>>                  Apache Beam as a Java 8 DSL, in order to share our effort
>>> with
>>>                  the community.
>>>
>>>
>>>                  Simple example of the Euphoria API usage, can be found
>>> here:
>>>
>>>
>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>
>>> <https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
>>>
>>>
>>>                  If you feel, that Beam community could leverage from our
>>> work,
>>>                  we would love to start working on Euphoria integration
>>> into
>>>                  Apache Beam (we already have a working POC, with few basic
>>>                  operators implemented).
>>>
>>>
>>>                  I look forward to hearing from you,
>>>
>>>                  David
>>>
>>>
>>>              --
>>>              Jean-Baptiste Onofré
>>>              jbonofre@apache.org <ma...@apache.org>
>>>              http://blog.nanthrax.net
>>>              Talend - http://www.talend.com
>>>
>>>
>>>
>>>
>>>
>>> --
>>> s pozdravem
>>>
>>> David Morávek
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com


Re: Euphoria Java 8 DSL - proposal

Posted by Ismaël Mejía <ie...@gmail.com>.
Hi,

It is great to see that you guys have reached a level of maturity where
you can propose this. Congratulations on your work and on the idea of
contributing it to Beam.

I remember a previous discussion with Jan about the model mismatch
between Euphoria and Beam caused by some design decisions in both
projects. I remember you guys had some issues with the way Beam's
sources do partitioning, as well as with Beam's lack of sorted data
(on shuffle, à la Hadoop). Also, if I remember correctly, the 'time'
model of Euphoria was simpler than Beam's. I bring all of this up
because I am curious which parts of the Euphoria model you guys had to
sacrifice to support Beam, and which parts of Beam's model should still
be integrated into Euphoria (and whether there is a straightforward
path to do it).

If I understand correctly, if this gets merged into Apache, does it mean
that Euphoria's current implementation would be superseded by this DSL?
I am asking because I would like to understand your level of investment
in supporting the future of this DSL.

Thanks and congrats again!
Ismaël

On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Depending of the donation, you would need ICLA for each contributor, and
> CCLA in addition of SGA.
>
> We can sync with Davor and I for the legal stuff.
> However, I would wait a little bit just to have feedback from the whole team
> and start a formal vote.
>
> I would be happy to start the formal vote.
>
> Regards
> JB
>
> On 12/18/2017 10:03 AM, David Morávek wrote:
>>
>> Hello,
>>
>> Thanks for the awesome feedback!
>>
>> Romain:
>>
>> We already use Java Stream API in all operators where it makes sense (eg.:
>> ReduceByKey). Still not sure if it was a good choice, but i can be easily
>> converted to iterator anyway.
>>
>> Side outputs support is coming soon, we already made an initial work on
>> this.
>>
>> Side inputs are not supported in a way you are used to from beam, because
>> it can be replaced by Join operator on the same key (if annotated with
>> broadcastHashJoin, it will be turned into map side join).
>>
>> Only significant difference from Beam is, that we decided not to abstract
>> serialization, so we need to add support for Type Hints, because of type
>> erasure.
>>
>> Fluent API:
>>
>> API is fluent within one operator. It is designed to "lead the
>> programmer", which means, that he we'll be only offered methods that makes
>> sense after the last method he used (eg.: in ReduceByKey, we know that after
>> keyBy either reduceBy method should come). It is implemented as a series of
>> builders.
>>
>> Davor:
>>
>> Thanks, I'll contact you, and will start the process of having all the
>> necessary paperwork signed on our side, so we can get things moving.
>>
>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibucau@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>     Hi guys
>>
>>     A DSL would be very welcomed, in particular if fluent.
>>
>>     Open question: did you study to implement Stream API (surely extending
>> it to
>>     have a BeamStream and a few more features like sides etc)? Would be
>> very
>>     natural and integrable easily anywhere and avoid a new API discovery.
>>
>>     Hazelcast jet did it so I dont see why Beam couldnt.
>>
>>     Le 18 déc. 2017 07:26, "Davor Bonaci" <davor@apache.org
>>     <ma...@apache.org>> a écrit :
>>
>>         Hi David,
>>         As JB noted, merging of these two projects is a great idea. If
>> fact,
>>         some of us have had those discussions in the past.
>>
>>         Legally, nothing particular is strictly necessary as the code seem
>> to
>>         already be Apache 2.0 licensed. We don't, however, want to be
>> perceived
>>         as making hostile forks, so it would be great to file a Software
>> Grant
>>         Agreement with the ASF Secretary. I can help with the process, as
>> necessary.
>>
>>         Project alignment-wise, there aren't any particular blockers that
>> I am
>>         aware of. We welcome DSLs.
>>
>>         Technically, the code would start in a feature branch. During this
>>         stage, we'd need to validate a few things, including confirmation
>> the
>>         code and dependencies match the ASF policy, automate testing in
>> Beam's
>>         tooling, etc. At that point, we'd take a community vote to accept
>> the
>>         component into master, and consider author(s) for committership in
>> the
>>         overall project.
>>
>>         Welcome to the ASF and Beam -- we are thrilled to have you! Hope
>> this
>>         helps, and please reach out if anybody on our end can help,
>> including JB
>>         or myself.
>>
>>         Davor
>>
>>
>>         On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>> <jb@nanthrax.net
>>         <ma...@nanthrax.net>> wrote:
>>
>>             Hi David,
>>
>>             Generally speaking, having different fluent DSL on top of the
>> Beam
>>             SDK is great.
>>
>>             I would like to take a look on your wordcount examples to give
>> you a
>>             complete feedback. I like the idea and a fluent Java DSL is
>> valuable.
>>
>>             Let's wait feedback from others. If we have a consensus, then
>> I
>>             would be more than happy to help you for the donation (I
>> worked on
>>             the Camel Java DSL while ago, so I have some experience here).
>>
>>             Thanks !
>>             Regards
>>             JB
>>
>>             On 12/17/2017 07:00 PM, David Morávek wrote:
>>
>>                 Hello,
>>
>>
>>                 First of all, thanks for the amazing work the Apache Beam
>>                 community is doing!
>>
>>
>>                 In 2014, we've started development of the runtime
>> independent
>>                 Java 8 API, that helps us to create unified big-data
>> processing
>>                 flows. It has been used as a core building block of
>> Seznam.cz
>>                 web crawler data infrastructure every since. Its design
>>                 principles and execution model are very similar to Apache
>> Beam.
>>
>>
>>                 This API was open sourced in 2016, under the name Euphoria
>> API:
>>
>>                 https://github.com/seznam/euphoria
>>                 <https://github.com/seznam/euphoria>
>>
>>
>>                 As it is very similar to Apache Beam, we feel, that it is
>> not
>>                 worth of duplicating effort in terms of development of new
>>                 runtimes and fine-tuning of current ones.
>>
>>
>>                 The main blocker for us to switch to Apache Beam is lack
>> of the
>>                 Java 8 API. *W*e propose the integration of Euphoria API
>> into
>>                 Apache Beam as a Java 8 DSL, in order to share our effort
>> with
>>                 the community.
>>
>>
>>                 Simple example of the Euphoria API usage, can be found
>> here:
>>
>>
>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>
>> <https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
>>
>>
>>                 If you feel, that Beam community could leverage from our
>> work,
>>                 we would love to start working on Euphoria integration
>> into
>>                 Apache Beam (we already have a working POC, with few basic
>>                 operators implemented).
>>
>>
>>                 I look forward to hearing from you,
>>
>>                 David
>>
>>
>>             --
>>             Jean-Baptiste Onofré
>>             jbonofre@apache.org <ma...@apache.org>
>>             http://blog.nanthrax.net
>>             Talend - http://www.talend.com
>>
>>
>>
>>
>>
>> --
>> s pozdravem
>>
>> David Morávek
>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Depending on the donation, you would need an ICLA for each contributor, and a
CCLA in addition to the SGA.

Davor and I can sync with you on the legal stuff.
However, I would wait a little to get feedback from the whole team and then
start a formal vote.

I would be happy to start the formal vote.

Regards
JB

On 12/18/2017 10:03 AM, David Morávek wrote:
> Hello,
> 
> Thanks for the awesome feedback!
> 
> Romain:
> 
> We already use Java Stream API in all operators where it makes sense (eg.: 
> ReduceByKey). Still not sure if it was a good choice, but i can be easily 
> converted to iterator anyway.
> 
> Side outputs support is coming soon, we already made an initial work on this.
> 
> Side inputs are not supported in a way you are used to from beam, because it can 
> be replaced by Join operator on the same key (if annotated with 
> broadcastHashJoin, it will be turned into map side join).
> 
> Only significant difference from Beam is, that we decided not to abstract 
> serialization, so we need to add support for Type Hints, because of type erasure.
> 
> Fluent API:
> 
> API is fluent within one operator. It is designed to "lead the programmer", 
> which means, that he we'll be only offered methods that makes sense after the 
> last method he used (eg.: in ReduceByKey, we know that after keyBy either 
> reduceBy method should come). It is implemented as a series of builders.
> 
> Davor:
> 
> Thanks, I'll contact you, and will start the process of having all the necessary 
> paperwork signed on our side, so we can get things moving.
> 
> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibucau@gmail.com 
> <ma...@gmail.com>> wrote:
> 
>     Hi guys
> 
>     A DSL would be very welcomed, in particular if fluent.
> 
>     Open question: did you study to implement Stream API (surely extending it to
>     have a BeamStream and a few more features like sides etc)? Would be very
>     natural and integrable easily anywhere and avoid a new API discovery.
> 
>     Hazelcast jet did it so I dont see why Beam couldnt.
> 
>     Le 18 déc. 2017 07:26, "Davor Bonaci" <davor@apache.org
>     <ma...@apache.org>> a écrit :
> 
>         Hi David,
>         As JB noted, merging of these two projects is a great idea. If fact,
>         some of us have had those discussions in the past.
> 
>         Legally, nothing particular is strictly necessary as the code seem to
>         already be Apache 2.0 licensed. We don't, however, want to be perceived
>         as making hostile forks, so it would be great to file a Software Grant
>         Agreement with the ASF Secretary. I can help with the process, as necessary.
> 
>         Project alignment-wise, there aren't any particular blockers that I am
>         aware of. We welcome DSLs.
> 
>         Technically, the code would start in a feature branch. During this
>         stage, we'd need to validate a few things, including confirmation the
>         code and dependencies match the ASF policy, automate testing in Beam's
>         tooling, etc. At that point, we'd take a community vote to accept the
>         component into master, and consider author(s) for committership in the
>         overall project.
> 
>         Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
>         helps, and please reach out if anybody on our end can help, including JB
>         or myself.
> 
>         Davor
> 
> 
>         On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <jb@nanthrax.net
>         <ma...@nanthrax.net>> wrote:
> 
>             Hi David,
> 
>             Generally speaking, having different fluent DSL on top of the Beam
>             SDK is great.
> 
>             I would like to take a look on your wordcount examples to give you a
>             complete feedback. I like the idea and a fluent Java DSL is valuable.
> 
>             Let's wait feedback from others. If we have a consensus, then I
>             would be more than happy to help you for the donation (I worked on
>             the Camel Java DSL while ago, so I have some experience here).
> 
>             Thanks !
>             Regards
>             JB
> 
>             On 12/17/2017 07:00 PM, David Morávek wrote:
> 
>                 Hello,
> 
> 
>                 First of all, thanks for the amazing work the Apache Beam
>                 community is doing!
> 
> 
>                 In 2014, we've started development of the runtime independent
>                 Java 8 API, that helps us to create unified big-data processing
>                 flows. It has been used as a core building block of Seznam.cz
>                 web crawler data infrastructure every since. Its design
>                 principles and execution model are very similar to Apache Beam.
> 
> 
>                 This API was open sourced in 2016, under the name Euphoria API:
> 
>                 https://github.com/seznam/euphoria
>                 <https://github.com/seznam/euphoria>
> 
> 
>                 As it is very similar to Apache Beam, we feel, that it is not
>                 worth of duplicating effort in terms of development of new
>                 runtimes and fine-tuning of current ones.
> 
> 
>                 The main blocker for us to switch to Apache Beam is lack of the
>                 Java 8 API. *W*e propose the integration of Euphoria API into
>                 Apache Beam as a Java 8 DSL, in order to share our effort with
>                 the community.
> 
> 
>                 Simple example of the Euphoria API usage, can be found here:
> 
>                 https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>                 <https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount>
> 
> 
>                 If you feel, that Beam community could leverage from our work,
>                 we would love to start working on Euphoria integration into
>                 Apache Beam (we already have a working POC, with few basic
>                 operators implemented).
> 
> 
>                 I look forward to hearing from you,
> 
>                 David
> 
> 
>             -- 
>             Jean-Baptiste Onofré
>             jbonofre@apache.org <ma...@apache.org>
>             http://blog.nanthrax.net
>             Talend - http://www.talend.com
> 
> 
> 
> 
> 
> -- 
> s pozdravem
> 
> David Morávek

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Euphoria Java 8 DSL - proposal

Posted by Romain Manni-Bucau <rm...@gmail.com>.
If it helps: if you explicitly type your lambda parameters, you can get
their types this way:
https://github.com/rmannibucau/cookit-project/blob/master/cookit-core/src/main/java/com/github/rmannibucau/cookit/impl/container/OWBContainer.java#L117
which allows this kind of usage:
https://github.com/rmannibucau/cookit-project/blob/master/cookit-core/src/test/java/com/github/rmannibucau/cookit/api/RecipesTest.java#L177
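
For readers who cannot follow the links, here is a minimal sketch of one
well-known way to recover the parameter types of an explicitly typed
lambda. It is an assumption that the linked code works along similar
lines; the class LambdaTypes, the interface SerializableFunction and the
helper names are hypothetical and exist only for this illustration. The
sketch relies on the lambda being Serializable and non-capturing, and on
the compiler placing the lambda body in a synthetic method of the
declaring class.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaTypes {

  /** Hypothetical functional interface; being Serializable is what enables the trick. */
  public interface SerializableFunction<I, O> extends Function<I, O>, Serializable {}

  /** Returns the parameter types of the method that implements the given lambda. */
  public static Class<?>[] parameterTypes(Serializable lambda) {
    try {
      // Serializable lambdas carry a compiler-generated writeReplace method
      // that returns a SerializedLambda describing the implementation.
      Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
      writeReplace.setAccessible(true);
      SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);

      // Look up the implementation method in the class that declared the lambda.
      Class<?> implClass = Class.forName(
          serialized.getImplClass().replace('/', '.'),
          false,
          lambda.getClass().getClassLoader());
      for (Method m : implClass.getDeclaredMethods()) {
        if (m.getName().equals(serialized.getImplMethodName())) {
          // Note: a capturing lambda would list its captured values first.
          return m.getParameterTypes();
        }
      }
      throw new IllegalStateException("implementation method not found");
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("not a serializable lambda?", e);
    }
  }

  /** Generic entry point: I is only known here if the caller types the parameter. */
  public static <I, O> Class<?> inputType(SerializableFunction<I, O> fn) {
    return parameterTypes(fn)[0];
  }

  public static void main(String[] args) {
    // The explicit (String s) is what lets the compiler fix I to String.
    System.out.println(inputType((String s) -> s.length()));
  }
}
```

With an untyped parameter (s -> ...) passed to the generic method, the
compiler would infer Object, which is why the explicit type matters.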


Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau>

2017-12-18 10:03 GMT+01:00 David Morávek <da...@gmail.com>:

> Hello,
>
> Thanks for the awesome feedback!
>
> Romain:
>
> We already use Java Stream API in all operators where it makes sense (eg.:
> ReduceByKey). Still not sure if it was a good choice, but i can be easily
> converted to iterator anyway.
>
> Side outputs support is coming soon, we already made an initial work on
> this.
>
> Side inputs are not supported in a way you are used to from beam, because
> it can be replaced by Join operator on the same key (if annotated with
> broadcastHashJoin, it will be turned into map side join).
>
> Only significant difference from Beam is, that we decided not to abstract
> serialization, so we need to add support for Type Hints, because of type
> erasure.
>
> Fluent API:
>
> API is fluent within one operator. It is designed to "lead the
> programmer", which means, that he we'll be only offered methods that makes
> sense after the last method he used (eg.: in ReduceByKey, we know that
> after keyBy either reduceBy method should come). It is implemented as a
> series of builders.
>
> Davor:
>
> Thanks, I'll contact you, and will start the process of having all the
> necessary paperwork signed on our side, so we can get things moving.
>
> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibucau@gmail.com
> > wrote:
>
>> Hi guys
>>
>> A DSL would be very welcomed, in particular if fluent.
>>
>> Open question: did you study to implement Stream API (surely extending it
>> to have a BeamStream and a few more features like sides etc)? Would be very
>> natural and integrable easily anywhere and avoid a new API discovery.
>>
>> Hazelcast jet did it so I dont see why Beam couldnt.
>>
>> Le 18 déc. 2017 07:26, "Davor Bonaci" <da...@apache.org> a écrit :
>>
>>> Hi David,
>>> As JB noted, merging of these two projects is a great idea. If fact,
>>> some of us have had those discussions in the past.
>>>
>>> Legally, nothing particular is strictly necessary as the code seem to
>>> already be Apache 2.0 licensed. We don't, however, want to be perceived as
>>> making hostile forks, so it would be great to file a Software Grant
>>> Agreement with the ASF Secretary. I can help with the process, as necessary.
>>>
>>> Project alignment-wise, there aren't any particular blockers that I am
>>> aware of. We welcome DSLs.
>>>
>>> Technically, the code would start in a feature branch. During this
>>> stage, we'd need to validate a few things, including confirmation the code
>>> and dependencies match the ASF policy, automate testing in Beam's tooling,
>>> etc. At that point, we'd take a community vote to accept the component into
>>> master, and consider author(s) for committership in the overall project.
>>>
>>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
>>> helps, and please reach out if anybody on our end can help, including JB or
>>> myself.
>>>
>>> Davor
>>>
>>>
>>> On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Generally speaking, having different fluent DSL on top of the Beam SDK
>>>> is great.
>>>>
>>>> I would like to take a look on your wordcount examples to give you a
>>>> complete feedback. I like the idea and a fluent Java DSL is valuable.
>>>>
>>>> Let's wait feedback from others. If we have a consensus, then I would
>>>> be more than happy to help you for the donation (I worked on the Camel Java
>>>> DSL while ago, so I have some experience here).
>>>>
>>>> Thanks !
>>>> Regards
>>>> JB
>>>>
>>>> On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>> First of all, thanks for the amazing work the Apache Beam community is
>>>>> doing!
>>>>>
>>>>>
>>>>> In 2014, we've started development of the runtime independent Java 8
>>>>> API, that helps us to create unified big-data processing flows. It has been
>>>>> used as a core building block of Seznam.cz web crawler data infrastructure
>>>>> every since. Its design principles and execution model are very similar to
>>>>> Apache Beam.
>>>>>
>>>>>
>>>>> This API was open sourced in 2016, under the name Euphoria API:
>>>>>
>>>>> https://github.com/seznam/euphoria
>>>>>
>>>>>
>>>>> As it is very similar to Apache Beam, we feel, that it is not worth of
>>>>> duplicating effort in terms of development of new runtimes and fine-tuning
>>>>> of current ones.
>>>>>
>>>>>
>>>>> The main blocker for us to switch to Apache Beam is lack of the Java 8
>>>>> API. *W*e propose the integration of Euphoria API into Apache Beam as a
>>>>> Java 8 DSL, in order to share our effort with the community.
>>>>>
>>>>>
>>>>> Simple example of the Euphoria API usage, can be found here:
>>>>>
>>>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>
>>>>>
>>>>> If you feel, that Beam community could leverage from our work, we
>>>>> would love to start working on Euphoria integration into Apache Beam (we
>>>>> already have a working POC, with few basic operators implemented).
>>>>>
>>>>>
>>>>> I look forward to hearing from you,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>>
>
>
> --
> s pozdravem
>
> David Morávek
>

Re: Euphoria Java 8 DSL - proposal

Posted by David Morávek <da...@gmail.com>.
Hello,

Thanks for the awesome feedback!

Romain:

We already use the Java Stream API in all operators where it makes sense
(e.g. ReduceByKey). We are still not sure whether it was a good choice, but
it can easily be converted to an iterator anyway.

Side output support is coming soon; we have already done some initial work
on this.

Side inputs are not supported in the way you are used to from Beam, because
they can be replaced by a Join operator on the same key (if annotated with
broadcastHashJoin, it will be turned into a map-side join).
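
To illustrate the idea of a map-side (broadcast hash) join, here is a
minimal plain-Java sketch. It does not use the Euphoria or Beam APIs; the
class MapSideJoinSketch and its method names are hypothetical. The
assumption is that the small input fits in memory, so a runner could ship
it to every worker and the large input never needs to be shuffled.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {

  /**
   * Inner-joins a large keyed input against a small in-memory lookup table.
   * In a distributed runner, the small table would be broadcast to each worker.
   */
  public static <K, L, R> List<String> join(
      List<Map.Entry<K, L>> large, Map<K, R> broadcastSmall) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<K, L> element : large) {
      // Hash lookup on the join key replaces the shuffle of the large side.
      R right = broadcastSmall.get(element.getKey());
      if (right != null) {
        out.add(element.getKey() + ":" + element.getValue() + ":" + right);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> visits = new ArrayList<>();
    visits.add(new SimpleEntry<>("cz", 3));
    visits.add(new SimpleEntry<>("us", 5));
    visits.add(new SimpleEntry<>("de", 1));

    Map<String, String> countryNames = new HashMap<>();
    countryNames.put("cz", "Czechia");
    countryNames.put("us", "United States");

    // "de" has no match in the small table, so it is dropped (inner join).
    System.out.println(join(visits, countryNames));
    // prints [cz:3:Czechia, us:5:United States]
  }
}
```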

The only significant difference from Beam is that we decided not to abstract
serialization, so we need to add support for type hints because of type
erasure.
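
As an illustration of why type hints help with erasure, here is a minimal
sketch of the common "super type token" pattern. The class TypeHint below
is hypothetical and is not the Euphoria API; it only shows that an
anonymous subclass keeps its generic type argument available at runtime,
which a plain generic instance does not.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class TypeHintSketch {

  /** Hypothetical type hint: an anonymous subclass records T in its generic superclass. */
  public abstract static class TypeHint<T> {
    private final Type type;

    protected TypeHint() {
      // The generic superclass of the anonymous subclass is TypeHint<actual T>.
      ParameterizedType superType =
          (ParameterizedType) getClass().getGenericSuperclass();
      this.type = superType.getActualTypeArguments()[0];
    }

    public Type getType() {
      return type;
    }
  }

  public static void main(String[] args) {
    // The trailing {} creates the anonymous subclass that captures the type.
    TypeHint<List<String>> hint = new TypeHint<List<String>>() {};
    System.out.println(hint.getType().getTypeName());
    // prints java.util.List<java.lang.String>
  }
}
```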

Fluent API:

The API is fluent within one operator. It is designed to "lead the
programmer", which means that they will only be offered methods that make
sense after the last method they used (e.g. in ReduceByKey, we know that
after keyBy one of the reduceBy methods should come). It is implemented as
a series of builders.
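
A minimal sketch of such a "series of builders" (sometimes called a staged
or step builder) could look as follows. The class and stage names are
hypothetical and much simpler than the real Euphoria operators; the point
is only that each stage exposes nothing but the methods that are valid
next, so the compiler and IDE completion guide the user.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class StagedBuilderSketch {

  /** First stage: the only thing a caller can do is choose a key. */
  public static final class OfStage<T> {
    private final List<T> input;

    OfStage(List<T> input) {
      this.input = input;
    }

    public <K> KeyedStage<T, K> keyBy(Function<T, K> keyFn) {
      return new KeyedStage<>(input, keyFn);
    }
  }

  /** Second stage: after keyBy, only reduceBy is offered. */
  public static final class KeyedStage<T, K> {
    private final List<T> input;
    private final Function<T, K> keyFn;

    KeyedStage(List<T> input, Function<T, K> keyFn) {
      this.input = input;
      this.keyFn = keyFn;
    }

    /** Terminal step: folds all values that share a key with the given reducer. */
    public Map<K, T> reduceBy(BinaryOperator<T> reducer) {
      Map<K, T> result = new LinkedHashMap<>();
      for (T item : input) {
        result.merge(keyFn.apply(item), item, reducer);
      }
      return result;
    }
  }

  public static <T> OfStage<T> of(List<T> input) {
    return new OfStage<>(input);
  }

  public static void main(String[] args) {
    Map<Integer, String> byLength = StagedBuilderSketch
        .of(Arrays.asList("a", "bb", "cc", "d"))
        .keyBy(String::length)
        .reduceBy((x, y) -> x + y);
    System.out.println(byLength);
    // prints {1=ad, 2=bbcc}
  }
}
```

Calling reduceBy before keyBy does not compile in this sketch, because
OfStage simply has no such method.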

Davor:

Thanks, I'll contact you, and will start the process of having all the
necessary paperwork signed on our side, so we can get things moving.

On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rm...@gmail.com>
wrote:

> Hi guys
>
> A DSL would be very welcomed, in particular if fluent.
>
> Open question: did you study to implement Stream API (surely extending it
> to have a BeamStream and a few more features like sides etc)? Would be very
> natural and integrable easily anywhere and avoid a new API discovery.
>
> Hazelcast jet did it so I dont see why Beam couldnt.
>
> Le 18 déc. 2017 07:26, "Davor Bonaci" <da...@apache.org> a écrit :
>
>> Hi David,
>> As JB noted, merging of these two projects is a great idea. If fact, some
>> of us have had those discussions in the past.
>>
>> Legally, nothing particular is strictly necessary as the code seem to
>> already be Apache 2.0 licensed. We don't, however, want to be perceived as
>> making hostile forks, so it would be great to file a Software Grant
>> Agreement with the ASF Secretary. I can help with the process, as necessary.
>>
>> Project alignment-wise, there aren't any particular blockers that I am
>> aware of. We welcome DSLs.
>>
>> Technically, the code would start in a feature branch. During this stage,
>> we'd need to validate a few things, including confirmation the code and
>> dependencies match the ASF policy, automate testing in Beam's tooling, etc.
>> At that point, we'd take a community vote to accept the component into
>> master, and consider author(s) for committership in the overall project.
>>
>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
>> helps, and please reach out if anybody on our end can help, including JB or
>> myself.
>>
>> Davor
>>
>>


-- 
Best regards

David Morávek

Re: Euphoria Java 8 DSL - proposal

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi guys

A DSL would be very welcome, particularly a fluent one.

Open question: did you consider implementing the Stream API (presumably
extending it into a BeamStream with a few more features such as side inputs)?
It would feel very natural, integrate easily anywhere, and spare users from
learning a new API.

Hazelcast Jet did it, so I don't see why Beam couldn't.


Re: Euphoria Java 8 DSL - proposal

Posted by Davor Bonaci <da...@apache.org>.
Hi David,
As JB noted, merging these two projects is a great idea. In fact, some
of us have had those discussions in the past.

Legally, nothing in particular is strictly necessary, as the code seems to
already be Apache 2.0 licensed. We don't, however, want to be perceived as
making hostile forks, so it would be great to file a Software Grant
Agreement with the ASF Secretary. I can help with the process, as necessary.

Project alignment-wise, there aren't any particular blockers that I am
aware of. We welcome DSLs.

Technically, the code would start in a feature branch. During this stage,
we'd need to take care of a few things, including confirming that the code
and dependencies comply with ASF policy, automating testing in Beam's
tooling, etc. At that point, we'd take a community vote to accept the
component into master, and consider author(s) for committership in the
overall project.

Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
helps, and please reach out if anybody on our end can help, including JB or
myself.

Davor



Re: Euphoria Java 8 DSL - proposal

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi David,

Generally speaking, having different fluent DSLs on top of the Beam SDK is great.

I would like to take a look at your wordcount examples to give you thorough
feedback. I like the idea, and a fluent Java DSL is valuable.

Let's wait for feedback from others. If we have a consensus, then I would be
more than happy to help you with the donation (I worked on the Camel Java DSL
a while ago, so I have some experience here).

Thanks!
Regards
JB


-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com