Posted to dev@beam.apache.org by Neville Li <ne...@gmail.com> on 2016/06/23 21:56:11 UTC

Scala DSL

Hi all,

I'm the co-author of Scio <https://github.com/spotify/scio> and am in the
process of moving code to Beam (BEAM-302
<https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
sdks/scala is the right place for this code or if something like dsls/scio
is a better choice? What do you think?

A little background: Scio was built as a high-level Scala API for Google
Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
and Scalding. It wraps around the Dataflow/Beam Java SDK while also
providing features comparable to other Scala data frameworks. We use Scio
on Dataflow extensively in production at Spotify.

Cheers,
Neville

Re: Scala DSL

Posted by Aljoscha Krettek <al...@apache.org>.
I'm also in favor of branding it a DSL rather than an SDK, mostly because
it uses the Java SDK and because it does not (necessarily) follow/implement
the Beam model the way the Java SDK does and the Python SDK is apparently
aiming to.


Re: Scala DSL

Posted by Amit Sela <am...@gmail.com>.
Just looked at some Scio examples - and saw Spark Scala code ;-)

For me, this made some sense - Spark is written in Scala (let's call it
the Scala SDK?) but it also provides a Java API. The new version has a
unified API (Java-Scala interop). I see Scio in a similar way: it's a Scala
API because it's built on top of the Java SDK.
Having said that, Scio could offer more than just a Scala API over the Java
SDK (e.g., a REPL), so for lack of a native fit, I'd go with DSL. And to
address the very valid points people made about saying "Hi, we support
Scala!", we can call it the Scala API, even if it's under dsls/scio.

So +1 for dsls/scio

Thanks,
Amit

On Sat, Jun 25, 2016 at 5:06 AM Dan Halperin <dh...@google.com.invalid>
wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dh...@google.com> wrote:
>
> > On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi <rangadi@google.com.invalid>
> > wrote:
> >
> >> DSL is a pretty generic term..
> >>
> >
> > I agree and am not married to it. Neville?
> >
> >
> >> The fact that scio uses Java SDK is an implementation detail.
> >
> >
> > Reasonable, which is why I am also not pushing hard for '/java/scio' to be
> > in the path.
> >
> >
> >> I love the
> >> name scio. But I think sdks/scala might be most appropriate and would make
> >> it a first class citizen for Beam.
> >>
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's not
> > a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
> >
> >> Where would a future python sdk reside?
> >>
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it lives
> > in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
> >
>
> Now with a link:
> https://github.com/apache/incubator-beam/tree/python-sdk/sdks
>
> >
> > Thanks,
> > Dan
> >
> > On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> >> wrote:
> >>
> >> > Agree for dsls/scio
> >> >
> >> > Regards
> >> > JB
> >> >
> >> >
> >> > On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
> >> >
> >> >> +1 for dsls/scio for the already listed reasons
> >> >>
> >> >> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <ra...@spotify.com.invalid>
> >> >> wrote:
> >> >>
> >> >>> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
> >> >>> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio
> >> >>> is a scala DSL but lives under java directory (?) - that makes sense only
> >> >>> once you get that scio is using java SDK under the hood. Thus, +1 to
> >> >>> dsls/scio.
> >> >>> - Rafal
> >> >>>
> >> >>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <klk@google.com.invalid>
> >> >>> wrote:
> >> >>>
> >> >>>> My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
> >> >>>> there might be other Scala-based DSLs.
> >> >>>>
> >> >>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Hello everyone,
> >> >>>>>
> >> >>>>> Neville, thanks a lot for your contribution. Your work is amazing and I am
> >> >>>>> really happy that this scala integration is finally happening.
> >> >>>>> Congratulations to you and your team.
> >> >>>>>
> >> >>>>> I *strongly* disagree about the DSL classification for scio for one reason,
> >> >>>>> if you go to the root of the term, Domain Specific Languages are about a
> >> >>>>> domain, and the domain in this case is writing Beam pipelines, which is a
> >> >>>>> really broad domain.
> >> >>>>>
> >> >>>>> I agree with Frances’ argument that scio is not an SDK e.g. it reuses the
> >> >>>>> existing Beam java SDK. My proposition is that scio will be called the
> >> >>>>> Scala API because in the end this is what it is. I think the confusion
> >> >>>>> comes from the common definition of SDK which is normally an API + a
> >> >>>>> Runtime. In this case scio will share the runtime with what we call the
> >> >>>>> Beam Java SDK.
> >> >>>>>
> >> >>>>> One additional point of using the term API is that it sends the clear
> >> >>>>> message that Beam has a Scala API too (which is good for visibility as JB
> >> >>>>> mentioned).
> >> >>>>>
> >> >>>>> Regards,
> >> >>>>> Ismaël
> >> >>>>>
> >> >>>>>
> >> >>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Hi Dan,
> >> >>>>>>
> >> >>>>>> fair enough.
> >> >>>>>>
> >> >>>>>> As I'm also working on new DSLs (XML, JSON), I already created the dsls
> >> >>>>>> module.
> >> >>>>>>
> >> >>>>>> So, I would say dsls/scala.
> >> >>>>>>
> >> >>>>>> WDYT ?
> >> >>>>>>
> >> >>>>>> Regards
> >> >>>>>> JB
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
> >> >>>>>>
> >> >>>>>>> I don't think that sdks/scala is the right place -- scio is not a Beam
> >> >>>>>>> Scala SDK; it wraps the existing Java SDK.
> >> >>>>>>>
> >> >>>>>>> Some options:
> >> >>>>>>> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
> >> >>>>>>>   since Scio isn't an extension for the Java SDK, but rather a wrapper
> >> >>>>>>> * dsls/java/scio  (Scio is a Beam DSL that uses the Java SDK)
> >> >>>>>>> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple SDKs)
> >> >>>>>>> * extensions/java/scio  (Scio is an extension of Beam that uses the Java SDK)
> >> >>>>>>> * extensions/scio  (Scio is an extension of Beam that is not limited to one SDK)
> >> >>>>>>>
> >> >>>>>>> I lean towards either dsls/java/scio or extensions/java/scio, since I don't
> >> >>>>>>> think there are plans for Scio to handle multiple different SDKs (in
> >> >>>>>>> different languages). The question between these two is whether we think
> >> >>>>>>> DSLs are "big enough" to be a top level concept.
> >> >>>>>>>
> >> >>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> >> >>>>>>> wrote:
> >> >>>>>>>
> >> >>>>>>>> Good point about new Fn and the fact it's based on the Java SDK.
> >> >>>>>>>>
> >> >>>>>>>> It's just that in terms of "marketing", it's a good message to provide a
> >> >>>>>>>> Scala SDK even if technically it's more a DSL.
> >> >>>>>>>>
> >> >>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
> >> >>>>>>>> the Java SDK, or a declarative XML DSL.
> >> >>>>>>>>
> >> >>>>>>>> However, from a technical perspective, it can go into dsl module.
> >> >>>>>>>>
> >> >>>>>>>> My $0.02 ;)
> >> >>>>>>>>
> >> >>>>>>>> Regards
> >> >>>>>>>> JB
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> >> >>>>>>>>
> >> >>>>>>>>> +Rafal & Andrew again
> >> >>>>>>>>>
> >> >>>>>>>>> I am leaning DSL for two reasons: (1) scio uses the existing java
> >> >>>>>>>>> execution environment (and won't have a language-specific fn harness of
> >> >>>>>>>>> its own), and (2) it changes the abstractions that users interact with.
> >> >>>>>>>>>
> >> >>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some really cool
> >> >>>>>>>>> stuff in there. I'd love to dive into it a bit more and see what can be
> >> >>>>>>>>> generalized beyond scio. The repl-like interactive graph construction is
> >> >>>>>>>>> very similar to what we've seen with ipython, in that it doesn't always
> >> >>>>>>>>> play nicely with the graph construction / graph execution distinction. I
> >> >>>>>>>>> wonder what changes to Beam might more generally support this. The
> >> >>>>>>>>> materialize stuff looks similar to some functionality in FlumeJava we
> >> >>>>>>>>> used to support multi-segment pipelines with some shared intermediate
> >> >>>>>>>>> PCollections.
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> >> >>>>>>>>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Hi Neville,
> >> >>>>>>>>>>
> >> >>>>>>>>>> thanks for the update !
> >> >>>>>>>>>>
> >> >>>>>>>>>> As it's another language support, and to clearly identify the purpose, I
> >> >>>>>>>>>> would say sdks/scala.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Regards
> >> >>>>>>>>>> JB
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> +folks in my team
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <neville.lyh@gmail.com>
> >> >>>>>>>>>>> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi all,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in
> >> >>>>>>>>>>>> the process of moving code to Beam (BEAM-302
> >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
> >> >>>>>>>>>>>> sdks/scala is the right place for this code or if something like
> >> >>>>>>>>>>>> dsls/scio is a better choice? What do you think?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> A little background: Scio was built as a high-level Scala API for Google
> >> >>>>>>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
> >> >>>>>>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
> >> >>>>>>>>>>>> providing features comparable to other Scala data frameworks. We use
> >> >>>>>>>>>>>> Scio on Dataflow for production extensively inside Spotify.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Cheers,
> >> >>>>>>>>>>>> Neville
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> --
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Jean-Baptiste Onofré
> >> >>>>>>>>>> jbonofre@apache.org
> >> >>>>>>>>>> http://blog.nanthrax.net
> >> >>>>>>>>>> Talend - http://www.talend.com
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> --
> >> >>>>>>>>>
> >> >>>>>>>> Jean-Baptiste Onofré
> >> >>>>>>>> jbonofre@apache.org
> >> >>>>>>>> http://blog.nanthrax.net
> >> >>>>>>>> Talend - http://www.talend.com
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>> --
> >> >>>>>> Jean-Baptiste Onofré
> >> >>>>>> jbonofre@apache.org
> >> >>>>>> http://blog.nanthrax.net
> >> >>>>>> Talend - http://www.talend.com
> >> >>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbonofre@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >
> >
>

Re: Scala DSL

Posted by Dan Halperin <dh...@google.com.INVALID>.
On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dh...@google.com> wrote:

> On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi <ra...@google.com.invalid>
> wrote:
>
>> DSL is a pretty generic term..
>>
>
> I agree and am not married to it. Neville?
>
>
>> The fact that scio uses Java SDK is an implementation detail.
>
>
> Reasonable, which is why I am also not pushing hard for '/java/scio' to be
> in the path.
>
>
>> I love the
>> name scio. But I think sdks/scala might be most appropriate and would make
>> it a first class citizen for Beam.
>>
>
> I am strongly against it being in the 'sdks/' top-level module -- it's not
> a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>
>
>> Where would a future python sdk reside?
>>
>
> The Python SDK is in the python-sdk branch on Apache already, and it lives
> in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>

Now with a link:
https://github.com/apache/incubator-beam/tree/python-sdk/sdks

>
> Thanks,
> Dan
>
> On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>> > Agree for dsls/scio
>> >
>> > Regards
>> > JB
>> >
>> >
>> > On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
>> >
>> >> +1 for dsls/scio for the already listed reasons
>> >>
>> >> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <ra...@spotify.com.invalid>
>> >> wrote:
>> >>
>> >>> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
>> >>> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio
>> >>> is a scala DSL but lives under java directory (?) - that makes sense only
>> >>> once you get that scio is using java SDK under the hood. Thus, +1 to
>> >>> dsls/scio.
>> >>> - Rafal
>> >>>
>> >>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <klk@google.com.invalid>
>> >>> wrote:
>> >>>
>> >>>> My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
>> >>>> there might be other Scala-based DSLs.
>> >>>>
>> >>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hello everyone,
>> >>>>>
>> >>>>> Neville, thanks a lot for your contribution. Your work is amazing and I am
>> >>>>> really happy that this scala integration is finally happening.
>> >>>>> Congratulations to you and your team.
>> >>>>>
>> >>>>> I *strongly* disagree about the DSL classification for scio for one reason,
>> >>>>> if you go to the root of the term, Domain Specific Languages are about a
>> >>>>> domain, and the domain in this case is writing Beam pipelines, which is a
>> >>>>> really broad domain.
>> >>>>>
>> >>>>> I agree with Frances’ argument that scio is not an SDK e.g. it reuses the
>> >>>>> existing Beam java SDK. My proposition is that scio will be called the
>> >>>>> Scala API because in the end this is what it is. I think the confusion
>> >>>>> comes from the common definition of SDK which is normally an API + a
>> >>>>> Runtime. In this case scio will share the runtime with what we call the
>> >>>>> Beam Java SDK.
>> >>>>>
>> >>>>> One additional point of using the term API is that it sends the clear
>> >>>>> message that Beam has a Scala API too (which is good for visibility as JB
>> >>>>> mentioned).
>> >>>>>
>> >>>>> Regards,
>> >>>>> Ismaël
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi Dan,
>> >>>>>>
>> >>>>>> fair enough.
>> >>>>>>
>> >>>>>> As I'm also working on new DSLs (XML, JSON), I already created the dsls
>> >>>>>> module.
>> >>>>>>
>> >>>>>> So, I would say dsls/scala.
>> >>>>>>
>> >>>>>> WDYT ?
>> >>>>>>
>> >>>>>> Regards
>> >>>>>> JB
>> >>>>>>
>> >>>>>>
>> >>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
>> >>>>>>
>> >>>>>>> I don't think that sdks/scala is the right place -- scio is not a Beam
>> >>>>>>> Scala SDK; it wraps the existing Java SDK.
>> >>>>>>>
>> >>>>>>> Some options:
>> >>>>>>> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
>> >>>>>>>   since Scio isn't an extension for the Java SDK, but rather a wrapper
>> >>>>>>> * dsls/java/scio  (Scio is a Beam DSL that uses the Java SDK)
>> >>>>>>> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple SDKs)
>> >>>>>>> * extensions/java/scio  (Scio is an extension of Beam that uses the Java SDK)
>> >>>>>>> * extensions/scio  (Scio is an extension of Beam that is not limited to one SDK)
>> >>>>>>>
>> >>>>>>> I lean towards either dsls/java/scio or extensions/java/scio, since I don't
>> >>>>>>> think there are plans for Scio to handle multiple different SDKs (in
>> >>>>>>> different languages). The question between these two is whether we think
>> >>>>>>> DSLs are "big enough" to be a top level concept.
>> >>>>>>>
>> >>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Good point about new Fn and the fact it's based on the Java SDK.
>> >>>>>>>>
>> >>>>>>>> It's just that in terms of "marketing", it's a good message to provide a
>> >>>>>>>> Scala SDK even if technically it's more a DSL.
>> >>>>>>>>
>> >>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
>> >>>>>>>> the Java SDK, or a declarative XML DSL.
>> >>>>>>>>
>> >>>>>>>> However, from a technical perspective, it can go into dsl module.
>> >>>>>>>>
>> >>>>>>>> My $0.02 ;)
>> >>>>>>>>
>> >>>>>>>> Regards
>> >>>>>>>> JB
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
>> >>>>>>>>
>> >>>>>>>> +Rafal & Andrew again
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> I am leaning DSL for two reasons: (1) scio uses the existing
>> java
>> >>>>>>>>> execution
>> >>>>>>>>> environment (and won't have a language-specific fn harness of
>> its
>> >>>>>>>>>
>> >>>>>>>> own),
>> >>>>>
>> >>>>>> and
>> >>>>>>>>> (2) it changes the abstractions that users interact with.
>> >>>>>>>>>
>> >>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some
>> really
>> >>>>>>>>>
>> >>>>>>>> cool
>> >>>>>
>> >>>>>> stuff in there. I'd love to dive into it a bit more and see what
>> >>>>>>>>>
>> >>>>>>>> can
>> >>>
>> >>>> be
>> >>>>>
>> >>>>>> generalized beyond scio. The repl-like interactive graph
>> >>>>>>>>>
>> >>>>>>>> construction
>> >>>>
>> >>>>> is
>> >>>>>
>> >>>>>> very similar to what we've seen with ipython, in that it doesn't
>> >>>>>>>>>
>> >>>>>>>> always
>> >>>>>
>> >>>>>> play nicely with the graph construction / graph execution
>> >>>>>>>>>
>> >>>>>>>> distinction. I
>> >>>>>
>> >>>>>> wonder what changes to Beam might more generally support this. The
>> >>>>>>>>> materialize stuff looks similar to some functionality in
>> FlumeJava
>> >>>>>>>>>
>> >>>>>>>> we
>> >>>>
>> >>>>> used
>> >>>>>>>>> to support multi-segment pipelines with some shared intermediate
>> >>>>>>>>> PCollections.
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <
>> >>>>>>>>>
>> >>>>>>>> jb@nanthrax.net>
>> >>>>>
>> >>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi Neville,
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> thanks for the update !
>> >>>>>>>>>>
>> >>>>>>>>>> As it's another language support, and to clearly identify the
>> >>>>>>>>>>
>> >>>>>>>>> purpose,
>> >>>>>
>> >>>>>> I
>> >>>>>>>>>> would say sdks/scala.
>> >>>>>>>>>>
>> >>>>>>>>>> Regards
>> >>>>>>>>>> JB
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> +folks in my team
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <
>> >>>>>>>>>>>
>> >>>>>>>>>> neville.lyh@gmail.com
>> >>>
>> >>>>
>> >>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hi all,
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio>
>> >>>>>>>>>>>>
>> >>>>>>>>>>> and
>> >>>
>> >>>> am
>> >>>>
>> >>>>> in
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>> progress of moving code to Beam (BEAM-302
>> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just
>> >>>>>>>>>>>>
>> >>>>>>>>>>> wondering
>> >>>>
>> >>>>> if
>> >>>>>
>> >>>>>> sdks/scala is the right place for this code or if something
>> >>>>>>>>>>>>
>> >>>>>>>>>>> like
>> >>>
>> >>>> dsls/scio
>> >>>>>>>>>>>> is a better choice? What do you think?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> A little background: Scio was built as a high-level Scala API
>> >>>>>>>>>>>>
>> >>>>>>>>>>> for
>> >>>
>> >>>> Google
>> >>>>>>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily
>> influenced
>> >>>>>>>>>>>>
>> >>>>>>>>>>> by
>> >>>>
>> >>>>> Spark
>> >>>>>>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK
>> while
>> >>>>>>>>>>>>
>> >>>>>>>>>>> also
>> >>>>
>> >>>>> providing features comparable to other Scala data frameworks.
>> >>>>>>>>>>>>
>> >>>>>>>>>>> We
>> >>>
>> >>>> use
>> >>>>>
>> >>>>>> Scio
>> >>>>>>>>>>>> on Dataflow for production extensively inside Spotify.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>> Neville
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Jean-Baptiste Onofré
>> >>>>>>>>>> jbonofre@apache.org
>> >>>>>>>>>> http://blog.nanthrax.net
>> >>>>>>>>>> Talend - http://www.talend.com
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>> Jean-Baptiste Onofré
>> >>>>>>>> jbonofre@apache.org
>> >>>>>>>> http://blog.nanthrax.net
>> >>>>>>>> Talend - http://www.talend.com
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>> --
>> >>>>>> Jean-Baptiste Onofré
>> >>>>>> jbonofre@apache.org
>> >>>>>> http://blog.nanthrax.net
>> >>>>>> Talend - http://www.talend.com
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> > --
>> > Jean-Baptiste Onofré
>> > jbonofre@apache.org
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>> >
>>
>
>

Re: Scala DSL

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1 for dsls/scio.

Let me know how I can help there !

Thanks
Regards
JB

On 07/01/2016 08:43 PM, Neville Li wrote:
> Looks like dsls/scio is the winner :)
>
> I like it too plus we get to keep the Scio name. This also leaves room for
> other Scala wrappers of different flavor.
> Scio is a DSL in the domain of functional style data pipelines.
>
> On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Just to summarize, at this point:
>>
>> - Everybody agrees about the fact that scio is not an SDK.
>> - Almost everybody agrees that given the current choice they would prefer
>> ‘dsls/scio’
>> - Some of us are not particularly married with the DSL classification.
>>
>> I have a proposition to make, we can define two concepts with their given
>> structure in the Beam repository:
>>
>> 1. Beam API: A set of abstractions to program the complete Beam Model in a
>> given programming language.
>>
>> These are idiomatic versions of the Beam Model, and ideally should cover
>> the complete Beam Model e.g. scio is one example. The directory structure
>> for Beam APIs could be:
>>
>> apis/scala
>> apis/clojure
>> apis/groovy
>> ...
>>
>> 2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
>> graphs, machine learning, etc
>>
>> These represent domain specific idioms, e.g. a graph DSL would represent
>> graph concepts. e.g. edges, vertex, etc as first citizens. The directory
>> structure for Beam DSLs could be:
>>
>> dsls/graph
>> dsls/ml
>> dsls/cep
>> ...
>>
>> Given these definitions for the concrete scio case I think the most
>> accurate directory would be:
>>
>> apis/scala
>> or
>> apis/scala/scio
>>
>> I personally prefer the first one (apis/scala) because we don’t have any
>> other scala API for the moment and because I think that we shouldn’t have
>> more than one API per language to avoid confusion e.g. imagine that someone
>> creates apis/java/bcollections to represent Beam Pipelines as distributed
>> collections, that would be confusing. However I understand the arguments
>> for the second directory e.g. to support different APIs per language, and
>> to preserve their original names (scio). Anyway I would be ok with any of
>> the two.
>>
>> I excuse myself for this long message, and for not choosing any of the two
>> structures proposed in this thread, but I think it is important to be clear
>> about the differences in scope of both Beam APIs and DSLs in particular if
>> we think about new users.
>>
>> What do you think, do you think my proposition makes sense, any suggestions
>> ?
>>
>> Regards,
>> Ismaël
>>
>> ps. One last thing, I found this text that in part corroborates my feeling
>> about scio been an API and not a DSL:
>>
>> “… a Scala Dataflow API (a nascent open-source version of which already
>> exists, and which seems likely to flower into maturity in due time given
>> Dataflow's move to join the ASF).”
>> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>>
>>
>> On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <ra...@google.com.invalid>
>> wrote:
>>
>>> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin
>> <dhalperi@google.com.invalid
>>>>
>>> wrote:
>>>
>>>>> I love the
>>>>> name scio. But I think sdks/scala might be most appropriate and would
>>>> make
>>>>> it a first class citizen for Beam.
>>>>>
>>>>
>>>> I am strongly against it being in the 'sdks/' top-level module -- it's
>>> not
>>>> a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>>>>
>>>
>>> +1. I agree, it is not Beam SDK in that sense.
>>>
>>> Raghu.
>>>
>>>
>>>>
>>>>> Where would a future python sdk reside?
>>>>>
>>>>
>>>> The Python SDK is in the python-sdk branch on Apache already, and it
>>> lives
>>>> in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Scala DSL

Posted by Neville Li <ne...@gmail.com>.
Looks like dsls/scio is the winner :)

I like it too plus we get to keep the Scio name. This also leaves room for
other Scala wrappers of different flavor.
Scio is a DSL in the domain of functional style data pipelines.

On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Just to summarize, at this point:
>
> - Everybody agrees about the fact that scio is not an SDK.
> - Almost everybody agrees that given the current choice they would prefer
> ‘dsls/scio’
> - Some of us are not particularly married with the DSL classification.
>
> I have a proposition to make, we can define two concepts with their given
> structure in the Beam repository:
>
> 1. Beam API: A set of abstractions to program the complete Beam Model in a
> given programming language.
>
> These are idiomatic versions of the Beam Model, and ideally should cover
> the complete Beam Model e.g. scio is one example. The directory structure
> for Beam APIs could be:
>
> apis/scala
> apis/clojure
> apis/groovy
> ...
>
> 2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
> graphs, machine learning, etc
>
> These represent domain specific idioms, e.g. a graph DSL would represent
> graph concepts. e.g. edges, vertex, etc as first citizens. The directory
> structure for Beam DSLs could be:
>
> dsls/graph
> dsls/ml
> dsls/cep
> ...
>
> Given these definitions for the concrete scio case I think the most
> accurate directory would be:
>
> apis/scala
> or
> apis/scala/scio
>
> I personally prefer the first one (apis/scala) because we don’t have any
> other scala API for the moment and because I think that we shouldn’t have
> more than one API per language to avoid confusion e.g. imagine that someone
> creates apis/java/bcollections to represent Beam Pipelines as distributed
> collections, that would be confusing. However I understand the arguments
> for the second directory e.g. to support different APIs per language, and
> to preserve their original names (scio). Anyway I would be ok with any of
> the two.
>
> I excuse myself for this long message, and for not choosing any of the two
> structures proposed in this thread, but I think it is important to be clear
> about the differences in scope of both Beam APIs and DSLs in particular if
> we think about new users.
>
> What do you think, do you think my proposition makes sense, any suggestions
> ?
>
> Regards,
> Ismaël
>
> ps. One last thing, I found this text that in part corroborates my feeling
> about scio been an API and not a DSL:
>
> “… a Scala Dataflow API (a nascent open-source version of which already
> exists, and which seems likely to flower into maturity in due time given
> Dataflow's move to join the ASF).”
> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>
>
> On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <ra...@google.com.invalid>
> wrote:
>
> > On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin
> <dhalperi@google.com.invalid
> > >
> > wrote:
> >
> > > > I love the
> > > > name scio. But I think sdks/scala might be most appropriate and would
> > > make
> > > > it a first class citizen for Beam.
> > > >
> > >
> > > I am strongly against it being in the 'sdks/' top-level module -- it's
> > not
> > > a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> > >
> >
> > +1. I agree, it is not Beam SDK in that sense.
> >
> > Raghu.
> >
> >
> > >
> > > > Where would a future python sdk reside?
> > > >
> > >
> > > The Python SDK is in the python-sdk branch on Apache already, and it
> > lives
> > > in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
> >
>

Re: Scala DSL

Posted by Ismaël Mejía <ie...@gmail.com>.
Just to summarize, at this point:

- Everybody agrees about the fact that scio is not an SDK.
- Almost everybody agrees that given the current choice they would prefer
‘dsls/scio’
- Some of us are not particularly married to the DSL classification.

I have a proposition to make, we can define two concepts with their given
structure in the Beam repository:

1. Beam API: A set of abstractions to program the complete Beam Model in a
given programming language.

These are idiomatic versions of the Beam Model, and ideally should cover
the complete Beam Model e.g. scio is one example. The directory structure
for Beam APIs could be:

apis/scala
apis/clojure
apis/groovy
...

2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
graphs, machine learning, etc

These represent domain specific idioms, e.g. a graph DSL would represent
graph concepts. e.g. edges, vertex, etc as first citizens. The directory
structure for Beam DSLs could be:

dsls/graph
dsls/ml
dsls/cep
...

Given these definitions for the concrete scio case I think the most
accurate directory would be:

apis/scala
or
apis/scala/scio

I personally prefer the first one (apis/scala) because we don’t have any
other scala API for the moment and because I think that we shouldn’t have
more than one API per language to avoid confusion e.g. imagine that someone
creates apis/java/bcollections to represent Beam Pipelines as distributed
collections, that would be confusing. However I understand the arguments
for the second directory e.g. to support different APIs per language, and
to preserve their original names (scio). Anyway I would be ok with any of
the two.

I excuse myself for this long message, and for not choosing any of the two
structures proposed in this thread, but I think it is important to be clear
about the differences in scope of both Beam APIs and DSLs in particular if
we think about new users.

What do you think? Does my proposition make sense? Any suggestions?

Regards,
Ismaël

ps. One last thing, I found this text that in part corroborates my feeling
about scio being an API and not a DSL:

“… a Scala Dataflow API (a nascent open-source version of which already
exists, and which seems likely to flower into maturity in due time given
Dataflow's move to join the ASF).”
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison


On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <ra...@google.com.invalid>
wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dhalperi@google.com.invalid
> >
> wrote:
>
> > > I love the
> > > name scio. But I think sdks/scala might be most appropriate and would
> > make
> > > it a first class citizen for Beam.
> > >
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's
> not
> > a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
>
> +1. I agree, it is not Beam SDK in that sense.
>
> Raghu.
>
>
> >
> > > Where would a future python sdk reside?
> > >
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it
> lives
> > in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>

Re: Scala DSL

Posted by Raghu Angadi <ra...@google.com.INVALID>.
On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dh...@google.com.invalid>
wrote:

> > I love the
> > name scio. But I think sdks/scala might be most appropriate and would
> make
> > it a first class citizen for Beam.
> >
>
> I am strongly against it being in the 'sdks/' top-level module -- it's not
> a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>

+1. I agree, it is not Beam SDK in that sense.

Raghu.


>
> > Where would a future python sdk reside?
> >
>
> The Python SDK is in the python-sdk branch on Apache already, and it lives
> in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)

Re: Scala DSL

Posted by Dan Halperin <dh...@google.com.INVALID>.
On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi <ra...@google.com.invalid>
wrote:

> DSL is a pretty generic term..
>

I agree and am not married to it. Neville?


> The fact that scio uses Java SDK is an implementation detail.


Reasonable, which is why I am also not pushing hard for '/java/scio' to be
in the path.


> I love the
> name scio. But I think sdks/scala might be most appropriate and would make
> it a first class citizen for Beam.
>

I am strongly against it being in the 'sdks/' top-level module -- it's not
a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.


> Where would a future python sdk reside?
>

The Python SDK is in the python-sdk branch on Apache already, and it lives
in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)

Thanks,
Dan

On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
> > Agree for dsls/scio
> >
> > Regards
> > JB
> >
> >
> > On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
> >
> >> +1 for dsls/scio for the already listed reasons
> >>
> >> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <rav@spotify.com.invalid
> >
> >> wrote:
> >>
> >> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
> >>> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio
> >>> is a
> >>> scala DSL but lives under java directory (?) - that makes sense only
> once
> >>> you get that scio is using java SDK under the hood. Thus, +1 to
> >>> dsls/scio.
> >>> - Rafal
> >>>
> >>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles
> <klk@google.com.invalid
> >>> >
> >>> wrote:
> >>>
> >>> My +1 goes to dsls/scio. It already has a cool name, so let's use it.
> And
> >>>> there might be other Scala-based DSLs.
> >>>>
> >>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> Hello everyone,
> >>>>>
> >>>>> Neville, thanks a lot for your contribution. Your work is amazing
> and I
> >>>>>
> >>>> am
> >>>>
> >>>>> really happy that this scala integration is finally happening.
> >>>>> Congratulations to you and your team.
> >>>>>
> >>>>> I *strongly* disagree about the DSL classification for scio for one
> >>>>>
> >>>> reason,
> >>>>
> >>>>> if you go to the root of the term, Domain Specific Languages are
> about
> >>>>>
> >>>> a
> >>>
> >>>> domain, and the domain in this case is writing Beam pipelines, which
> >>>>>
> >>>> is a
> >>>
> >>>> really broad domain.
> >>>>>
> >>>>> I agree with Frances’ argument that scio is not an SDK e.g. it reuses
> >>>>>
> >>>> the
> >>>
> >>>> existing Beam java SDK. My proposition is that scio will be called the
> >>>>> Scala API because in the end this is what it is. I think the
> confusion
> >>>>> comes from the common definition of SDK which is normally an API + a
> >>>>> Runtime. In this case scio will share the runtime with what we call
> the
> >>>>> Beam Java SDK.
> >>>>>
> >>>>> One additional point of using the term API is that it sends the clear
> >>>>> message that Beam has a Scala API too (which is good for visibility
> as
> >>>>>
> >>>> JB
> >>>
> >>>> mentioned).
> >>>>>
> >>>>> Regards,
> >>>>> Ismaël
> >>>>>
> >>>>>
> >>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <
> jb@nanthrax.net
> >>>>>
> >>>>
> >>>> wrote:
> >>>>>
> >>>>> Hi Dan,
> >>>>>>
> >>>>>> fair enough.
> >>>>>>
> >>>>>> As I'm also working on new DSLs (XML, JSON), I already created the
> >>>>>>
> >>>>> dsls
> >>>
> >>>> module.
> >>>>>>
> >>>>>> So, I would say dsls/scala.
> >>>>>>
> >>>>>> WDYT ?
> >>>>>>
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>>
> >>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
> >>>>>>
> >>>>>> I don't think that sdks/scala is the right place -- scio is not a
> >>>>>>>
> >>>>>> Beam
> >>>
> >>>> Scala SDK; it wraps the existing Java SDK.
> >>>>>>>
> >>>>>>> Some options:
> >>>>>>> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally
> >>>>>>>
> >>>>>> vetoed
> >>>>
> >>>>> since Scio isn't an extension for the Java SDK, but rather a wrapper
> >>>>>>>
> >>>>>>> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> >>>>>>> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple
> >>>>>>>
> >>>>>> SDKs)
> >>>>>
> >>>>>> * extensions/java/scio  (Scio is an extension of Beam that uses the
> >>>>>>>
> >>>>>> Java
> >>>>
> >>>>> SDK)
> >>>>>>> * extensions/scio  (Scio is an extension of Beam that is not
> limited
> >>>>>>>
> >>>>>> to
> >>>>
> >>>>> one
> >>>>>>> SDK)
> >>>>>>>
> >>>>>>> I lean towards either dsls/java/scio or extensions/java/scio, since
> >>>>>>>
> >>>>>> I
> >>>
> >>>> don't
> >>>>>>> think there are plans for Scio to handle multiple different SDKs
> (in
> >>>>>>> different languages). The question between these two is whether we
> >>>>>>>
> >>>>>> think
> >>>>
> >>>>> DSLs are "big enough" to be a top level concept.
> >>>>>>>
> >>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <
> >>>>>>>
> >>>>>> jb@nanthrax.net
> >>>>
> >>>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Good point about new Fn and the fact it's based on the Java SDK.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> It's just that in term of "marketing", it's a good message to
> >>>>>>>>
> >>>>>>> provide a
> >>>>
> >>>>> Scala SDK even if technically it's more a DSL.
> >>>>>>>>
> >>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent DSL
> on
> >>>>>>>>
> >>>>>>> top
> >>>>
> >>>>> of
> >>>>>>>> the Java SDK, or a declarative XML DSL.
> >>>>>>>>
> >>>>>>>> However, from a technical perspective, it can go into dsl module.
> >>>>>>>>
> >>>>>>>> My $0.02 ;)
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> JB
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> >>>>>>>>
> >>>>>>>> +Rafal & Andrew again
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am leaning DSL for two reasons: (1) scio uses the existing java
> >>>>>>>>> execution
> >>>>>>>>> environment (and won't have a language-specific fn harness of its
> >>>>>>>>>
> >>>>>>>> own),
> >>>>>
> >>>>>> and
> >>>>>>>>> (2) it changes the abstractions that users interact with.
> >>>>>>>>>
> >>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some
> really
> >>>>>>>>>
> >>>>>>>> cool
> >>>>>
> >>>>>> stuff in there. I'd love to dive into it a bit more and see what
> >>>>>>>>>
> >>>>>>>> can
> >>>
> >>>> be
> >>>>>
> >>>>>> generalized beyond scio. The repl-like interactive graph
> >>>>>>>>>
> >>>>>>>> construction
> >>>>
> >>>>> is
> >>>>>
> >>>>>> very similar to what we've seen with ipython, in that it doesn't
> >>>>>>>>>
> >>>>>>>> always
> >>>>>
> >>>>>> play nicely with the graph construction / graph execution
> >>>>>>>>>
> >>>>>>>> distinction. I
> >>>>>
> >>>>>> wonder what changes to Beam might more generally support this. The
> >>>>>>>>> materialize stuff looks similar to some functionality in
> FlumeJava
> >>>>>>>>>
> >>>>>>>> we
> >>>>
> >>>>> used
> >>>>>>>>> to support multi-segment pipelines with some shared intermediate
> >>>>>>>>> PCollections.
> >>>>>>>>>
> >>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <
> >>>>>>>>>
> >>>>>>>> jb@nanthrax.net>
> >>>>>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Neville,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> thanks for the update !
> >>>>>>>>>>
> >>>>>>>>>> As it's another language support, and to clearly identify the
> >>>>>>>>>>
> >>>>>>>>> purpose,
> >>>>>
> >>>>>> I
> >>>>>>>>>> would say sdks/scala.
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> JB
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
> >>>>>>>>>>
> >>>>>>>>>> +folks in my team
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <
> >>>>>>>>>>>
> >>>>>>>>>> neville.lyh@gmail.com
> >>>
> >>>>
> >>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio>
> >>>>>>>>>>>>
> >>>>>>>>>>> and
> >>>
> >>>> am
> >>>>
> >>>>> in
> >>>>>>>>>>>> the
> >>>>>>>>>>>> progress of moving code to Beam (BEAM-302
> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just
> >>>>>>>>>>>>
> >>>>>>>>>>> wondering
> >>>>
> >>>>> if
> >>>>>
> >>>>>> sdks/scala is the right place for this code or if something
> >>>>>>>>>>>>
> >>>>>>>>>>> like
> >>>
> >>>> dsls/scio
> >>>>>>>>>>>> is a better choice? What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> A little background: Scio was built as a high-level Scala API
> >>>>>>>>>>>>
> >>>>>>>>>>> for
> >>>
> >>>> Google
> >>>>>>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily
> influenced
> >>>>>>>>>>>>
> >>>>>>>>>>> by
> >>>>
> >>>>> Spark
> >>>>>>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while
> >>>>>>>>>>>>
> >>>>>>>>>>> also
> >>>>
> >>>>> providing features comparable to other Scala data frameworks.
> >>>>>>>>>>>>
> >>>>>>>>>>> We
> >>>
> >>>> use
> >>>>>
> >>>>>> Scio
> >>>>>>>>>>>> on Dataflow for production extensively inside Spotify.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Neville
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Jean-Baptiste Onofré
> >>>>>>>>>> jbonofre@apache.org
> >>>>>>>>>> http://blog.nanthrax.net
> >>>>>>>>>> Talend - http://www.talend.com
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>
> >>>>>>>> Jean-Baptiste Onofré
> >>>>>>>> jbonofre@apache.org
> >>>>>>>> http://blog.nanthrax.net
> >>>>>>>> Talend - http://www.talend.com
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> --
> >>>>>> Jean-Baptiste Onofré
> >>>>>> jbonofre@apache.org
> >>>>>> http://blog.nanthrax.net
> >>>>>> Talend - http://www.talend.com
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>

Re: Scala DSL

Posted by Raghu Angadi <ra...@google.com.INVALID>.
DSL is a pretty generic term..

The fact that scio uses Java SDK is an implementation detail. I love the
name scio. But I think sdks/scala might be most appropriate and would make
it a first class citizen for Beam.

Where would a future python sdk reside?

On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Agree for dsls/scio
>
> Regards
> JB
>
>
> On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
>
>> +1 for dsls/scio for the already listed reasons
>>
>> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <ra...@spotify.com.invalid>
>> wrote:
>>
>> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
>>> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio
>>> is a
>>> scala DSL but lives under java directory (?) - that makes sense only once
>>> you get that scio is using java SDK under the hood. Thus, +1 to
>>> dsls/scio.
>>> - Rafal
>>>
>>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <klk@google.com.invalid
>>> >
>>> wrote:
>>>
>>> My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
>>>> there might be other Scala-based DSLs.
>>>>
>>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello everyone,
>>>>>
>>>>> Neville, thanks a lot for your contribution. Your work is amazing and I
>>>>>
>>>> am
>>>>
>>>>> really happy that this scala integration is finally happening.
>>>>> Congratulations to you and your team.
>>>>>
>>>>> I *strongly* disagree about the DSL classification for scio for one
>>>>>
>>>> reason,
>>>>
>>>>> if you go to the root of the term, Domain Specific Languages are about
>>>>>
>>>> a
>>>
>>>> domain, and the domain in this case is writing Beam pipelines, which
>>>>>
>>>> is a
>>>
>>>> really broad domain.
>>>>>
>>>>> I agree with Frances’ argument that scio is not an SDK e.g. it reuses
>>>>>
>>>> the
>>>
>>>> existing Beam java SDK. My proposition is that scio will be called the
>>>>> Scala API because in the end this is what it is. I think the confusion
>>>>> comes from the common definition of SDK which is normally an API + a
>>>>> Runtime. In this case scio will share the runtime with what we call the
>>>>> Beam Java SDK.
>>>>>
>>>>> One additional point of using the term API is that it sends the clear
>>>>> message that Beam has a Scala API too (which is good for visibility as
>>>>>
>>>> JB
>>>
>>>> mentioned).
>>>>>
>>>>> Regards,
>>>>> Ismaël
>>>>>
>>>>>
>>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb@nanthrax.net
>>>>>
>>>>
>>>> wrote:
>>>>>
>>>>> Hi Dan,
>>>>>>
>>>>>> fair enough.
>>>>>>
>>>>>> As I'm also working on new DSLs (XML, JSON), I already created the
>>>>>>
>>>>> dsls
>>>
>>>> module.
>>>>>>
>>>>>> So, I would say dsls/scala.
>>>>>>
>>>>>> WDYT ?
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>
>>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
>>>>>>
>>>>>> I don't think that sdks/scala is the right place -- scio is not a
>>>>>>>
>>>>>> Beam
>>>
>>>> Scala SDK; it wraps the existing Java SDK.
>>>>>>>
>>>>>>> Some options:
>>>>>>> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally
>>>>>>>
>>>>>> vetoed
>>>>
>>>>> since Scio isn't an extension for the Java SDK, but rather a wrapper
>>>>>>>
>>>>>>> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
>>>>>>> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple
>>>>>>>
>>>>>> SDKs)
>>>>>
>>>>>> * extensions/java/scio  (Scio is an extension of Beam that uses the
>>>>>>>
>>>>>> Java
>>>>
>>>>> SDK)
>>>>>>> * extensions/scio  (Scio is an extension of Beam that is not limited
>>>>>>>
>>>>>> to
>>>>
>>>>> one
>>>>>>> SDK)
>>>>>>>
>>>>>>> I lean towards either dsls/java/scio or extensions/java/scio, since
>>>>>>>
>>>>>> I
>>>
>>>> don't
>>>>>>> think there are plans for Scio to handle multiple different SDKs (in
>>>>>>> different languages). The question between these two is whether we
>>>>>>>
>>>>>> think
>>>>
>>>>>>> DSLs are "big enough" to be a top level concept.
>>>>>>>
>>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Good point about new Fn and the fact it's based on the Java SDK.
>>>>>>>>
>>>>>>>> It's just that in term of "marketing", it's a good message to provide a
>>>>>>>> Scala SDK even if technically it's more a DSL.
>>>>>>>>
>>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
>>>>>>>> the Java SDK, or a declarative XML DSL.
>>>>>>>>
>>>>>>>> However, from a technical perspective, it can go into dsl module.
>>>>>>>>
>>>>>>>> My $0.02 ;)
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
>>>>>>>>
>>>>>>>>> +Rafal & Andrew again
>>>>>>>>>
>>>>>>>>> I am leaning DSL for two reasons: (1) scio uses the existing java
>>>>>>>>> execution environment (and won't have a language-specific fn harness of
>>>>>>>>> its own), and (2) it changes the abstractions that users interact with.
>>>>>>>>>
>>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some really cool
>>>>>>>>> stuff in there. I'd love to dive into it a bit more and see what can be
>>>>>>>>> generalized beyond scio. The repl-like interactive graph construction is
>>>>>>>>> very similar to what we've seen with ipython, in that it doesn't always
>>>>>>>>> play nicely with the graph construction / graph execution distinction. I
>>>>>>>>> wonder what changes to Beam might more generally support this. The
>>>>>>>>> materialize stuff looks similar to some functionality in FlumeJava we used
>>>>>>>>> to support multi-segment pipelines with some shared intermediate
>>>>>>>>> PCollections.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Neville,
>>>>>>>>>>
>>>>>>>>>> thanks for the update !
>>>>>>>>>>
>>>>>>>>>> As it's another language support, and to clearly identify the purpose, I
>>>>>>>>>> would say sdks/scala.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>>>>>>>>>
>>>>>>>>>>> +folks in my team
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <neville.lyh@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in the
>>>>>>>>>>>> progress of moving code to Beam (BEAM-302
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>>>>>>>>>>> sdks/scala is the right place for this code or if something like dsls/scio
>>>>>>>>>>>> is a better choice? What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> A little background: Scio was built as a high-level Scala API for Google
>>>>>>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
>>>>>>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>>>>>>>>>>> providing features comparable to other Scala data frameworks. We use Scio
>>>>>>>>>>>> on Dataflow for production extensively inside Spotify.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Neville
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>> jbonofre@apache.org
>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbonofre@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> jbonofre@apache.org
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Scala DSL

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Agreed on dsls/scio.

Regards
JB

On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
> +1 for dsls/scio for the already listed reasons
>
> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <ra...@spotify.com.invalid>
> wrote:
>
>> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
>> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio is a
>> scala DSL but lives under java directory (?) - that makes sense only once
>> you get that scio is using java SDK under the hood. Thus, +1 to dsls/scio.
>> - Rafal
>>
>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <kl...@google.com.invalid>
>> wrote:
>>
>>> My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
>>> there might be other Scala-based DSLs.
>>>
>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> Neville, thanks a lot for your contribution. Your work is amazing and I
>>> am
>>>> really happy that this scala integration is finally happening.
>>>> Congratulations to you and your team.
>>>>
>>>> I *strongly* disagree about the DSL classification for scio for one
>>> reason,
>>>> if you go to the root of the term, Domain Specific Languages are about
>> a
>>>> domain, and the domain in this case is writing Beam pipelines, which
>> is a
>>>> really broad domain.
>>>>
>>>> I agree with Frances’ argument that scio is not an SDK e.g. it reuses
>> the
>>>> existing Beam java SDK. My proposition is that scio will be called the
>>>> Scala API because in the end this is what it is. I think the confusion
>>>> comes from the common definition of SDK which is normally an API + a
>>>> Runtime. In this case scio will share the runtime with what we call the
>>>> Beam Java SDK.
>>>>
>>>> One additional point of using the term API is that it sends the clear
>>>> message that Beam has a Scala API too (which is good for visibility as
>> JB
>>>> mentioned).
>>>>
>>>> Regards,
>>>> Ismaël
>>>>
>>>>
>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Dan,
>>>>>
>>>>> fair enough.
>>>>>
>>>>> As I'm also working on new DSLs (XML, JSON), I already created the
>> dsls
>>>>> module.
>>>>>
>>>>> So, I would say dsls/scala.
>>>>>
>>>>> WDYT ?
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
>>>>>
>>>>>> I don't think that sdks/scala is the right place -- scio is not a
>> Beam
>>>>>> Scala SDK; it wraps the existing Java SDK.
>>>>>>
>>>>>> Some options:
>>>>>> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally
>>> vetoed
>>>>>> since Scio isn't an extension for the Java SDK, but rather a wrapper
>>>>>>
>>>>>> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
>>>>>> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple
>>>> SDKs)
>>>>>> * extensions/java/scio  (Scio is an extension of Beam that uses the
>>> Java
>>>>>> SDK)
>>>>>> * extensions/scio  (Scio is an extension of Beam that is not limited
>>> to
>>>>>> one
>>>>>> SDK)
>>>>>>
>>>>>> I lean towards either dsls/java/scio or extensions/java/scio, since
>> I
>>>>>> don't
>>>>>> think there are plans for Scio to handle multiple different SDKs (in
>>>>>> different languages). The question between these two is whether we
>>> think
>>>>>> DSLs are "big enough" to be a top level concept.
>>>>>>
>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>>>>> wrote:
>>>>>>
>>>>>> Good point about new Fn and the fact it's based on the Java SDK.
>>>>>>>
>>>>>>> It's just that in term of "marketing", it's a good message to
>>> provide a
>>>>>>> Scala SDK even if technically it's more a DSL.
>>>>>>>
>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent DSL on
>>> top
>>>>>>> of
>>>>>>> the Java SDK, or a declarative XML DSL.
>>>>>>>
>>>>>>> However, from a technical perspective, it can go into dsl module.
>>>>>>>
>>>>>>> My $0.02 ;)
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>>
>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
>>>>>>>
>>>>>>> +Rafal & Andrew again
>>>>>>>>
>>>>>>>> I am leaning DSL for two reasons: (1) scio uses the existing java
>>>>>>>> execution
>>>>>>>> environment (and won't have a language-specific fn harness of its
>>>> own),
>>>>>>>> and
>>>>>>>> (2) it changes the abstractions that users interact with.
>>>>>>>>
>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some really
>>>> cool
>>>>>>>> stuff in there. I'd love to dive into it a bit more and see what
>> can
>>>> be
>>>>>>>> generalized beyond scio. The repl-like interactive graph
>>> construction
>>>> is
>>>>>>>> very similar to what we've seen with ipython, in that it doesn't
>>>> always
>>>>>>>> play nicely with the graph construction / graph execution
>>>> distinction. I
>>>>>>>> wonder what changes to Beam might more generally support this. The
>>>>>>>> materialize stuff looks similar to some functionality in FlumeJava
>>> we
>>>>>>>> used
>>>>>>>> to support multi-segment pipelines with some shared intermediate
>>>>>>>> PCollections.
>>>>>>>>
>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Neville,
>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks for the update !
>>>>>>>>>
>>>>>>>>> As it's another language support, and to clearly identify the
>>>> purpose,
>>>>>>>>> I
>>>>>>>>> would say sdks/scala.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>>>>>>>>
>>>>>>>>> +folks in my team
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <
>> neville.lyh@gmail.com
>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio>
>> and
>>> am
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>> progress of moving code to Beam (BEAM-302
>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just
>>> wondering
>>>> if
>>>>>>>>>>> sdks/scala is the right place for this code or if something
>> like
>>>>>>>>>>> dsls/scio
>>>>>>>>>>> is a better choice? What do you think?
>>>>>>>>>>>
>>>>>>>>>>> A little background: Scio was built as a high-level Scala API
>> for
>>>>>>>>>>> Google
>>>>>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced
>>> by
>>>>>>>>>>> Spark
>>>>>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while
>>> also
>>>>>>>>>>> providing features comparable to other Scala data frameworks.
>> We
>>>> use
>>>>>>>>>>> Scio
>>>>>>>>>>> on Dataflow for production extensively inside Spotify.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Neville
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbonofre@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbonofre@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Scala DSL

Posted by Lukasz Cwik <lc...@google.com.INVALID>.
+1 for dsls/scio for the already listed reasons

On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <ra...@spotify.com.invalid>
wrote:

> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion, scio is a
> scala DSL but lives under java directory (?) - that makes sense only once
> you get that scio is using java SDK under the hood. Thus, +1 to dsls/scio.
> - Rafal
>
> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <kl...@google.com.invalid>
> wrote:
>
> > My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
> > there might be other Scala-based DSLs.
> >
> > On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com> wrote:
> >
> > > Hello everyone,
> > >
> > > Neville, thanks a lot for your contribution. Your work is amazing and I
> > am
> > > really happy that this scala integration is finally happening.
> > > Congratulations to you and your team.
> > >
> > > I *strongly* disagree about the DSL classification for scio for one
> > reason,
> > > if you go to the root of the term, Domain Specific Languages are about
> a
> > > domain, and the domain in this case is writing Beam pipelines, which
> is a
> > > really broad domain.
> > >
> > > I agree with Frances’ argument that scio is not an SDK e.g. it reuses
> the
> > > existing Beam java SDK. My proposition is that scio will be called the
> > > Scala API because in the end this is what it is. I think the confusion
> > > comes from the common definition of SDK which is normally an API + a
> > > Runtime. In this case scio will share the runtime with what we call the
> > > Beam Java SDK.
> > >
> > > One additional point of using the term API is that it sends the clear
> > > message that Beam has a Scala API too (which is good for visibility as
> JB
> > > mentioned).
> > >
> > > Regards,
> > > Ismaël
> > >
> > >
> > > On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb@nanthrax.net
> >
> > > wrote:
> > >
> > > > Hi Dan,
> > > >
> > > > fair enough.
> > > >
> > > > As I'm also working on new DSLs (XML, JSON), I already created the
> dsls
> > > > module.
> > > >
> > > > So, I would say dsls/scala.
> > > >
> > > > WDYT ?
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 06/24/2016 05:07 PM, Dan Halperin wrote:
> > > >
> > > >> I don't think that sdks/scala is the right place -- scio is not a
> Beam
> > > >> Scala SDK; it wraps the existing Java SDK.
> > > >>
> > > >> Some options:
> > > >> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally
> > vetoed
> > > >> since Scio isn't an extension for the Java SDK, but rather a wrapper
> > > >>
> > > >> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> > > >> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple
> > > SDKs)
> > > >> * extensions/java/scio  (Scio is an extension of Beam that uses the
> > Java
> > > >> SDK)
> > > >> * extensions/scio  (Scio is an extension of Beam that is not limited
> > to
> > > >> one
> > > >> SDK)
> > > >>
> > > >> I lean towards either dsls/java/scio or extensions/java/scio, since
> I
> > > >> don't
> > > >> think there are plans for Scio to handle multiple different SDKs (in
> > > >> different languages). The question between these two is whether we
> > think
> > > >> DSLs are "big enough" to be a top level concept.
> > > >>
> > > >> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <
> > jb@nanthrax.net
> > > >
> > > >> wrote:
> > > >>
> > > >> Good point about new Fn and the fact it's based on the Java SDK.
> > > >>>
> > > >>> It's just that in term of "marketing", it's a good message to
> > provide a
> > > >>> Scala SDK even if technically it's more a DSL.
> > > >>>
> > > >>> For instance, a valid "marketing" DSL would be a Java fluent DSL on
> > top
> > > >>> of
> > > >>> the Java SDK, or a declarative XML DSL.
> > > >>>
> > > >>> However, from a technical perspective, it can go into dsl module.
> > > >>>
> > > >>> My $0.02 ;)
> > > >>>
> > > >>> Regards
> > > >>> JB
> > > >>>
> > > >>>
> > > >>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> > > >>>
> > > >>> +Rafal & Andrew again
> > > >>>>
> > > >>>> I am leaning DSL for two reasons: (1) scio uses the existing java
> > > >>>> execution
> > > >>>> environment (and won't have a language-specific fn harness of its
> > > own),
> > > >>>> and
> > > >>>> (2) it changes the abstractions that users interact with.
> > > >>>>
> > > >>>> I recently saw a scio repl demo from Reuven -- there's some really
> > > cool
> > > >>>> stuff in there. I'd love to dive into it a bit more and see what
> can
> > > be
> > > >>>> generalized beyond scio. The repl-like interactive graph
> > construction
> > > is
> > > >>>> very similar to what we've seen with ipython, in that it doesn't
> > > always
> > > >>>> play nicely with the graph construction / graph execution
> > > distinction. I
> > > >>>> wonder what changes to Beam might more generally support this. The
> > > >>>> materialize stuff looks similar to some functionality in FlumeJava
> > we
> > > >>>> used
> > > >>>> to support multi-segment pipelines with some shared intermediate
> > > >>>> PCollections.
> > > >>>>
> > > >>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <
> > > jb@nanthrax.net>
> > > >>>> wrote:
> > > >>>>
> > > >>>> Hi Neville,
> > > >>>>
> > > >>>>>
> > > >>>>> thanks for the update !
> > > >>>>>
> > > >>>>> As it's another language support, and to clearly identify the
> > > purpose,
> > > >>>>> I
> > > >>>>> would say sdks/scala.
> > > >>>>>
> > > >>>>> Regards
> > > >>>>> JB
> > > >>>>>
> > > >>>>>
> > > >>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
> > > >>>>>
> > > >>>>> +folks in my team
> > > >>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <
> neville.lyh@gmail.com
> > >
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>> Hi all,
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio>
> and
> > am
> > > >>>>>>> in
> > > >>>>>>> the
> > > >>>>>>> progress of moving code to Beam (BEAM-302
> > > >>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just
> > wondering
> > > if
> > > >>>>>>> sdks/scala is the right place for this code or if something
> like
> > > >>>>>>> dsls/scio
> > > >>>>>>> is a better choice? What do you think?
> > > >>>>>>>
> > > >>>>>>> A little background: Scio was built as a high-level Scala API
> for
> > > >>>>>>> Google
> > > >>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced
> > by
> > > >>>>>>> Spark
> > > >>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while
> > also
> > > >>>>>>> providing features comparable to other Scala data frameworks.
> We
> > > use
> > > >>>>>>> Scio
> > > >>>>>>> on Dataflow for production extensively inside Spotify.
> > > >>>>>>>
> > > >>>>>>> Cheers,
> > > >>>>>>> Neville
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>
> > > >>>>> Jean-Baptiste Onofré
> > > >>>>> jbonofre@apache.org
> > > >>>>> http://blog.nanthrax.net
> > > >>>>> Talend - http://www.talend.com
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>> --
> > > >>> Jean-Baptiste Onofré
> > > >>> jbonofre@apache.org
> > > >>> http://blog.nanthrax.net
> > > >>> Talend - http://www.talend.com
> > > >>>
> > > >>>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>

Re: Scala DSL

Posted by Rafal Wojdyla <ra...@spotify.com.INVALID>.
Hello. On SDK vs DSL, I fully agree with Frances. As for dsls/java/scio vs
dsls/scio: dsls/java/scio may cause confusion, since scio is a Scala DSL but
would live under a java directory. That only makes sense once you know that
scio uses the Java SDK under the hood. Thus, +1 to dsls/scio.
- Rafal

On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <kl...@google.com.invalid>
wrote:

> My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
> there might be other Scala-based DSLs.
>
> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > Neville, thanks a lot for your contribution. Your work is amazing and I
> am
> > really happy that this scala integration is finally happening.
> > Congratulations to you and your team.
> >
> > I *strongly* disagree about the DSL classification for scio for one
> reason,
> > if you go to the root of the term, Domain Specific Languages are about a
> > domain, and the domain in this case is writing Beam pipelines, which is a
> > really broad domain.
> >
> > I agree with Frances’ argument that scio is not an SDK e.g. it reuses the
> > existing Beam java SDK. My proposition is that scio will be called the
> > Scala API because in the end this is what it is. I think the confusion
> > comes from the common definition of SDK which is normally an API + a
> > Runtime. In this case scio will share the runtime with what we call the
> > Beam Java SDK.
> >
> > One additional point of using the term API is that it sends the clear
> > message that Beam has a Scala API too (which is good for visibility as JB
> > mentioned).
> >
> > Regards,
> > Ismaël
> >
> >
> > On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> > > Hi Dan,
> > >
> > > fair enough.
> > >
> > > As I'm also working on new DSLs (XML, JSON), I already created the dsls
> > > module.
> > >
> > > So, I would say dsls/scala.
> > >
> > > WDYT ?
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 06/24/2016 05:07 PM, Dan Halperin wrote:
> > >
> > >> I don't think that sdks/scala is the right place -- scio is not a Beam
> > >> Scala SDK; it wraps the existing Java SDK.
> > >>
> > >> Some options:
> > >> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally
> vetoed
> > >> since Scio isn't an extension for the Java SDK, but rather a wrapper
> > >>
> > >> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> > >> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple
> > SDKs)
> > >> * extensions/java/scio  (Scio is an extension of Beam that uses the
> Java
> > >> SDK)
> > >> * extensions/scio  (Scio is an extension of Beam that is not limited
> to
> > >> one
> > >> SDK)
> > >>
> > >> I lean towards either dsls/java/scio or extensions/java/scio, since I
> > >> don't
> > >> think there are plans for Scio to handle multiple different SDKs (in
> > >> different languages). The question between these two is whether we
> think
> > >> DSLs are "big enough" to be a top level concept.
> > >>
> > >> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <
> jb@nanthrax.net
> > >
> > >> wrote:
> > >>
> > >> Good point about new Fn and the fact it's based on the Java SDK.
> > >>>
> > >>> It's just that in term of "marketing", it's a good message to
> provide a
> > >>> Scala SDK even if technically it's more a DSL.
> > >>>
> > >>> For instance, a valid "marketing" DSL would be a Java fluent DSL on
> top
> > >>> of
> > >>> the Java SDK, or a declarative XML DSL.
> > >>>
> > >>> However, from a technical perspective, it can go into dsl module.
> > >>>
> > >>> My $0.02 ;)
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>>
> > >>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> > >>>
> > >>> +Rafal & Andrew again
> > >>>>
> > >>>> I am leaning DSL for two reasons: (1) scio uses the existing java
> > >>>> execution
> > >>>> environment (and won't have a language-specific fn harness of its
> > own),
> > >>>> and
> > >>>> (2) it changes the abstractions that users interact with.
> > >>>>
> > >>>> I recently saw a scio repl demo from Reuven -- there's some really
> > cool
> > >>>> stuff in there. I'd love to dive into it a bit more and see what can
> > be
> > >>>> generalized beyond scio. The repl-like interactive graph
> construction
> > is
> > >>>> very similar to what we've seen with ipython, in that it doesn't
> > always
> > >>>> play nicely with the graph construction / graph execution
> > distinction. I
> > >>>> wonder what changes to Beam might more generally support this. The
> > >>>> materialize stuff looks similar to some functionality in FlumeJava
> we
> > >>>> used
> > >>>> to support multi-segment pipelines with some shared intermediate
> > >>>> PCollections.
> > >>>>
> > >>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <
> > jb@nanthrax.net>
> > >>>> wrote:
> > >>>>
> > >>>> Hi Neville,
> > >>>>
> > >>>>>
> > >>>>> thanks for the update !
> > >>>>>
> > >>>>> As it's another language support, and to clearly identify the
> > purpose,
> > >>>>> I
> > >>>>> would say sdks/scala.
> > >>>>>
> > >>>>> Regards
> > >>>>> JB
> > >>>>>
> > >>>>>
> > >>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
> > >>>>>
> > >>>>> +folks in my team
> > >>>>>
> > >>>>>>
> > >>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <neville.lyh@gmail.com
> >
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>>
> > >>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and
> am
> > >>>>>>> in
> > >>>>>>> the
> > >>>>>>> progress of moving code to Beam (BEAM-302
> > >>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just
> wondering
> > if
> > >>>>>>> sdks/scala is the right place for this code or if something like
> > >>>>>>> dsls/scio
> > >>>>>>> is a better choice? What do you think?
> > >>>>>>>
> > >>>>>>> A little background: Scio was built as a high-level Scala API for
> > >>>>>>> Google
> > >>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced
> by
> > >>>>>>> Spark
> > >>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while
> also
> > >>>>>>> providing features comparable to other Scala data frameworks. We
> > use
> > >>>>>>> Scio
> > >>>>>>> on Dataflow for production extensively inside Spotify.
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Neville
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>
> > >>>>> Jean-Baptiste Onofré
> > >>>>> jbonofre@apache.org
> > >>>>> http://blog.nanthrax.net
> > >>>>> Talend - http://www.talend.com
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>> --
> > >>> Jean-Baptiste Onofré
> > >>> jbonofre@apache.org
> > >>> http://blog.nanthrax.net
> > >>> Talend - http://www.talend.com
> > >>>
> > >>>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbonofre@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>

Re: Scala DSL

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
there might be other Scala-based DSLs.

On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <ie...@gmail.com> wrote:

> Hello everyone,
>
> Neville, thanks a lot for your contribution. Your work is amazing and I am
> really happy that this scala integration is finally happening.
> Congratulations to you and your team.
>
> I *strongly* disagree about the DSL classification for scio for one reason,
> if you go to the root of the term, Domain Specific Languages are about a
> domain, and the domain in this case is writing Beam pipelines, which is a
> really broad domain.
>
> I agree with Frances’ argument that scio is not an SDK e.g. it reuses the
> existing Beam java SDK. My proposition is that scio will be called the
> Scala API because in the end this is what it is. I think the confusion
> comes from the common definition of SDK which is normally an API + a
> Runtime. In this case scio will share the runtime with what we call the
> Beam Java SDK.
>
> One additional point of using the term API is that it sends the clear
> message that Beam has a Scala API too (which is good for visibility as JB
> mentioned).
>
> Regards,
> Ismaël
>
>
> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
> > Hi Dan,
> >
> > fair enough.
> >
> > As I'm also working on new DSLs (XML, JSON), I already created the dsls
> > module.
> >
> > So, I would say dsls/scala.
> >
> > WDYT ?
> >
> > Regards
> > JB
> >
> >
> > On 06/24/2016 05:07 PM, Dan Halperin wrote:
> >
> >> I don't think that sdks/scala is the right place -- scio is not a Beam
> >> Scala SDK; it wraps the existing Java SDK.
> >>
> >> Some options:
> >> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
> >> since Scio isn't an extension for the Java SDK, but rather a wrapper
> >>
> >> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> >> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple
> SDKs)
> >> * extensions/java/scio  (Scio is an extension of Beam that uses the Java
> >> SDK)
> >> * extensions/scio  (Scio is an extension of Beam that is not limited to
> >> one
> >> SDK)
> >>
> >> I lean towards either dsls/java/scio or extensions/java/scio, since I
> >> don't
> >> think there are plans for Scio to handle multiple different SDKs (in
> >> different languages). The question between these two is whether we think
> >> DSLs are "big enough" to be a top level concept.
> >>
> >> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb@nanthrax.net
> >
> >> wrote:
> >>
> >> Good point about new Fn and the fact it's based on the Java SDK.
> >>>
> >>> It's just that in term of "marketing", it's a good message to provide a
> >>> Scala SDK even if technically it's more a DSL.
> >>>
> >>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top
> >>> of
> >>> the Java SDK, or a declarative XML DSL.
> >>>
> >>> However, from a technical perspective, it can go into dsl module.
> >>>
> >>> My $0.02 ;)
> >>>
> >>> Regards
> >>> JB
> >>>
> >>>
> >>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> >>>
> >>> +Rafal & Andrew again
> >>>>
> >>>> I am leaning DSL for two reasons: (1) scio uses the existing java
> >>>> execution
> >>>> environment (and won't have a language-specific fn harness of its
> own),
> >>>> and
> >>>> (2) it changes the abstractions that users interact with.
> >>>>
> >>>> I recently saw a scio repl demo from Reuven -- there's some really
> cool
> >>>> stuff in there. I'd love to dive into it a bit more and see what can
> be
> >>>> generalized beyond scio. The repl-like interactive graph construction
> is
> >>>> very similar to what we've seen with ipython, in that it doesn't
> always
> >>>> play nicely with the graph construction / graph execution
> distinction. I
> >>>> wonder what changes to Beam might more generally support this. The
> >>>> materialize stuff looks similar to some functionality in FlumeJava we
> >>>> used
> >>>> to support multi-segment pipelines with some shared intermediate
> >>>> PCollections.
> >>>>
> >>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <
> jb@nanthrax.net>
> >>>> wrote:
> >>>>
> >>>> Hi Neville,
> >>>>
> >>>>>
> >>>>> thanks for the update !
> >>>>>
> >>>>> As it's another language support, and to clearly identify the
> purpose,
> >>>>> I
> >>>>> would say sdks/scala.
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>>
> >>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
> >>>>>
> >>>>> +folks in my team
> >>>>>
> >>>>>>
> >>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>>
> >>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am
> >>>>>>> in
> >>>>>>> the
> >>>>>>> progress of moving code to Beam (BEAM-302
> >>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering
> if
> >>>>>>> sdks/scala is the right place for this code or if something like
> >>>>>>> dsls/scio
> >>>>>>> is a better choice? What do you think?
> >>>>>>>
> >>>>>>> A little background: Scio was built as a high-level Scala API for
> >>>>>>> Google
> >>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by
> >>>>>>> Spark
> >>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
> >>>>>>> providing features comparable to other Scala data frameworks. We
> use
> >>>>>>> Scio
> >>>>>>> on Dataflow for production extensively inside Spotify.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Neville
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>
> >>>>> Jean-Baptiste Onofré
> >>>>> jbonofre@apache.org
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>>>
> >>>>>
> >>>>>
> >>>> --
> >>> Jean-Baptiste Onofré
> >>> jbonofre@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>

Re: Scala DSL

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello everyone,

Neville, thanks a lot for your contribution. Your work is amazing and I am
really happy that this Scala integration is finally happening.
Congratulations to you and your team.

I *strongly* disagree with classifying scio as a DSL, for one reason: if
you go to the root of the term, Domain Specific Languages are about a
specific domain, and the domain in this case is writing Beam pipelines,
which is a really broad domain.

I agree with Frances’ argument that scio is not an SDK, i.e. it reuses the
existing Beam Java SDK. My proposal is that scio be called the Scala API,
because in the end that is what it is. I think the confusion comes from the
common definition of an SDK, which is normally an API plus a runtime. In
this case scio shares its runtime with what we call the Beam Java SDK.

One additional benefit of using the term API is that it sends the clear
message that Beam has a Scala API too (which is good for visibility, as JB
mentioned).

Regards,
Ismaël


On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Dan,
>
> fair enough.
>
> As I'm also working on new DSLs (XML, JSON), I already created the dsls
> module.
>
> So, I would say dsls/scala.
>
> WDYT ?
>
> Regards
> JB
>
>
> On 06/24/2016 05:07 PM, Dan Halperin wrote:
>
>> I don't think that sdks/scala is the right place -- scio is not a Beam
>> Scala SDK; it wraps the existing Java SDK.
>>
>> Some options:
>> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
>> since Scio isn't an extension for the Java SDK, but rather a wrapper
>>
>> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
>> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple SDKs)
>> * extensions/java/scio  (Scio is an extension of Beam that uses the Java
>> SDK)
>> * extensions/scio  (Scio is an extension of Beam that is not limited to
>> one
>> SDK)
>>
>> I lean towards either dsls/java/scio or extensions/java/scio, since I
>> don't
>> think there are plans for Scio to handle multiple different SDKs (in
>> different languages). The question between these two is whether we think
>> DSLs are "big enough" to be a top level concept.
>>
>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>> Good point about new Fn and the fact it's based on the Java SDK.
>>>
>>> It's just that in term of "marketing", it's a good message to provide a
>>> Scala SDK even if technically it's more a DSL.
>>>
>>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top
>>> of
>>> the Java SDK, or a declarative XML DSL.
>>>
>>> However, from a technical perspective, it can go into dsl module.
>>>
>>> My $0.02 ;)
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
>>>
>>> +Rafal & Andrew again
>>>>
>>>> I am leaning DSL for two reasons: (1) scio uses the existing java
>>>> execution
>>>> environment (and won't have a language-specific fn harness of its own),
>>>> and
>>>> (2) it changes the abstractions that users interact with.
>>>>
>>>> I recently saw a scio repl demo from Reuven -- there's some really cool
>>>> stuff in there. I'd love to dive into it a bit more and see what can be
>>>> generalized beyond scio. The repl-like interactive graph construction is
>>>> very similar to what we've seen with ipython, in that it doesn't always
>>>> play nicely with the graph construction / graph execution distinction. I
>>>> wonder what changes to Beam might more generally support this. The
>>>> materialize stuff looks similar to some functionality in FlumeJava we
>>>> used
>>>> to support multi-segment pipelines with some shared intermediate
>>>> PCollections.
>>>>
>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>>> wrote:
>>>>
>>>> Hi Neville,
>>>>
>>>>>
>>>>> thanks for the update !
>>>>>
>>>>> As it's another language support, and to clearly identify the purpose,
>>>>> I
>>>>> would say sdks/scala.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>>>>
>>>>> +folks in my team
>>>>>
>>>>>>
>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am
>>>>>>> in
>>>>>>> the
>>>>>>> progress of moving code to Beam (BEAM-302
>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>>>>>> sdks/scala is the right place for this code or if something like
>>>>>>> dsls/scio
>>>>>>> is a better choice? What do you think?
>>>>>>>
>>>>>>> A little background: Scio was built as a high-level Scala API for
>>>>>>> Google
>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by
>>>>>>> Spark
>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>>>>>> providing features comparable to other Scala data frameworks. We use
>>>>>>> Scio
>>>>>>> on Dataflow for production extensively inside Spotify.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Neville
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>>
>>>>>
>>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Scala DSL

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Dan,

fair enough.

As I'm also working on new DSLs (XML, JSON), I already created the dsls 
module.

So, I would say dsls/scala.

WDYT ?

Regards
JB

On 06/24/2016 05:07 PM, Dan Halperin wrote:
> I don't think that sdks/scala is the right place -- scio is not a Beam
> Scala SDK; it wraps the existing Java SDK.
>
> Some options:
> * sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
> since Scio isn't an extension for the Java SDK, but rather a wrapper
>
> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> * dsls/scio  (Scio is a Beam DSL that could eventually use multiple SDKs)
> * extensions/java/scio  (Scio is an extension of Beam that uses the Java
> SDK)
> * extensions/scio  (Scio is an extension of Beam that is not limited to one
> SDK)
>
> I lean towards either dsls/java/scio or extensions/java/scio, since I don't
> think there are plans for Scio to handle multiple different SDKs (in
> different languages). The question between these two is whether we think
> DSLs are "big enough" to be a top level concept.
>
> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Good point about new Fn and the fact it's based on the Java SDK.
>>
>> It's just that in term of "marketing", it's a good message to provide a
>> Scala SDK even if technically it's more a DSL.
>>
>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
>> the Java SDK, or a declarative XML DSL.
>>
>> However, from a technical perspective, it can go into dsl module.
>>
>> My $0.02 ;)
>>
>> Regards
>> JB
>>
>>
>> On 06/24/2016 06:51 AM, Frances Perry wrote:
>>
>>> +Rafal & Andrew again
>>>
>>> I am leaning DSL for two reasons: (1) scio uses the existing java
>>> execution
>>> environment (and won't have a language-specific fn harness of its own),
>>> and
>>> (2) it changes the abstractions that users interact with.
>>>
>>> I recently saw a scio repl demo from Reuven -- there's some really cool
>>> stuff in there. I'd love to dive into it a bit more and see what can be
>>> generalized beyond scio. The repl-like interactive graph construction is
>>> very similar to what we've seen with ipython, in that it doesn't always
>>> play nicely with the graph construction / graph execution distinction. I
>>> wonder what changes to Beam might more generally support this. The
>>> materialize stuff looks similar to some functionality in FlumeJava we used
>>> to support multi-segment pipelines with some shared intermediate
>>> PCollections.
>>>
>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>> wrote:
>>>
>>> Hi Neville,
>>>>
>>>> thanks for the update !
>>>>
>>>> As it's another language support, and to clearly identify the purpose, I
>>>> would say sdks/scala.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>>>
>>>> +folks in my team
>>>>>
>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>>>
>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in
>>>>>> the
>>>>>> progress of moving code to Beam (BEAM-302
>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>>>>> sdks/scala is the right place for this code or if something like
>>>>>> dsls/scio
>>>>>> is a better choice? What do you think?
>>>>>>
>>>>>> A little background: Scio was built as a high-level Scala API for
>>>>>> Google
>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by
>>>>>> Spark
>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>>>>> providing features comparable to other Scala data frameworks. We use
>>>>>> Scio
>>>>>> on Dataflow for production extensively inside Spotify.
>>>>>>
>>>>>> Cheers,
>>>>>> Neville
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Scala DSL

Posted by Dan Halperin <dh...@google.com.INVALID>.
I don't think that sdks/scala is the right place -- scio is not a Beam
Scala SDK; it wraps the existing Java SDK.

Some options:
* sdks/java/extensions  (Scio builds on the Java SDK) -- mentally vetoed
since Scio isn't an extension for the Java SDK, but rather a wrapper

* dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
* dsls/scio  (Scio is a Beam DSL that could eventually use multiple SDKs)
* extensions/java/scio  (Scio is an extension of Beam that uses the Java
SDK)
* extensions/scio  (Scio is an extension of Beam that is not limited to one
SDK)

I lean towards either dsls/java/scio or extensions/java/scio, since I don't
think there are plans for Scio to handle multiple different SDKs (in
different languages). The question between these two is whether we think
DSLs are "big enough" to be a top level concept.

On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Good point about new Fn and the fact it's based on the Java SDK.
>
> It's just that in term of "marketing", it's a good message to provide a
> Scala SDK even if technically it's more a DSL.
>
> For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
> the Java SDK, or a declarative XML DSL.
>
> However, from a technical perspective, it can go into dsl module.
>
> My $0.02 ;)
>
> Regards
> JB
>
>
> On 06/24/2016 06:51 AM, Frances Perry wrote:
>
>> +Rafal & Andrew again
>>
>> I am leaning DSL for two reasons: (1) scio uses the existing java
>> execution
>> environment (and won't have a language-specific fn harness of its own),
>> and
>> (2) it changes the abstractions that users interact with.
>>
>> I recently saw a scio repl demo from Reuven -- there's some really cool
>> stuff in there. I'd love to dive into it a bit more and see what can be
>> generalized beyond scio. The repl-like interactive graph construction is
>> very similar to what we've seen with ipython, in that it doesn't always
>> play nicely with the graph construction / graph execution distinction. I
>> wonder what changes to Beam might more generally support this. The
>> materialize stuff looks similar to some functionality in FlumeJava we used
>> to support multi-segment pipelines with some shared intermediate
>> PCollections.
>>
>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>> Hi Neville,
>>>
>>> thanks for the update !
>>>
>>> As it's another language support, and to clearly identify the purpose, I
>>> would say sdks/scala.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>>
>>> +folks in my team
>>>>
>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>>>
>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in
>>>>> the
>>>>> progress of moving code to Beam (BEAM-302
>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>>>> sdks/scala is the right place for this code or if something like
>>>>> dsls/scio
>>>>> is a better choice? What do you think?
>>>>>
>>>>> A little background: Scio was built as a high-level Scala API for
>>>>> Google
>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by
>>>>> Spark
>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>>>> providing features comparable to other Scala data frameworks. We use
>>>>> Scio
>>>>> on Dataflow for production extensively inside Spotify.
>>>>>
>>>>> Cheers,
>>>>> Neville
>>>>>
>>>>>
>>>>>
>>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Scala DSL

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Good point about new Fn and the fact it's based on the Java SDK.

It's just that in terms of "marketing", it's a good message to provide a
Scala SDK even if technically it's more of a DSL.

For instance, a valid "marketing" DSL would be a Java fluent DSL on top 
of the Java SDK, or a declarative XML DSL.

However, from a technical perspective, it can go into the dsl module.

My $0.02 ;)

Regards
JB

On 06/24/2016 06:51 AM, Frances Perry wrote:
> +Rafal & Andrew again
>
> I am leaning DSL for two reasons: (1) scio uses the existing java execution
> environment (and won't have a language-specific fn harness of its own), and
> (2) it changes the abstractions that users interact with.
>
> I recently saw a scio repl demo from Reuven -- there's some really cool
> stuff in there. I'd love to dive into it a bit more and see what can be
> generalized beyond scio. The repl-like interactive graph construction is
> very similar to what we've seen with ipython, in that it doesn't always
> play nicely with the graph construction / graph execution distinction. I
> wonder what changes to Beam might more generally support this. The
> materialize stuff looks similar to some functionality in FlumeJava we used
> to support multi-segment pipelines with some shared intermediate
> PCollections.
>
> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Hi Neville,
>>
>> thanks for the update !
>>
>> As it's another language support, and to clearly identify the purpose, I
>> would say sdks/scala.
>>
>> Regards
>> JB
>>
>>
>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>
>>> +folks in my team
>>>
>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com> wrote:
>>>
>>> Hi all,
>>>>
>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in
>>>> the
>>>> progress of moving code to Beam (BEAM-302
>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>>> sdks/scala is the right place for this code or if something like
>>>> dsls/scio
>>>> is a better choice? What do you think?
>>>>
>>>> A little background: Scio was built as a high-level Scala API for Google
>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>>> providing features comparable to other Scala data frameworks. We use Scio
>>>> on Dataflow for production extensively inside Spotify.
>>>>
>>>> Cheers,
>>>> Neville
>>>>
>>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Scala DSL

Posted by Frances Perry <fj...@google.com.INVALID>.
+Rafal & Andrew again

I am leaning DSL for two reasons: (1) scio uses the existing java execution
environment (and won't have a language-specific fn harness of its own), and
(2) it changes the abstractions that users interact with.

I recently saw a scio repl demo from Reuven -- there's some really cool
stuff in there. I'd love to dive into it a bit more and see what can be
generalized beyond scio. The repl-like interactive graph construction is
very similar to what we've seen with ipython, in that it doesn't always
play nicely with the graph construction / graph execution distinction. I
wonder what changes to Beam might more generally support this. The
materialize stuff looks similar to some functionality in FlumeJava we used
to support multi-segment pipelines with some shared intermediate
PCollections.

On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Neville,
>
> thanks for the update !
>
> As it's another language support, and to clearly identify the purpose, I
> would say sdks/scala.
>
> Regards
> JB
>
>
> On 06/23/2016 11:56 PM, Neville Li wrote:
>
>> +folks in my team
>>
>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com> wrote:
>>
>> Hi all,
>>>
>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in
>>> the
>>> progress of moving code to Beam (BEAM-302
>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>> sdks/scala is the right place for this code or if something like
>>> dsls/scio
>>> is a better choice? What do you think?
>>>
>>> A little background: Scio was built as a high-level Scala API for Google
>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>> providing features comparable to other Scala data frameworks. We use Scio
>>> on Dataflow for production extensively inside Spotify.
>>>
>>> Cheers,
>>> Neville
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Scala DSL

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Neville,

thanks for the update !

As it adds support for another language, and to clearly identify the
purpose, I would say sdks/scala.

Regards
JB

On 06/23/2016 11:56 PM, Neville Li wrote:
> +folks in my team
>
> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in the
>> progress of moving code to Beam (BEAM-302
>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>> sdks/scala is the right place for this code or if something like dsls/scio
>> is a better choice? What do you think?
>>
>> A little background: Scio was built as a high-level Scala API for Google
>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>> providing features comparable to other Scala data frameworks. We use Scio
>> on Dataflow for production extensively inside Spotify.
>>
>> Cheers,
>> Neville
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Scala DSL

Posted by Neville Li <ne...@gmail.com>.
+folks in my team

On Thu, Jun 23, 2016 at 5:57 PM Neville Li <ne...@gmail.com> wrote:

> Hi all,
>
> I'm the co-author of Scio <https://github.com/spotify/scio> and am in the
> progress of moving code to Beam (BEAM-302
> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
> sdks/scala is the right place for this code or if something like dsls/scio
> is a better choice? What do you think?
>
> A little background: Scio was built as a high-level Scala API for Google
> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
> providing features comparable to other Scala data frameworks. We use Scio
> on Dataflow for production extensively inside Spotify.
>
> Cheers,
> Neville
>