Posted to dev@beam.apache.org by Neville Li <ne...@gmail.com> on 2016/07/01 18:43:02 UTC

Re: Scala DSL

Looks like dsls/scio is the winner :)

I like it too, plus we get to keep the Scio name. This also leaves room for
other Scala wrappers of different flavors.
Scio is a DSL in the domain of functional-style data pipelines.

On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Just to summarize, at this point:
>
> - Everybody agrees about the fact that scio is not an SDK.
> - Almost everybody agrees that given the current choice they would prefer
> ‘dsls/scio’
> - Some of us are not particularly married to the DSL classification.
>
> I have a proposal: we can define two concepts, each with its own
> structure in the Beam repository:
>
> 1. Beam API: A set of abstractions to program the complete Beam Model in a
> given programming language.
>
> These are idiomatic versions of the Beam Model, and ideally should cover
> the complete Beam Model e.g. scio is one example. The directory structure
> for Beam APIs could be:
>
> apis/scala
> apis/clojure
> apis/groovy
> ...
>
> 2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
> graphs, machine learning, etc
>
> These represent domain-specific idioms, e.g. a graph DSL would represent
> graph concepts (edges, vertices, etc.) as first-class citizens. The directory
> structure for Beam DSLs could be:
>
> dsls/graph
> dsls/ml
> dsls/cep
> ...
>
> Given these definitions for the concrete scio case I think the most
> accurate directory would be:
>
> apis/scala
> or
> apis/scala/scio
>
> I personally prefer the first one (apis/scala) because we don’t have any
> other scala API for the moment and because I think that we shouldn’t have
> more than one API per language to avoid confusion e.g. imagine that someone
> creates apis/java/bcollections to represent Beam Pipelines as distributed
> collections, that would be confusing. However I understand the arguments
> for the second directory e.g. to support different APIs per language, and
> to preserve their original names (scio). Anyway, I would be ok with either of
> the two.
>
> I apologize for this long message, and for not choosing either of the two
> structures proposed in this thread, but I think it is important to be clear
> about the differences in scope of both Beam APIs and DSLs in particular if
> we think about new users.
>
> What do you think? Does my proposal make sense? Any suggestions?
>
> Regards,
> Ismaël
>
> ps. One last thing, I found this text that in part corroborates my feeling
> about scio being an API and not a DSL:
>
> “… a Scala Dataflow API (a nascent open-source version of which already
> exists, and which seems likely to flower into maturity in due time given
> Dataflow's move to join the ASF).”
> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>
>
> On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <ra...@google.com.invalid>
> wrote:
>
> > On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dhalperi@google.com.invalid>
> > wrote:
> >
> > > > > I love the name scio. But I think sdks/scala might be most appropriate
> > > > > and would make it a first class citizen for Beam.
> > > >
> > >
> > > > I am strongly against it being in the 'sdks/' top-level module -- it's not
> > > > a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> > >
> >
> > +1. I agree, it is not Beam SDK in that sense.
> >
> > Raghu.
> >
> >
> > >
> > > > Where would a future python sdk reside?
> > > >
> > >
> > > > The Python SDK is in the python-sdk branch on Apache already, and it lives
> > > > in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
> >
>

Re: Scala DSL

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1 for dsls/scio.

Let me know how I can help there!

Thanks
Regards
JB

On 07/01/2016 08:43 PM, Neville Li wrote:
> Looks like dsls/scio is the winner :)
>
> I like it too, plus we get to keep the Scio name. This also leaves room for
> other Scala wrappers of different flavors.
> Scio is a DSL in the domain of functional-style data pipelines.
>
> On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Just to summarize, at this point:
>>
>> - Everybody agrees about the fact that scio is not an SDK.
>> - Almost everybody agrees that given the current choice they would prefer
>> ‘dsls/scio’
>> - Some of us are not particularly married to the DSL classification.
>>
>> I have a proposal: we can define two concepts, each with its own
>> structure in the Beam repository:
>>
>> 1. Beam API: A set of abstractions to program the complete Beam Model in a
>> given programming language.
>>
>> These are idiomatic versions of the Beam Model, and ideally should cover
>> the complete Beam Model e.g. scio is one example. The directory structure
>> for Beam APIs could be:
>>
>> apis/scala
>> apis/clojure
>> apis/groovy
>> ...
>>
>> 2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
>> graphs, machine learning, etc
>>
>> These represent domain-specific idioms, e.g. a graph DSL would represent
>> graph concepts (edges, vertices, etc.) as first-class citizens. The directory
>> structure for Beam DSLs could be:
>>
>> dsls/graph
>> dsls/ml
>> dsls/cep
>> ...
>>
>> Given these definitions for the concrete scio case I think the most
>> accurate directory would be:
>>
>> apis/scala
>> or
>> apis/scala/scio
>>
>> I personally prefer the first one (apis/scala) because we don’t have any
>> other scala API for the moment and because I think that we shouldn’t have
>> more than one API per language to avoid confusion e.g. imagine that someone
>> creates apis/java/bcollections to represent Beam Pipelines as distributed
>> collections, that would be confusing. However I understand the arguments
>> for the second directory e.g. to support different APIs per language, and
>> to preserve their original names (scio). Anyway, I would be ok with either of
>> the two.
>>
>> I apologize for this long message, and for not choosing either of the two
>> structures proposed in this thread, but I think it is important to be clear
>> about the differences in scope of both Beam APIs and DSLs in particular if
>> we think about new users.
>>
>> What do you think? Does my proposal make sense? Any suggestions?
>>
>> Regards,
>> Ismaël
>>
>> ps. One last thing, I found this text that in part corroborates my feeling
>> about scio being an API and not a DSL:
>>
>> “… a Scala Dataflow API (a nascent open-source version of which already
>> exists, and which seems likely to flower into maturity in due time given
>> Dataflow's move to join the ASF).”
>> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>>
>>
>> On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <ra...@google.com.invalid>
>> wrote:
>>
>>> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dhalperi@google.com.invalid>
>>> wrote:
>>>
>>>>> I love the name scio. But I think sdks/scala might be most appropriate
>>>>> and would make it a first class citizen for Beam.
>>>>>
>>>>
>>>> I am strongly against it being in the 'sdks/' top-level module -- it's not
>>>> a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>>>>
>>>
>>> +1. I agree, it is not Beam SDK in that sense.
>>>
>>> Raghu.
>>>
>>>
>>>>
>>>>> Where would a future python sdk reside?
>>>>>
>>>>
>>>> The Python SDK is in the python-sdk branch on Apache already, and it lives
>>>> in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com