Posted to dev@beam.apache.org by Jean-Baptiste Onofré <jb...@nanthrax.net> on 2016/03/24 14:01:41 UTC

[PROPOSAL] New sdk languages

Hi beamers,

right now, Beam provides a Java SDK.

AFAIK, very soon, you should have the Python SDK ;)

Spotify created a Scala API on top of the Google Dataflow SDK:

https://github.com/spotify/scio

What do you think of asking if they want to donate this as a Beam Scala SDK?
I planned to work on a Scala SDK, but as there's already something, it makes 
sense to leverage it.

Thoughts?

Regards
JB
-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] New sdk languages

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Neville,

agree with you: the purpose is not to recreate a Scala SDK from scratch. 
I think implementing some primitives in Scala and leveraging others from 
Java makes sense.
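
As a minimal sketch of that hybrid idea (all names here are invented stand-ins, not the real Beam or Scio classes): the Scala layer can expose idiomatic primitives whose bodies are nothing more than thin delegations to a Java-style transform interface, so the execution machinery stays in the Java SDK.

```scala
// Stand-in for a Java-SDK transform: a function from an input collection
// to an output collection. Hypothetical; the real Beam PTransform differs.
trait JTransform[A, B] { def expand(input: List[A]): List[B] }

// Scala-side wrapper: idiomatic combinators implemented by delegating to
// Java-style transforms instead of reimplementing them from scratch.
final case class SCollection[A](elems: List[A]) {
  def applyTransform[B](t: JTransform[A, B]): SCollection[B] =
    SCollection(t.expand(elems))

  // Each Scala primitive is just a thin delegation.
  def map[B](f: A => B): SCollection[B] =
    applyTransform(new JTransform[A, B] {
      def expand(input: List[A]): List[B] = input.map(f)
    })

  def filter(p: A => Boolean): SCollection[A] =
    applyTransform(new JTransform[A, A] {
      def expand(input: List[A]): List[A] = input.filter(p)
    })
}
```

With this shape, `SCollection(List(1, 2, 3)).map(_ * 2).filter(_ > 2)` reads like Scalding/Spark code while every step runs through the Java-style `expand`.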

Regards
JB

On 04/04/2016 09:51 PM, Neville Li wrote:
> There's really no point in creating a Scala SDK from scratch to duplicate
> all the apply/transform API, coders, etc., since one can call Java libraries
> directly and seamlessly from Scala, and any competent Scala dev can write
> Java code in Scala, which is what we're doing with the wrappers.
>
> We created Scio to make life easier for Scala devs and those we're working
> with would agree.
>
> On Fri, Apr 1, 2016 at 5:26 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Excellent questions,
>>
>>> - Scio is more of a thin Scala wrapper/DSL on top of the Java SDK to offer
>>> an FP-style API similar to Spark/Scalding, plus a few other features often
>>> found in other Scala libraries, e.g. a REPL, macro type providers, and
>>> Futures. How do those fit in the big picture? Is there a place for such
>>> high-level DSLs?
>>
>> I don't know what the others think, but I think a good approach would be to
>> have the core functionality (the part that is semantically equal to the Java
>> one and covers the core Dataflow model) as the 'official' Beam bindings for
>> Scala, and all the extra niceties as independent packages (beam-scala-repl,
>> beam-scala-extra, etc.).
>>
>>> - What's the vision for feature parity between SDKs? Should they all expose
>>> the same apply/transform-style API, or have the freedom to provide an API
>>> idiomatic for the language?
>>
>> I had the same doubts about idiomatic bindings for the SDKs. A new
>> programmer who checks the Beam API in Java (based on apply/transform) vs the
>> Scio Scala API (based on distributed collections) will be a bit surprised
>> (as I was, coming from the Spark world), because the styles are quite
>> different even if the model/semantics are similar.
>>
>> I think dealing with different styles can make the project hard to approach;
>> on the other hand, I see the value of idiomatic versions. What do the others
>> think? What would be a good compromise here?
>>
>> Just as an extra point, maybe a good way to document the differences would
>> be to provide docs like they do in the ReactiveX world, with base concepts
>> of the model and side notes for language-specific syntax.
>>
>> http://reactivex.io/documentation/operators/flatmap.html
>>
>> Cheers,
>> Ismaël
>>
>>
>>
>> On Fri, Apr 1, 2016 at 4:49 AM, Neville Li <ne...@gmail.com> wrote:
>>
>>> I read some technical docs and have a few more questions.
>>>
>>> - Scio is more of a thin Scala wrapper/DSL on top of the Java SDK to offer
>>> an FP-style API similar to Spark/Scalding, plus a few other features often
>>> found in other Scala libraries, e.g. a REPL, macro type providers, and
>>> Futures. How do those fit in the big picture? Is there a place for such
>>> high-level DSLs?
>>> - Therefore it's not really a native SDK equivalent to the Java or Python
>>> SDK; does it fit in the /sdks/scala repo structure?
>>> - What's the vision for feature parity between SDKs? Should they all expose
>>> the same apply/transform-style API, or have the freedom to provide an API
>>> idiomatic for the language?
>>>
>>> Asking these because we want to leverage both the Java SDK and the Scala
>>> ecosystem, and it'll be nice to have a vision for these things.
>>>
>>> Cheers,
>>> Neville
>>>
>>> On Sat, Mar 26, 2016 at 5:39 PM Pierre Mage <pi...@gmail.com>
>>> wrote:
>>>
>>>> Hi Neville,
>>>>
>>>> I don't know how up to date this roadmap is, but from "Apache Beam:
>>>> Technical Vision":
>>>>
>>>> https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287
>>>>
>>>> And for more details:
>>>>
>>>> https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1
>>>>
>>>> On 26 March 2016 at 06:53, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Neville,
>>>>>
>>>>> that's great news, and the timeline is perfect!
>>>>>
>>>>> We are working on some refactoring & polishing on our side (Runner API,
>>>>> etc.). So, one or two months is not a big deal!
>>>>>
>>>>> Let me know if I can help in any way.
>>>>>
>>>>> Thanks,
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 03/25/2016 08:03 PM, Neville Li wrote:
>>>>>
>>>>>> Thanks guys. Yes, we'd love to donate the project but would also like to
>>>>>> polish the API a bit first, like in the next month or two. What's the
>>>>>> timeline like for BEAM and related projects?
>>>>>>
>>>>>> Will also read the technical docs and follow up later.
>>>>>>
>>>>>> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Neville,
>>>>>>>
>>>>>>> First, congratulations guys, excellent job / API; the Scalding touches
>>>>>>> are pretty neat (as well as the Tap abstraction). I am also new to
>>>>>>> Beam, so believe me, you guys already know more than me.
>>>>>>>
>>>>>>> In my comment I mentioned sessions referring to session windows, but it
>>>>>>> was my mistake, since I just took a fast look at your code and
>>>>>>> initially didn't see them. Anyway, if you are interested in the model,
>>>>>>> there is a good description of the current capabilities of the runners
>>>>>>> on the website:
>>>>>>>
>>>>>>> https://beam.incubator.apache.org/capability-matrix/
>>>>>>>
>>>>>>> And the new additions to the model are openly discussed on the mailing
>>>>>>> list and in the technical docs (e.g. lateness):
>>>>>>>
>>>>>>> https://goo.gl/ps8twC
>>>>>>>
>>>>>>> -Ismaël
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <neville.lyh@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks guys for the interest. I'm really excited about all the
>>>>>>>> feedback from the community.
>>>>>>>>
>>>>>>>> A little background: we developed Scio to bring Google Cloud Dataflow
>>>>>>>> closer to the Scalding/Spark ecosystem that our developers are
>>>>>>>> familiar with, while bringing some missing pieces to the table
>>>>>>>> (type-safe BigQuery, HDFS, and a REPL, to name a few).
>>>>>>>>
>>>>>>>> I have to admit that I'm pretty new to BEAM development, but I would
>>>>>>>> love to get feedback and advice on how to bring Scio closer to the
>>>>>>>> BEAM feature set and semantics. Scio doesn't have to live in the BEAM
>>>>>>>> code base just yet (we're still under heavy development), but I'd like
>>>>>>>> to see it as a de facto Scala API endorsed by the BEAM community.
>>>>>>>>
>>>>>>>> @Ismaël: I'm curious, what's this session thing you're referring to?
>>>>>>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fjp@google.com.invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +Neville and Rafal for their take ;-)
>>>>>>>>>
>>>>>>>>> Excited to see this out. Multiple community-driven SDKs are right in
>>>>>>>>> line with our goals for Beam.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <iemejia@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>> Addendum: actually the semantic model support is not so far away as
>>>>>>>>>> I said before (I haven't finished reading and I thought they didn't
>>>>>>>>>> support sessions), and looking at the git history the project is not
>>>>>>>>>> so young either, and it is quite active.
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <iemejia@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I just checked the code a bit, and what they have done is
>>>>>>>>>>> interesting; the SCollection wrapper is worth a look, as well as
>>>>>>>>>>> the examples, to get an idea of their intentions. The fact that the
>>>>>>>>>>> code looks so Spark-ish (distributed-collections-like) is something
>>>>>>>>>>> that is quite interesting too:
>>>>>>>>>>>
>>>>>>>>>>>       val (sc, args) = ContextAndArgs(cmdlineArgs)
>>>>>>>>>>>       sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
>>>>>>>>>>>         .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>>>>>>>>>>>         .countByValue()
>>>>>>>>>>>         .map(t => t._1 + ": " + t._2)
>>>>>>>>>>>         .saveAsTextFile(args("output"))
>>>>>>>>>>>       sc.close()
>>>>>>>>>>>
>>>>>>>>>>> They have a REPL, and since the project is a bit young they don't
>>>>>>>>>>> support all the advanced semantics of Beam. They also have a Hadoop
>>>>>>>>>>> File Sink/Source. I think it would be nice to work with them, but
>>>>>>>>>>> if it is not possible, at least I think it is worth coordinating
>>>>>>>>>>> some sharing, e.g. in the Sink/Source area + other extensions.
>>>>>>>>>>>
>>>>>>>>>>> Additionally, their code is also under the Apache license.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Raghu,
>>>>>>>>>>>>
>>>>>>>>>>>> I agree: we should provide SDKs in different languages, and DSLs
>>>>>>>>>>>> for specific use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> You got why I sent my proposal ;)
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I would love to see a Scala API properly supported. I didn't
>>>>>>>>>>>>> know about scio. Scala is such a natural fit for the Dataflow
>>>>>>>>>>>>> API.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure of the policy w.r.t. where such packages would live
>>>>>>>>>>>>> in the Beam repo, but I personally would write my Dataflow
>>>>>>>>>>>>> applications in Scala. It is probably already the case, but my
>>>>>>>>>>>>> request would be: it should be as thin as reasonably possible
>>>>>>>>>>>>> (that might make it a bit less like the Scalding/Spark API in
>>>>>>>>>>>>> some cases, which I think is a good compromise).
>>>>>>>>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] New sdk languages

Posted by Frances Perry <fj...@google.com.INVALID>.
It sounded like we were going to wait a little bit for things (both Beam
and Scio) to stabilize. That sounds right to me, particularly given the way
we are rearchitecting things right now to make SDK extensibility much
easier and clarify the difference between new "core" SDKs (which require
runtime support) and new "wrapper" SDKs (which use an existing runtime).

But I'm sure there are others (like me!) who would be happy to be part of
the conversation if we're ready to move forward. Thanks!
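
The core-vs-wrapper distinction Frances describes can be sketched roughly like this (all names invented for illustration, not real Beam classes): a wrapper SDK never defines its own execution semantics; every call only builds up the underlying core-SDK pipeline, which is the one representation runners understand, so no new runtime support is needed.

```scala
// Stand-in for the core (Java) SDK's pipeline: the representation that
// runners execute. Hypothetical; the real Beam Runner API differs.
final case class CorePipeline(steps: List[String]) {
  def addStep(name: String): CorePipeline = CorePipeline(steps :+ name)
}

// "Wrapper SDK": an idiomatic surface where every method just delegates,
// accumulating steps in the underlying core pipeline.
final class ScalaPipeline(val underlying: CorePipeline) {
  def read(source: String): ScalaPipeline =
    new ScalaPipeline(underlying.addStep(s"Read($source)"))
  def map(desc: String): ScalaPipeline =
    new ScalaPipeline(underlying.addStep(s"Map($desc)"))
  def write(sink: String): ScalaPipeline =
    new ScalaPipeline(underlying.addStep(s"Write($sink)"))
}

// The user writes against the wrapper...
val p = new ScalaPipeline(CorePipeline(Nil))
  .read("in.txt")
  .map("toUpper")
  .write("out.txt")
// ...but a runner only ever sees p.underlying, the core representation.
```

A "core" SDK, by contrast, would produce a new pipeline representation of its own, which is exactly what would require runtime support from every runner.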

On Fri, Apr 8, 2016 at 9:04 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Neville,
>
> can we have a chat to move forward on this?
>
> Let me send you an e-mail to plan this.
>
> Thanks,
> Regards
> JB
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: [PROPOSAL] New sdk languages

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Neville,

can we have a chat to move forward on this?

Let me send you an e-mail to plan this.

Thanks,
Regards
JB

>>>>>>>> HDFS, REPL to name a few).
>>>>>>>>
>>>>>>>> I have to admit that I'm pretty new to the BEAM development but
>>> would
>>>>>>>>
>>>>>>> love
>>>>>>>
>>>>>>>> to get feedbacks and advices on how to bring Scio closer to BEAM
>>>> feature
>>>>>>>> set and semantics. Scio doesn't have to live with the BEAM code
>> base
>>>>>>>> just
>>>>>>>> yet (we're still under heavy development) but I'd like to see it
>> as
>>> a
>>>> de
>>>>>>>> facto Scala API endorsed by the BEAM community.
>>>>>>>>
>>>>>>>> @Ismaël: I'm curious what's this session thing you're referring
>> to?
>>>>>>>>
>>>>>>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry
>>> <fjp@google.com.invalid
>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +Neville and Rafal for their take ;-)
>>>>>>>>>
>>>>>>>>> Excited to see this out. Multiple community driven SDKs are right
>>> in
>>>>>>>>>
>>>>>>>> line
>>>>>>>
>>>>>>>> with our goals for Beam.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <iemejia@gmail.com
>>>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>> Addendum: actually the semantic model support is not so far away
>>> as I
>>>>>>>>>>
>>>>>>>>> said
>>>>>>>>>
>>>>>>>>>> before (I haven't finished reading and I thought they didn't
>>> support
>>>>>>>>>> sessions), and looking at the git history the project is not so
>>>> young
>>>>>>>>>> either and it is quite active.
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <
>> iemejia@gmail.com
>>>>
>>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I just checked a bit the code and what they have done is
>>>>>>>>>>>
>>>>>>>>>> interesting,
>>>>>>>
>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> SCollection wrapper is worth a look, as well as the examples to
>>> get
>>>>>>>>>>>
>>>>>>>>>> an
>>>>>>>>
>>>>>>>>> idea
>>>>>>>>>>
>>>>>>>>>>> of their intentions, the fact that the code looks so spark-lish
>>>>>>>>>>> (distributed collections like) is something that is quite
>>>>>>>>>>>
>>>>>>>>>> interesting
>>>>>>>
>>>>>>>> too:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>       val (sc, args) = ContextAndArgs(cmdlineArgs)
>>>>>>>>>>>       sc.textFile(args.getOrElse("input",
>> ExampleData.KING_LEAR))
>>>>>>>>>>>         .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>>>>>>>>>>>         .countByValue()
>>>>>>>>>>>         .map(t => t._1 + ": " + t._2)
>>>>>>>>>>>         .saveAsTextFile(args("output"))
>>>>>>>>>>>       sc.close()
>>>>>>>>>>>
>>>>>>>>>>> They have a repl, and since the project is a bit young they
>> don't
>>>>>>>>>>>
>>>>>>>>>> support
>>>>>>>>>
>>>>>>>>>>> all the advanced semantics of Beam. They also have a Hadoop File
>>>>>>>>>>> Sink/Source. I think it would be nice to work with them, but if
>>> it
>>>>>>>>>>>
>>>>>>>>>> is
>>>>>>>
>>>>>>>> not
>>>>>>>>>
>>>>>>>>>> possible, at least I think it is worth to coordinate some
>> sharing
>>>>>>>>>>>
>>>>>>>>>> e.g.
>>>>>>>>
>>>>>>>>> in
>>>>>>>>>
>>>>>>>>>> the Sink/Source area + other extensions.
>>>>>>>>>>>
>>>>>>>>>>> Additionally their code is also under the Apache license.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
>>>>>>>>>>>
>>>>>>>>>> jb@nanthrax.net
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Raghu,
>>>>>>>>>>>>
>>>>>>>>>>>> I agree: we should provide SDK in different languages, and
>> DSLs
>>>>>>>>>>>>
>>>>>>>>>>> for
>>>>>>>
>>>>>>>> specific use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> You got why I sent my proposal  ;)
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I would love to see Scala API properly supported. I didn't
>> know
>>>>>>>>>>>>>
>>>>>>>>>>>> about
>>>>>>>>
>>>>>>>>> scio.
>>>>>>>>>>>>> Scala is such a natural fit for Dataflow API.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure of the policy w.r.t where such packages would
>>> live
>>>>>>>>>>>>>
>>>>>>>>>>>> in
>>>>>>>
>>>>>>>> Beam
>>>>>>>>>>
>>>>>>>>>>> repo, but I personally would write my Dataflow applications in
>>>>>>>>>>>>>
>>>>>>>>>>>> Scala.
>>>>>>>>
>>>>>>>>> It
>>>>>>>>>>
>>>>>>>>>>> is
>>>>>>>>>>>>> probably already the case but my request would be : it should
>>> be
>>>>>>>>>>>>>
>>>>>>>>>>>> as
>>>>>>>
>>>>>>>> thin
>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>>>>>> reasonably possible (that might make it a bit less like
>>>>>>>>>>>>>
>>>>>>>>>>>> scalding/spark
>>>>>>>>>
>>>>>>>>>> API
>>>>>>>>>>>>> in some cases, which I think is a good compromise).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
>>>>>>>>>>>>>
>>>>>>>>>>>> jb@nanthrax.net
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi beamers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> right now, Beam provides Java SDK.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/spotify/scio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think of asking if they want to donate this as
>>> Beam
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Scala
>>>>>>>>>
>>>>>>>>>> SDK ?
>>>>>>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
>>>>>>>>>>>>>>
>>>>>>>>>>>>> already
>>>>>>>
>>>>>>>> something, it makes sense to leverage it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thoughts ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>>>>>> jbonofre@apache.org
>>>>>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>>>> jbonofre@apache.org
>>>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] New sdk languages

Posted by Neville Li <ne...@gmail.com>.
There's really no point in creating a Scala SDK from scratch to duplicate
all the apply/transform API, coders, etc., since one can call Java libraries
directly and seamlessly in Scala, and any competent Scala dev can write Java
code in Scala, like what we're doing with the wrappers.

We created Scio to make life easier for Scala devs and those we're working
with would agree.
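To make the "thin wrapper" point concrete, here is a minimal, self-contained
sketch (no Beam dependency; JavaStyleCollection and ScalaWrapper are made-up
names for illustration) of how a few lines of Scala can layer idiomatic
map/filter on top of an apply-style Java API:

```scala
// A stand-in for a Java-style collection with an apply-based API,
// loosely in the spirit of PCollection.apply(PTransform).
final class JavaStyleCollection[T](val elems: List[T]) {
  def apply[U](transform: T => List[U]): JavaStyleCollection[U] =
    new JavaStyleCollection(elems.flatMap(transform))
}

// A thin Scala wrapper that adds idiomatic map/filter on top of apply,
// roughly the kind of sugar a Scala wrapper library provides.
final class ScalaWrapper[T](val underlying: JavaStyleCollection[T]) {
  def map[U](f: T => U): ScalaWrapper[U] =
    new ScalaWrapper(underlying.apply(t => List(f(t))))
  def filter(p: T => Boolean): ScalaWrapper[T] =
    new ScalaWrapper(underlying.apply(t => if (p(t)) List(t) else Nil))
}

val wrapped = new ScalaWrapper(new JavaStyleCollection(List(1, 2, 3, 4)))
val result  = wrapped.filter(_ % 2 == 0).map(_ * 10).underlying.elems
println(result) // List(20, 40)
```

The wrapper delegates everything to the underlying apply, which is why it
stays thin: no coders or transforms are duplicated, only the surface syntax.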

On Fri, Apr 1, 2016 at 5:26 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Excellent questions,
>
> > - Scio is more of a thin Scala wrapper/DSL on top of Java SDK to offer FP
> > style API similar to Spark/Scalding and a few other features often found
> in
> > other Scala libraries, e.g. REPL, macro type providers, Futures. How do
> > those fit in the big picture? Is there a place for such high level DSLs?
>
> I don't know what the others think, but I think a good approach would be to
> have
> the core functionality (the one that is semantically equal to the java one
> and
> covers the core dataflow model) as the 'official' beam bindings for scala
> and
> all the extra niceties as independent packages (beam-scala-repl,
> beam-scala-extra, etc).
>
> > - What's the vision for feature parity between SDKs? Should they all
> expose
> > the same apply/transform style API or have freedom to provide API
> idiomatic
> > for the language?
>
> I had the same doubts about idiomatic bindings for the SDKs, a new
> programmer
> who checks the Beam API in java (based on apply/transform) vs the scio
> scala API
> (based on distributed collections) will be a bit surprised (as I was, coming
> from
> the spark world) because the styles are quite different even if the
> model/semantics are similar.
>
> I think dealing with different styles can make the project hard to
> approach; on
> the other hand I see the value of idiomatic versions. What do the others
> think?
> What would be a good compromise here?
>
> Just as an extra point, maybe a good way to document the differences would
> be to
> provide docs like they do in the ReactiveX world with base concepts of the
> model
> and side notes for language specific syntax.
>
> http://reactivex.io/documentation/operators/flatmap.html
>
> Cheers,
> Ismaël
>
>
>
> On Fri, Apr 1, 2016 at 4:49 AM, Neville Li <ne...@gmail.com> wrote:
>
> > I read some technical docs and have a few more questions.
> >
> > - Scio is more of a thin Scala wrapper/DSL on top of Java SDK to offer FP
> > style API similar to Spark/Scalding and a few other features often found
> in
> > other Scala libraries, e.g. REPL, macro type providers, Futures. How do
> > those fit in the big picture? Is there a place for such high level DSLs?
> > - Therefore it's not really a native SDK equivalent to the Java or Python
> > SDK; does it fit in the /sdks/scala repo structure?
> > - What's the vision for feature parity between SDKs? Should they all
> expose
> > the same apply/transform style API or have freedom to provide API
> idiomatic
> > for the language?
> >
> > Asking these because we want to leverage both the Java SDK and the Scala
> > ecosystem and it'll be nice to have a vision for these things.
> >
> > Cheers,
> > Neville
> >
> > On Sat, Mar 26, 2016 at 5:39 PM Pierre Mage <pi...@gmail.com>
> wrote:
> >
> > > Hi Neville,
> > >
> > > I don't know how up to date this roadmap is but from "Apache Beam:
> > > Technical Vision":
> > >
> > >
> >
> https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287
> > >
> > > And for more details:
> > >
> > >
> >
> https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1
> > >
> > > On 26 March 2016 at 06:53, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
> > >
> > > > Hi Neville,
> > > >
> > > > that's great news, and the timeline is perfect !
> > > >
> > > > We are working on some refactoring & polishing on our side (Runner
> API,
> > > > etc). So, one or two months is not a big deal !
> > > >
> > > > Let me know if I can help in any way.
> > > >
> > > > Thanks,
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 03/25/2016 08:03 PM, Neville Li wrote:
> > > >
> > > >> Thanks guys. Yes we'd love to donate the project but would also like
> > to
> > > >> polish the API a bit first, like in the next month or two. What's
> the
> > > >> timeline like for BEAM and related projects?
> > > >>
> > > >> Will also read the technical docs and follow up later.
> > > >>
> > > >> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com>
> > wrote:
> > > >>
> > > >> Hello Neville,
> > > >>>
> > > >>> First congratulations guys, excellent job / API, the scalding
> touches
> > > are
> > > >>> pretty neat (as well as the Tap abstraction). I am also new to
> Beam,
> > so
> > > >>> believe me, you guys already know more than me.
> > > >>>
> > > >>> In my comment I mentioned sessions referring to session windows,
> but
> > it
> > > >>> was
> > > >>> my mistake since I just took a fast look at your code and initially
> > > >>> didn't
> > > >>> see them. Anyway if you are interested in the model there is a good
> > > >>> description of the current capabilities of the runners in the
> > website,
> > > >>>
> > > >>> https://beam.incubator.apache.org/capability-matrix/
> > > >>>
> > > >>> And the new additions to the model are openly discussed in the
> > mailing
> > > >>> list
> > > >>> and in the technical docs (e.g. lateness):
> > > >>>
> > > >>> https://goo.gl/ps8twC
> > > >>>
> > > >>> -Ismaël
> > > >>>
> > > >>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <neville.lyh@gmail.com
> >
> > > >>> wrote:
> > > >>>
> > > >>> Thanks guys for the interest. I'm really excited about all the
> > > feedbacks
> > > >>>> from the community.
> > > >>>>
> > > >>>> A little background: we developed Scio to bring Google Cloud
> > Dataflow
> > > >>>> closer to the Scalding/Spark ecosystem that our developers are
> > > familiar
> > > >>>> with while bringing some missing pieces to the table (type safe
> > > >>>> BigQuery,
> > > >>>> HDFS, REPL to name a few).
> > > >>>>
> > > >>>> I have to admit that I'm pretty new to the BEAM development but
> > would
> > > >>>>
> > > >>> love
> > > >>>
> > > >>>> to get feedbacks and advices on how to bring Scio closer to BEAM
> > > feature
> > > >>>> set and semantics. Scio doesn't have to live with the BEAM code
> base
> > > >>>> just
> > > >>>> yet (we're still under heavy development) but I'd like to see it
> as
> > a
> > > de
> > > >>>> facto Scala API endorsed by the BEAM community.
> > > >>>>
> > > >>>> @Ismaël: I'm curious what's this session thing you're referring
> to?
> > > >>>>
> > > >>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry
> > <fjp@google.com.invalid
> > > >
> > > >>>> wrote:
> > > >>>>
> > > >>>> +Neville and Rafal for their take ;-)
> > > >>>>>
> > > >>>>> Excited to see this out. Multiple community driven SDKs are right
> > in
> > > >>>>>
> > > >>>> line
> > > >>>
> > > >>>> with our goals for Beam.
> > > >>>>>
> > > >>>>>
> > > >>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <iemejia@gmail.com
> >
> > > >>>>>
> > > >>>> wrote:
> > > >>>
> > > >>>>
> > > >>>>> Addendum: actually the semantic model support is not so far away
> > as I
> > > >>>>>>
> > > >>>>> said
> > > >>>>>
> > > >>>>>> before (I haven't finished reading and I thought they didn't
> > support
> > > >>>>>> sessions), and looking at the git history the project is not so
> > > young
> > > >>>>>> either and it is quite active.
> > > >>>>>>
> > > >>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <
> iemejia@gmail.com
> > >
> > > >>>>>>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>>
> > > >>>>>> Hello,
> > > >>>>>>>
> > > >>>>>>> I just checked a bit the code and what they have done is
> > > >>>>>>>
> > > >>>>>> interesting,
> > > >>>
> > > >>>> the
> > > >>>>>
> > > >>>>>> SCollection wrapper is worth a look, as well as the examples to
> > get
> > > >>>>>>>
> > > >>>>>> an
> > > >>>>
> > > >>>>> idea
> > > >>>>>>
> > > >>>>>>> of their intentions, the fact that the code looks so spark-lish
> > > >>>>>>> (distributed collections like) is something that is quite
> > > >>>>>>>
> > > >>>>>> interesting
> > > >>>
> > > >>>> too:
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
> > > >>>>>>>      sc.textFile(args.getOrElse("input",
> ExampleData.KING_LEAR))
> > > >>>>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > > >>>>>>>        .countByValue()
> > > >>>>>>>        .map(t => t._1 + ": " + t._2)
> > > >>>>>>>        .saveAsTextFile(args("output"))
> > > >>>>>>>      sc.close()
> > > >>>>>>>
> > > >>>>>>> They have a repl, and since the project is a bit young they
> don't
> > > >>>>>>>
> > > >>>>>> support
> > > >>>>>
> > > >>>>>>> all the advanced semantics of Beam. They also have a Hadoop File
> > > >>>>>>> Sink/Source. I think it would be nice to work with them, but if
> > it
> > > >>>>>>>
> > > >>>>>> is
> > > >>>
> > > >>>> not
> > > >>>>>
> > > >>>>>> possible, at least I think it is worth to coordinate some
> sharing
> > > >>>>>>>
> > > >>>>>> e.g.
> > > >>>>
> > > >>>>> in
> > > >>>>>
> > > >>>>>> the Sink/Source area + other extensions.
> > > >>>>>>>
> > > >>>>>>> Additionally their code is also under the Apache license.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
> > > >>>>>>>
> > > >>>>>> jb@nanthrax.net
> > > >>>>
> > > >>>>>
> > > >>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi Raghu,
> > > >>>>>>>>
> > > >>>>>>>> I agree: we should provide SDK in different languages, and
> DSLs
> > > >>>>>>>>
> > > >>>>>>> for
> > > >>>
> > > >>>> specific use cases.
> > > >>>>>>>>
> > > >>>>>>>> You got why I sent my proposal  ;)
> > > >>>>>>>>
> > > >>>>>>>> Regards
> > > >>>>>>>> JB
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > > >>>>>>>>
> > > >>>>>>>> I would love to see Scala API properly supported. I didn't
> know
> > > >>>>>>>>>
> > > >>>>>>>> about
> > > >>>>
> > > >>>>> scio.
> > > >>>>>>>>> Scala is such a natural fit for Dataflow API.
> > > >>>>>>>>>
> > > >>>>>>>>> I am not sure of the policy w.r.t where such packages would
> > live
> > > >>>>>>>>>
> > > >>>>>>>> in
> > > >>>
> > > >>>> Beam
> > > >>>>>>
> > > >>>>>>> repo, but I personally would write my Dataflow applications in
> > > >>>>>>>>>
> > > >>>>>>>> Scala.
> > > >>>>
> > > >>>>> It
> > > >>>>>>
> > > >>>>>>> is
> > > >>>>>>>>> probably already the case but my request would be : it should
> > be
> > > >>>>>>>>>
> > > >>>>>>>> as
> > > >>>
> > > >>>> thin
> > > >>>>>>
> > > >>>>>>> as
> > > >>>>>>>>> reasonably possible (that might make it a bit less like
> > > >>>>>>>>>
> > > >>>>>>>> scalding/spark
> > > >>>>>
> > > >>>>>> API
> > > >>>>>>>>> in some cases, which I think is a good compromise).
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
> > > >>>>>>>>>
> > > >>>>>>>> jb@nanthrax.net
> > > >>>>>
> > > >>>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Hi beamers,
> > > >>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> right now, Beam provides Java SDK.
> > > >>>>>>>>>>
> > > >>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
> > > >>>>>>>>>>
> > > >>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
> > > >>>>>>>>>>
> > > >>>>>>>>>> https://github.com/spotify/scio
> > > >>>>>>>>>>
> > > >>>>>>>>>> What do you think of asking if they want to donate this as
> > Beam
> > > >>>>>>>>>>
> > > >>>>>>>>> Scala
> > > >>>>>
> > > >>>>>> SDK ?
> > > >>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
> > > >>>>>>>>>>
> > > >>>>>>>>> already
> > > >>>
> > > >>>> something, it makes sense to leverage it.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thoughts ?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regards
> > > >>>>>>>>>> JB
> > > >>>>>>>>>> --
> > > >>>>>>>>>> Jean-Baptiste Onofré
> > > >>>>>>>>>> jbonofre@apache.org
> > > >>>>>>>>>> http://blog.nanthrax.net
> > > >>>>>>>>>> Talend - http://www.talend.com
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>> Jean-Baptiste Onofré
> > > >>>>>>>> jbonofre@apache.org
> > > >>>>>>>> http://blog.nanthrax.net
> > > >>>>>>>> Talend - http://www.talend.com
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Ismaël Mejía <ie...@gmail.com>.
Excellent questions,

> - Scio is more of a thin Scala wrapper/DSL on top of Java SDK to offer FP
> style API similar to Spark/Scalding and a few other features often found
in
> other Scala libraries, e.g. REPL, macro type providers, Futures. How do
> those fit in the big picture? Is there a place for such high level DSLs?

I don't know what the others think, but I think a good approach would be to
have
the core functionality (the one that is semantically equal to the java one
and
covers the core dataflow model) as the 'official' beam bindings for scala
and
all the extra niceties as independent packages (beam-scala-repl,
beam-scala-extra, etc).

> - What's the vision for feature parity between SDKs? Should they all
expose
> the same apply/transform style API or have freedom to provide API
idiomatic
> for the language?

I had the same doubts about idiomatic bindings for the SDKs, a new
programmer
who checks the Beam API in java (based on apply/transform) vs the scio
scala API
(based on distributed collections) will be a bit surprised (as I was, coming
from
the spark world) because the styles are quite different even if the
model/semantics are similar.

I think dealing with different styles can make the project hard to
approach; on
the other hand I see the value of idiomatic versions. What do the others
think?
What would be a good compromise here?
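To illustrate the contrast in styles, here is the same word count written both
ways as a self-contained sketch (plain Scala collections stand in for a real
pipeline; Transform and applyTransform are invented names, not Beam's API):

```scala
val lines = List("a b a", "b c")

// apply/transform style, roughly how the Beam Java SDK reads: each step
// is an explicit transform object handed to an apply method.
trait Transform[A, B] { def expand(in: List[A]): List[B] }
def applyTransform[A, B](in: List[A], t: Transform[A, B]): List[B] = t.expand(in)

val words = applyTransform(lines, new Transform[String, String] {
  def expand(in: List[String]): List[String] = in.flatMap(_.split(" "))
})
val applyStyleCounts =
  words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

// collection style, roughly how Scio/Spark/Scalding read: the pipeline is
// written as chained operations on a distributed-collection-like value.
val collectionStyleCounts =
  lines.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

println(applyStyleCounts == collectionStyleCounts) // true
```

Both produce identical counts; the semantics match even though a newcomer
scanning the two sources would see very different surface APIs, which is
exactly the documentation problem above.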

Just as an extra point, maybe a good way to document the differences would
be to
provide docs like they do in the ReactiveX world with base concepts of the
model
and side notes for language specific syntax.

http://reactivex.io/documentation/operators/flatmap.html

Cheers,
Ismaël



On Fri, Apr 1, 2016 at 4:49 AM, Neville Li <ne...@gmail.com> wrote:

> I read some technical docs and have a few more questions.
>
> - Scio is more of a thin Scala wrapper/DSL on top of Java SDK to offer FP
> style API similar to Spark/Scalding and a few other features often found in
> other Scala libraries, e.g. REPL, macro type providers, Futures. How do
> those fit in the big picture? Is there a place for such high level DSLs?
> - Therefore it's not really a native SDK equivalent to the Java or Python
> SDK; does it fit in the /sdks/scala repo structure?
> - What's the vision for feature parity between SDKs? Should they all expose
> the same apply/transform style API or have freedom to provide API idiomatic
> for the language?
>
> Asking these because we want to leverage both the Java SDK and the Scala
> ecosystem and it'll be nice to have a vision for these things.
>
> Cheers,
> Neville
>
> On Sat, Mar 26, 2016 at 5:39 PM Pierre Mage <pi...@gmail.com> wrote:
>
> > Hi Neville,
> >
> > I don't know how up to date this roadmap is but from "Apache Beam:
> > Technical Vision":
> >
> >
> https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287
> >
> > And for more details:
> >
> >
> https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1
> >
> > On 26 March 2016 at 06:53, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> >
> > > Hi Neville,
> > >
> > > that's great news, and the timeline is perfect !
> > >
> > > We are working on some refactoring & polishing on our side (Runner API,
> > > etc). So, one or two months is not a big deal !
> > >
> > > Let me know if I can help in any way.
> > >
> > > Thanks,
> > > Regards
> > > JB
> > >
> > >
> > > On 03/25/2016 08:03 PM, Neville Li wrote:
> > >
> > >> Thanks guys. Yes we'd love to donate the project but would also like
> to
> > >> polish the API a bit first, like in the next month or two. What's the
> > >> timeline like for BEAM and related projects?
> > >>
> > >> Will also read the technical docs and follow up later.
> > >>
> > >> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> > >>
> > >> Hello Neville,
> > >>>
> > >>> First congratulations guys, excellent job / API, the scalding touches
> > are
> > >>> pretty neat (as well as the Tap abstraction). I am also new to Beam,
> so
> > >>> believe me, you guys already know more than me.
> > >>>
> > >>> In my comment I mentioned sessions referring to session windows, but
> it
> > >>> was
> > >>> my mistake since I just took a fast look at your code and initially
> > >>> didn't
> > >>> see them. Anyway if you are interested in the model there is a good
> > >>> description of the current capabilities of the runners in the
> website,
> > >>>
> > >>> https://beam.incubator.apache.org/capability-matrix/
> > >>>
> > >>> And the new additions to the model are openly discussed in the
> mailing
> > >>> list
> > >>> and in the technical docs (e.g. lateness):
> > >>>
> > >>> https://goo.gl/ps8twC
> > >>>
> > >>> -Ismaël
> > >>>
> > >>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <ne...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> Thanks guys for the interest. I'm really excited about all the
> > feedbacks
> > >>>> from the community.
> > >>>>
> > >>>> A little background: we developed Scio to bring Google Cloud
> Dataflow
> > >>>> closer to the Scalding/Spark ecosystem that our developers are
> > familiar
> > >>>> with while bringing some missing pieces to the table (type safe
> > >>>> BigQuery,
> > >>>> HDFS, REPL to name a few).
> > >>>>
> > >>>> I have to admit that I'm pretty new to the BEAM development but
> would
> > >>>>
> > >>> love
> > >>>
> > >>>> to get feedbacks and advices on how to bring Scio closer to BEAM
> > feature
> > >>>> set and semantics. Scio doesn't have to live with the BEAM code base
> > >>>> just
> > >>>> yet (we're still under heavy development) but I'd like to see it as
> a
> > de
> > >>>> facto Scala API endorsed by the BEAM community.
> > >>>>
> > >>>> @Ismaël: I'm curious what's this session thing you're referring to?
> > >>>>
> > >>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry
> <fjp@google.com.invalid
> > >
> > >>>> wrote:
> > >>>>
> > >>>> +Neville and Rafal for their take ;-)
> > >>>>>
> > >>>>> Excited to see this out. Multiple community driven SDKs are right
> in
> > >>>>>
> > >>>> line
> > >>>
> > >>>> with our goals for Beam.
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com>
> > >>>>>
> > >>>> wrote:
> > >>>
> > >>>>
> > >>>>> Addendum: actually the semantic model support is not so far away
> as I
> > >>>>>>
> > >>>>> said
> > >>>>>
> > >>>>>> before (I haven't finished reading and I thought they didn't
> support
> > >>>>>> sessions), and looking at the git history the project is not so
> > young
> > >>>>>> either and it is quite active.
> > >>>>>>
> > >>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <iemejia@gmail.com
> >
> > >>>>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>>
> > >>>>>> Hello,
> > >>>>>>>
> > >>>>>>> I just checked a bit the code and what they have done is
> > >>>>>>>
> > >>>>>> interesting,
> > >>>
> > >>>> the
> > >>>>>
> > >>>>>> SCollection wrapper is worth a look, as well as the examples to
> get
> > >>>>>>>
> > >>>>>> an
> > >>>>
> > >>>>> idea
> > >>>>>>
> > >>>>>>> of their intentions, the fact that the code looks so spark-lish
> > >>>>>>> (distributed collections like) is something that is quite
> > >>>>>>>
> > >>>>>> interesting
> > >>>
> > >>>> too:
> > >>>>>>
> > >>>>>>>
> > >>>>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
> > >>>>>>>      sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > >>>>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > >>>>>>>        .countByValue()
> > >>>>>>>        .map(t => t._1 + ": " + t._2)
> > >>>>>>>        .saveAsTextFile(args("output"))
> > >>>>>>>      sc.close()
> > >>>>>>>
> > >>>>>>> They have a repl, and since the project is a bit young they don't
> > >>>>>>>
> > >>>>>> support
> > >>>>>
> > >>>>>>> all the advanced semantics of Beam. They also have a Hadoop File
> > >>>>>>> Sink/Source. I think it would be nice to work with them, but if
> it
> > >>>>>>>
> > >>>>>> is
> > >>>
> > >>>> not
> > >>>>>
> > >>>>>> possible, at least I think it is worth to coordinate some sharing
> > >>>>>>>
> > >>>>>> e.g.
> > >>>>
> > >>>>> in
> > >>>>>
> > >>>>>> the Sink/Source area + other extensions.
> > >>>>>>>
> > >>>>>>> Additionally their code is also under the Apache license.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
> > >>>>>>>
> > >>>>>> jb@nanthrax.net
> > >>>>
> > >>>>>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi Raghu,
> > >>>>>>>>
> > >>>>>>>> I agree: we should provide SDK in different languages, and DSLs
> > >>>>>>>>
> > >>>>>>> for
> > >>>
> > >>>> specific use cases.
> > >>>>>>>>
> > >>>>>>>> You got why I sent my proposal  ;)
> > >>>>>>>>
> > >>>>>>>> Regards
> > >>>>>>>> JB
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > >>>>>>>>
> > >>>>>>>> I would love to see Scala API properly supported. I didn't know
> > >>>>>>>>>
> > >>>>>>>> about
> > >>>>
> > >>>>> scio.
> > >>>>>>>>> Scala is such a natural fit for Dataflow API.
> > >>>>>>>>>
> > >>>>>>>>> I am not sure of the policy w.r.t where such packages would
> live
> > >>>>>>>>>
> > >>>>>>>> in
> > >>>
> > >>>> Beam
> > >>>>>>
> > >>>>>>> repo, but I personally would write my Dataflow applications in
> > >>>>>>>>>
> > >>>>>>>> Scala.
> > >>>>
> > >>>>> It
> > >>>>>>
> > >>>>>>> is
> > >>>>>>>>> probably already the case but my request would be : it should
> be
> > >>>>>>>>>
> > >>>>>>>> as
> > >>>
> > >>>> thin
> > >>>>>>
> > >>>>>>> as
> > >>>>>>>>> reasonably possible (that might make it a bit less like
> > >>>>>>>>>
> > >>>>>>>> scalding/spark
> > >>>>>
> > >>>>>> API
> > >>>>>>>>> in some cases, which I think is a good compromise).
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
> > >>>>>>>>>
> > >>>>>>>> jb@nanthrax.net
> > >>>>>
> > >>>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi beamers,
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> right now, Beam provides Java SDK.
> > >>>>>>>>>>
> > >>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
> > >>>>>>>>>>
> > >>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
> > >>>>>>>>>>
> > >>>>>>>>>> https://github.com/spotify/scio
> > >>>>>>>>>>
> > >>>>>>>>>> What do you think of asking if they want to donate this as
> Beam
> > >>>>>>>>>>
> > >>>>>>>>> Scala
> > >>>>>
> > >>>>>> SDK ?
> > >>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
> > >>>>>>>>>>
> > >>>>>>>>> already
> > >>>
> > >>>> something, it makes sense to leverage it.
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts ?
> > >>>>>>>>>>
> > >>>>>>>>>> Regards
> > >>>>>>>>>> JB
> > >>>>>>>>>> --
> > >>>>>>>>>> Jean-Baptiste Onofré
> > >>>>>>>>>> jbonofre@apache.org
> > >>>>>>>>>> http://blog.nanthrax.net
> > >>>>>>>>>> Talend - http://www.talend.com
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>> Jean-Baptiste Onofré
> > >>>>>>>> jbonofre@apache.org
> > >>>>>>>> http://blog.nanthrax.net
> > >>>>>>>> Talend - http://www.talend.com
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbonofre@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Neville Li <ne...@gmail.com>.
I read some technical docs and have a few more questions.

- Scio is more of a thin Scala wrapper/DSL on top of the Java SDK, offering an
FP-style API similar to Spark/Scalding plus a few features often found in
other Scala libraries, e.g. a REPL, macro type providers, and Futures. How do
those fit in the big picture? Is there a place for such high-level DSLs?
- Therefore it's not really a native SDK equivalent to the Java or Python
SDK; does it fit in the /sdks/scala repo structure?
- What's the vision for feature parity between SDKs? Should they all expose
the same apply/transform-style API, or have the freedom to provide an API
idiomatic to the language?

Asking these because we want to leverage both the Java SDK and the Scala
ecosystem, and it would be nice to have a vision for these things.

Cheers,
Neville
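
[Editor's note: the "thin wrapper" pattern Neville describes can be sketched
in a few lines of plain Scala. This is purely an illustration of the idea
(an implicit class adding FP-style methods that delegate to a Java-style
apply/transform API); the names `JPCollection` and `SCollectionOps` are
hypothetical stand-ins, not Scio's or Beam's real API.]

```scala
object ThinWrapperSketch {
  // Minimal stand-in for a Java-style SDK collection: transforms are
  // applied explicitly via `apply`, as in Beam's Java PCollection.
  final class JPCollection[T](val elements: List[T]) {
    def apply[U](transform: JPCollection[T] => JPCollection[U]): JPCollection[U] =
      transform(this)
  }

  // The "thin wrapper": an implicit class adding idiomatic Scala methods
  // that simply delegate to the underlying apply/transform calls.
  implicit class SCollectionOps[T](val self: JPCollection[T]) {
    def map[U](f: T => U): JPCollection[U] =
      self.apply(c => new JPCollection(c.elements.map(f)))

    def filter(p: T => Boolean): JPCollection[T] =
      self.apply(c => new JPCollection(c.elements.filter(p)))

    def countByValue(): Map[T, Int] =
      self.elements.groupBy(identity).map { case (k, v) => k -> v.size }
  }

  def main(args: Array[String]): Unit = {
    val words = new JPCollection(List("apple", "", "apple", "pear"))
    // FP-style chaining on a Java-style collection, via the wrapper.
    val counts = words.filter(_.nonEmpty).countByValue()
    println(counts("apple")) // prints 2
  }
}
```

The point of the sketch is that the wrapper adds no execution logic of its
own; every Scala-friendly method bottoms out in the host SDK's own
apply/transform mechanism, which is what keeps such a layer "thin".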

On Sat, Mar 26, 2016 at 5:39 PM Pierre Mage <pi...@gmail.com> wrote:

> Hi Neville,
>
> I don't know how up to date this roadmap is but from "Apache Beam:
> Technical Vision":
>
> https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287
>
> And for more details:
>
> https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1
>
> On 26 March 2016 at 06:53, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>
> > Hi Neville,
> >
> > that's great news, and the timeline is perfect !
> >
> > We are working on some refactoring & polishing on our side (Runner API,
> > etc). So, one or two months is not a big deal !
> >
> > Let me know if I can help in any way.
> >
> > Thanks,
> > Regards
> > JB
> >
> >
> > On 03/25/2016 08:03 PM, Neville Li wrote:
> >
> >> Thanks guys. Yes we'd love to donate the project but would also like to
> >> polish the API a bit first, like in the next month or two. What's the
> >> timeline like for BEAM and related projects?
> >>
> >> Will also read the technical docs and follow up later.
> >>
> >> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com> wrote:
> >>
> >> Hello Neville,
> >>>
> >>> First congratulations guys, excellent job / API, the scalding touches
> are
> >>> pretty neat (as well as the Tap abstraction). I am also new to Beam, so
> >>> believe me, you guys already know more than me.
> >>>
> >>> In my comment I mentioned sessions referring to session windows, but it
> >>> was
> >>> my mistake since I just took a fast look at your code and initially
> >>> didn't
> >>> see them. Anyway if you are interested in the model there is a good
> >>> description of the current capabilities of the runners in the website,
> >>>
> >>> https://beam.incubator.apache.org/capability-matrix/
> >>>
> >>> And the new additions to the model are openly discussed in the mailing
> >>> list
> >>> and in the technical docs (e.g. lateness):
> >>>
> >>> https://goo.gl/ps8twC
> >>>
> >>> -Ismaël
> >>>
> >>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <ne...@gmail.com>
> >>> wrote:
> >>>
> >>> Thanks guys for the interest. I'm really excited about all the
> feedbacks
> >>>> from the community.
> >>>>
> >>>> A little background: we developed Scio to bring Google Cloud Dataflow
> >>>> closer to the Scalding/Spark ecosystem that our developers are
> familiar
> >>>> with while bringing some missing pieces to the table (type safe
> >>>> BigQuery,
> >>>> HDFS, REPL to name a few).
> >>>>
> >>>> I have to admit that I'm pretty new to the BEAM development but would
> >>>>
> >>> love
> >>>
> >>>> to get feedbacks and advices on how to bring Scio closer to BEAM
> feature
> >>>> set and semantics. Scio doesn't have to live with the BEAM code base
> >>>> just
> >>>> yet (we're still under heavy development) but I'd like to see it as a
> de
> >>>> facto Scala API endorsed by the BEAM community.
> >>>>
> >>>> @Ismaël: I'm curious what's this session thing you're referring to?
> >>>>
> >>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fjp@google.com.invalid
> >
> >>>> wrote:
> >>>>
> >>>> +Neville and Rafal for their take ;-)
> >>>>>
> >>>>> Excited to see this out. Multiple community driven SDKs are right in
> >>>>>
> >>>> line
> >>>
> >>>> with our goals for Beam.
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com>
> >>>>>
> >>>> wrote:
> >>>
> >>>>
> >>>>> Addendum: actually the semantic model support is not so far away as I
> >>>>>>
> >>>>> said
> >>>>>
> >>>>>> before (I havent finished reading and I thought they didn't support
> >>>>>> sessions), and looking at the git history the project is not so
> young
> >>>>>> either and it is quite active.
> >>>>>>
> >>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
> >>>>>>
> >>>>> wrote:
> >>>>>
> >>>>>>
> >>>>>> Hello,
> >>>>>>>
> >>>>>>> I just checked a bit the code and what they have done is
> >>>>>>>
> >>>>>> interesting,
> >>>
> >>>> the
> >>>>>
> >>>>>> SCollection wrapper is worth a look, as well as the examples to get
> >>>>>>>
> >>>>>> an
> >>>>
> >>>>> idea
> >>>>>>
> >>>>>>> of their intentions, the fact that the code looks so spark-lish
> >>>>>>> (distributed collections like) is something that is quite
> >>>>>>>
> >>>>>> interesting
> >>>
> >>>> too:
> >>>>>>
> >>>>>>>
> >>>>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
> >>>>>>>      sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> >>>>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> >>>>>>>        .countByValue()
> >>>>>>>        .map(t => t._1 + ": " + t._2)
> >>>>>>>        .saveAsTextFile(args("output"))
> >>>>>>>      sc.close()
> >>>>>>>
> >>>>>>> They have a repl, and since the project is a bit young they don't
> >>>>>>>
> >>>>>> support
> >>>>>
> >>>>>> all the advanced semantics of Beam, They also have a Hadoop File
> >>>>>>> Sink/Source. I think it would be nice to work with them, but if it
> >>>>>>>
> >>>>>> is
> >>>
> >>>> not
> >>>>>
> >>>>>> possible, at least I think it is worth to coordinate some sharing
> >>>>>>>
> >>>>>> e.g.
> >>>>
> >>>>> in
> >>>>>
> >>>>>> the Sink/Source area + other extensions.
> >>>>>>>
> >>>>>>> Aditionally their code is also under the Apache license.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
> >>>>>>>
> >>>>>> jb@nanthrax.net
> >>>>
> >>>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Raghu,
> >>>>>>>>
> >>>>>>>> I agree: we should provide SDK in different languages, and DSLs
> >>>>>>>>
> >>>>>>> for
> >>>
> >>>> specific use cases.
> >>>>>>>>
> >>>>>>>> You got why I sent my proposal  ;)
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> JB
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> >>>>>>>>
> >>>>>>>> I would love to see Scala API properly supported. I didn't know
> >>>>>>>>>
> >>>>>>>> about
> >>>>
> >>>>> scio.
> >>>>>>>>> Scala is such a natural fit for Dataflow API.
> >>>>>>>>>
> >>>>>>>>> I am not sure of the policy w.r.t where such packages would live
> >>>>>>>>>
> >>>>>>>> in
> >>>
> >>>> Beam
> >>>>>>
> >>>>>>> repo, but I personally would write my Dataflow applications in
> >>>>>>>>>
> >>>>>>>> Scala.
> >>>>
> >>>>> It
> >>>>>>
> >>>>>>> is
> >>>>>>>>> probably already the case but my request would be : it should be
> >>>>>>>>>
> >>>>>>>> as
> >>>
> >>>> thin
> >>>>>>
> >>>>>>> as
> >>>>>>>>> reasonably possible (that might make it a bit less like
> >>>>>>>>>
> >>>>>>>> scalding/spark
> >>>>>
> >>>>>> API
> >>>>>>>>> in some cases, which I think is a good compromise).
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
> >>>>>>>>>
> >>>>>>>> jb@nanthrax.net
> >>>>>
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi beamers,
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> right now, Beam provides Java SDK.
> >>>>>>>>>>
> >>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
> >>>>>>>>>>
> >>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
> >>>>>>>>>>
> >>>>>>>>>> https://github.com/spotify/scio
> >>>>>>>>>>
> >>>>>>>>>> What do you think of asking if they want to donate this as Beam
> >>>>>>>>>>
> >>>>>>>>> Scala
> >>>>>
> >>>>>> SDK ?
> >>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
> >>>>>>>>>>
> >>>>>>>>> already
> >>>
> >>>> something, it makes sense to leverage it.
> >>>>>>>>>>
> >>>>>>>>>> Thoughts ?
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> JB
> >>>>>>>>>> --
> >>>>>>>>>> Jean-Baptiste Onofré
> >>>>>>>>>> jbonofre@apache.org
> >>>>>>>>>> http://blog.nanthrax.net
> >>>>>>>>>> Talend - http://www.talend.com
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>> Jean-Baptiste Onofré
> >>>>>>>> jbonofre@apache.org
> >>>>>>>> http://blog.nanthrax.net
> >>>>>>>> Talend - http://www.talend.com
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Pierre Mage <pi...@gmail.com>.
Hi Neville,

I don't know how up to date this roadmap is, but from "Apache Beam:
Technical Vision":
https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287

And for more details:
https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1

On 26 March 2016 at 06:53, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> Hi Neville,
>
> that's great news, and the timeline is perfect !
>
> We are working on some refactoring & polishing on our side (Runner API,
> etc). So, one or two months is not a big deal !
>
> Let me know if I can help in any way.
>
> Thanks,
> Regards
> JB
>
>
> On 03/25/2016 08:03 PM, Neville Li wrote:
>
>> Thanks guys. Yes we'd love to donate the project but would also like to
>> polish the API a bit first, like in the next month or two. What's the
>> timeline like for BEAM and related projects?
>>
>> Will also read the technical docs and follow up later.
>>
>> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>> Hello Neville,
>>>
>>> First congratulations guys, excellent job / API, the scalding touches are
>>> pretty neat (as well as the Tap abstraction). I am also new to Beam, so
>>> believe me, you guys already know more than me.
>>>
>>> In my comment I mentioned sessions referring to session windows, but it
>>> was
>>> my mistake since I just took a fast look at your code and initially
>>> didn't
>>> see them. Anyway if you are interested in the model there is a good
>>> description of the current capabilities of the runners in the website,
>>>
>>> https://beam.incubator.apache.org/capability-matrix/
>>>
>>> And the new additions to the model are openly discussed in the mailing
>>> list
>>> and in the technical docs (e.g. lateness):
>>>
>>> https://goo.gl/ps8twC
>>>
>>> -Ismaël
>>>
>>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <ne...@gmail.com>
>>> wrote:
>>>
>>> Thanks guys for the interest. I'm really excited about all the feedbacks
>>>> from the community.
>>>>
>>>> A little background: we developed Scio to bring Google Cloud Dataflow
>>>> closer to the Scalding/Spark ecosystem that our developers are familiar
>>>> with while bringing some missing pieces to the table (type safe
>>>> BigQuery,
>>>> HDFS, REPL to name a few).
>>>>
>>>> I have to admit that I'm pretty new to the BEAM development but would
>>>>
>>> love
>>>
>>>> to get feedbacks and advices on how to bring Scio closer to BEAM feature
>>>> set and semantics. Scio doesn't have to live with the BEAM code base
>>>> just
>>>> yet (we're still under heavy development) but I'd like to see it as a de
>>>> facto Scala API endorsed by the BEAM community.
>>>>
>>>> @Ismaël: I'm curious what's this session thing you're referring to?
>>>>
>>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fj...@google.com.invalid>
>>>> wrote:
>>>>
>>>> +Neville and Rafal for their take ;-)
>>>>>
>>>>> Excited to see this out. Multiple community driven SDKs are right in
>>>>>
>>>> line
>>>
>>>> with our goals for Beam.
>>>>>
>>>>>
>>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com>
>>>>>
>>>> wrote:
>>>
>>>>
>>>>> Addendum: actually the semantic model support is not so far away as I
>>>>>>
>>>>> said
>>>>>
>>>>>> before (I havent finished reading and I thought they didn't support
>>>>>> sessions), and looking at the git history the project is not so young
>>>>>> either and it is quite active.
>>>>>>
>>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
>>>>>>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hello,
>>>>>>>
>>>>>>> I just checked a bit the code and what they have done is
>>>>>>>
>>>>>> interesting,
>>>
>>>> the
>>>>>
>>>>>> SCollection wrapper is worth a look, as well as the examples to get
>>>>>>>
>>>>>> an
>>>>
>>>>> idea
>>>>>>
>>>>>>> of their intentions, the fact that the code looks so spark-lish
>>>>>>> (distributed collections like) is something that is quite
>>>>>>>
>>>>>> interesting
>>>
>>>> too:
>>>>>>
>>>>>>>
>>>>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
>>>>>>>      sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
>>>>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>>>>>>>        .countByValue()
>>>>>>>        .map(t => t._1 + ": " + t._2)
>>>>>>>        .saveAsTextFile(args("output"))
>>>>>>>      sc.close()
>>>>>>>
>>>>>>> They have a repl, and since the project is a bit young they don't
>>>>>>>
>>>>>> support
>>>>>
>>>>>> all the advanced semantics of Beam, They also have a Hadoop File
>>>>>>> Sink/Source. I think it would be nice to work with them, but if it
>>>>>>>
>>>>>> is
>>>
>>>> not
>>>>>
>>>>>> possible, at least I think it is worth to coordinate some sharing
>>>>>>>
>>>>>> e.g.
>>>>
>>>>> in
>>>>>
>>>>>> the Sink/Source area + other extensions.
>>>>>>>
>>>>>>> Aditionally their code is also under the Apache license.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
>>>>>>>
>>>>>> jb@nanthrax.net
>>>>
>>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Raghu,
>>>>>>>>
>>>>>>>> I agree: we should provide SDK in different languages, and DSLs
>>>>>>>>
>>>>>>> for
>>>
>>>> specific use cases.
>>>>>>>>
>>>>>>>> You got why I sent my proposal  ;)
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>>>>>>>
>>>>>>>> I would love to see Scala API properly supported. I didn't know
>>>>>>>>>
>>>>>>>> about
>>>>
>>>>> scio.
>>>>>>>>> Scala is such a natural fit for Dataflow API.
>>>>>>>>>
>>>>>>>>> I am not sure of the policy w.r.t where such packages would live
>>>>>>>>>
>>>>>>>> in
>>>
>>>> Beam
>>>>>>
>>>>>>> repo, but I personally would write my Dataflow applications in
>>>>>>>>>
>>>>>>>> Scala.
>>>>
>>>>> It
>>>>>>
>>>>>>> is
>>>>>>>>> probably already the case but my request would be : it should be
>>>>>>>>>
>>>>>>>> as
>>>
>>>> thin
>>>>>>
>>>>>>> as
>>>>>>>>> reasonably possible (that might make it a bit less like
>>>>>>>>>
>>>>>>>> scalding/spark
>>>>>
>>>>>> API
>>>>>>>>> in some cases, which I think is a good compromise).
>>>>>>>>>
>>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
>>>>>>>>>
>>>>>>>> jb@nanthrax.net
>>>>>
>>>>>>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi beamers,
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> right now, Beam provides Java SDK.
>>>>>>>>>>
>>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
>>>>>>>>>>
>>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
>>>>>>>>>>
>>>>>>>>>> https://github.com/spotify/scio
>>>>>>>>>>
>>>>>>>>>> What do you think of asking if they want to donate this as Beam
>>>>>>>>>>
>>>>>>>>> Scala
>>>>>
>>>>>> SDK ?
>>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
>>>>>>>>>>
>>>>>>>>> already
>>>
>>>> something, it makes sense to leverage it.
>>>>>>>>>>
>>>>>>>>>> Thoughts ?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>> --
>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>> jbonofre@apache.org
>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbonofre@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: [PROPOSAL] New sdk languages

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Neville,

that's great news, and the timeline is perfect!

We are working on some refactoring & polishing on our side (Runner API, 
etc.). So, one or two months is not a big deal!

Let me know if I can help in any way.

Thanks,
Regards
JB

On 03/25/2016 08:03 PM, Neville Li wrote:
> Thanks guys. Yes we'd love to donate the project but would also like to
> polish the API a bit first, like in the next month or two. What's the
> timeline like for BEAM and related projects?
>
> Will also read the technical docs and follow up later.
>
> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Hello Neville,
>>
>> First congratulations guys, excellent job / API, the scalding touches are
>> pretty neat (as well as the Tap abstraction). I am also new to Beam, so
>> believe me, you guys already know more than me.
>>
>> In my comment I mentioned sessions referring to session windows, but it was
>> my mistake since I just took a fast look at your code and initially didn't
>> see them. Anyway if you are interested in the model there is a good
>> description of the current capabilities of the runners in the website,
>>
>> https://beam.incubator.apache.org/capability-matrix/
>>
>> And the new additions to the model are openly discussed in the mailing list
>> and in the technical docs (e.g. lateness):
>>
>> https://goo.gl/ps8twC
>>
>> -Ismaël
>>
>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <ne...@gmail.com> wrote:
>>
>>> Thanks guys for the interest. I'm really excited about all the feedbacks
>>> from the community.
>>>
>>> A little background: we developed Scio to bring Google Cloud Dataflow
>>> closer to the Scalding/Spark ecosystem that our developers are familiar
>>> with while bringing some missing pieces to the table (type safe BigQuery,
>>> HDFS, REPL to name a few).
>>>
>>> I have to admit that I'm pretty new to the BEAM development but would
>> love
>>> to get feedbacks and advices on how to bring Scio closer to BEAM feature
>>> set and semantics. Scio doesn't have to live with the BEAM code base just
>>> yet (we're still under heavy development) but I'd like to see it as a de
>>> facto Scala API endorsed by the BEAM community.
>>>
>>> @Ismaël: I'm curious what's this session thing you're referring to?
>>>
>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fj...@google.com.invalid>
>>> wrote:
>>>
>>>> +Neville and Rafal for their take ;-)
>>>>
>>>> Excited to see this out. Multiple community driven SDKs are right in
>> line
>>>> with our goals for Beam.
>>>>
>>>>
>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com>
>> wrote:
>>>>
>>>>> Addendum: actually the semantic model support is not so far away as I
>>>> said
>>>>> before (I havent finished reading and I thought they didn't support
>>>>> sessions), and looking at the git history the project is not so young
>>>>> either and it is quite active.
>>>>>
>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I just checked a bit the code and what they have done is
>> interesting,
>>>> the
>>>>>> SCollection wrapper is worth a look, as well as the examples to get
>>> an
>>>>> idea
>>>>>> of their intentions, the fact that the code looks so spark-lish
>>>>>> (distributed collections like) is something that is quite
>> interesting
>>>>> too:
>>>>>>
>>>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
>>>>>>      sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
>>>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>>>>>>        .countByValue()
>>>>>>        .map(t => t._1 + ": " + t._2)
>>>>>>        .saveAsTextFile(args("output"))
>>>>>>      sc.close()
>>>>>>
>>>>>> They have a repl, and since the project is a bit young they don't
>>>> support
>>>>>> all the advanced semantics of Beam, They also have a Hadoop File
>>>>>> Sink/Source. I think it would be nice to work with them, but if it
>> is
>>>> not
>>>>>> possible, at least I think it is worth to coordinate some sharing
>>> e.g.
>>>> in
>>>>>> the Sink/Source area + other extensions.
>>>>>>
>>>>>> Aditionally their code is also under the Apache license.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
>>> jb@nanthrax.net
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Raghu,
>>>>>>>
>>>>>>> I agree: we should provide SDK in different languages, and DSLs
>> for
>>>>>>> specific use cases.
>>>>>>>
>>>>>>> You got why I sent my proposal  ;)
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>>
>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>>>>>>
>>>>>>>> I would love to see Scala API properly supported. I didn't know
>>> about
>>>>>>>> scio.
>>>>>>>> Scala is such a natural fit for Dataflow API.
>>>>>>>>
>>>>>>>> I am not sure of the policy w.r.t where such packages would live
>> in
>>>>> Beam
>>>>>>>> repo, but I personally would write my Dataflow applications in
>>> Scala.
>>>>> It
>>>>>>>> is
>>>>>>>> probably already the case but my request would be : it should be
>> as
>>>>> thin
>>>>>>>> as
>>>>>>>> reasonably possible (that might make it a bit less like
>>>> scalding/spark
>>>>>>>> API
>>>>>>>> in some cases, which I think is a good compromise).
>>>>>>>>
>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
>>>> jb@nanthrax.net
>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi beamers,
>>>>>>>>>
>>>>>>>>> right now, Beam provides Java SDK.
>>>>>>>>>
>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
>>>>>>>>>
>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
>>>>>>>>>
>>>>>>>>> https://github.com/spotify/scio
>>>>>>>>>
>>>>>>>>> What do you think of asking if they want to donate this as Beam
>>>> Scala
>>>>>>>>> SDK ?
>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
>> already
>>>>>>>>> something, it makes sense to leverage it.
>>>>>>>>>
>>>>>>>>> Thoughts ?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbonofre@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbonofre@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] New sdk languages

Posted by Neville Li <ne...@gmail.com>.
Thanks guys. Yes, we'd love to donate the project, but we would also like to
polish the API a bit first, say in the next month or two. What's the
timeline like for BEAM and related projects?

Will also read the technical docs and follow up later.

On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Hello Neville,
>
> First congratulations guys, excellent job / API, the scalding touches are
> pretty neat (as well as the Tap abstraction). I am also new to Beam, so
> believe me, you guys already know more than me.
>
> In my comment I mentioned sessions referring to session windows, but it was
> my mistake since I just took a fast look at your code and initially didn't
> see them. Anyway if you are interested in the model there is a good
> description of the current capabilities of the runners in the website,
>
> https://beam.incubator.apache.org/capability-matrix/
>
> And the new additions to the model are openly discussed in the mailing list
> and in the technical docs (e.g. lateness):
>
> https://goo.gl/ps8twC
>
> -Ismaël
>
> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <ne...@gmail.com> wrote:
>
> > Thanks guys for the interest. I'm really excited about all the feedbacks
> > from the community.
> >
> > A little background: we developed Scio to bring Google Cloud Dataflow
> > closer to the Scalding/Spark ecosystem that our developers are familiar
> > with while bringing some missing pieces to the table (type safe BigQuery,
> > HDFS, REPL to name a few).
> >
> > I have to admit that I'm pretty new to the BEAM development but would
> love
> > to get feedbacks and advices on how to bring Scio closer to BEAM feature
> > set and semantics. Scio doesn't have to live with the BEAM code base just
> > yet (we're still under heavy development) but I'd like to see it as a de
> > facto Scala API endorsed by the BEAM community.
> >
> > @Ismaël: I'm curious what's this session thing you're referring to?
> >
> > On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fj...@google.com.invalid>
> > wrote:
> >
> > > +Neville and Rafal for their take ;-)
> > >
> > > Excited to see this out. Multiple community driven SDKs are right in
> line
> > > with our goals for Beam.
> > >
> > >
> > > On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com>
> wrote:
> > >
> > > > Addendum: actually the semantic model support is not so far away as I
> > > said
> > > > before (I havent finished reading and I thought they didn't support
> > > > sessions), and looking at the git history the project is not so young
> > > > either and it is quite active.
> > > >
> > > > On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
> > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I just checked a bit the code and what they have done is
> interesting,
> > > the
> > > > > SCollection wrapper is worth a look, as well as the examples to get
> > an
> > > > idea
> > > > > of their intentions, the fact that the code looks so spark-lish
> > > > > (distributed collections like) is something that is quite
> interesting
> > > > too:
> > > > >
> > > > >     val (sc, args) = ContextAndArgs(cmdlineArgs)
> > > > >     sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > > > >       .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > > > >       .countByValue()
> > > > >       .map(t => t._1 + ": " + t._2)
> > > > >       .saveAsTextFile(args("output"))
> > > > >     sc.close()
> > > > >
> > > > > They have a repl, and since the project is a bit young they don't
> > > support
> > > > > all the advanced semantics of Beam, They also have a Hadoop File
> > > > > Sink/Source. I think it would be nice to work with them, but if it
> is
> > > not
> > > > > possible, at least I think it is worth to coordinate some sharing
> > e.g.
> > > in
> > > > > the Sink/Source area + other extensions.
> > > > >
> > > > > Aditionally their code is also under the Apache license.
> > > > >
> > > > >
> > > > > On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
> > jb@nanthrax.net
> > > >
> > > > > wrote:
> > > > >
> > > > >> Hi Raghu,
> > > > >>
> > > > >> I agree: we should provide SDK in different languages, and DSLs
> for
> > > > >> specific use cases.
> > > > >>
> > > > >> You got why I sent my proposal  ;)
> > > > >>
> > > > >> Regards
> > > > >> JB
> > > > >>
> > > > >>
> > > > >> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > > > >>
> > > > >>> I would love to see Scala API properly supported. I didn't know
> > about
> > > > >>> scio.
> > > > >>> Scala is such a natural fit for Dataflow API.
> > > > >>>
> > > > >>> I am not sure of the policy w.r.t where such packages would live
> in
> > > > Beam
> > > > >>> repo, but I personally would write my Dataflow applications in
> > Scala.
> > > > It
> > > > >>> is
> > > > >>> probably already the case but my request would be : it should be
> as
> > > > thin
> > > > >>> as
> > > > >>> reasonably possible (that might make it a bit less like
> > > scalding/spark
> > > > >>> API
> > > > >>> in some cases, which I think is a good compromise).
> > > > >>>
> > > > >>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
> > > jb@nanthrax.net
> > > > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> Hi beamers,
> > > > >>>>
> > > > >>>> right now, Beam provides Java SDK.
> > > > >>>>
> > > > >>>> AFAIK, very soon, you should have the Python SDK ;)
> > > > >>>>
> > > > >>>> Spotify created a Scala API on top of Google Dataflow SDK:
> > > > >>>>
> > > > >>>> https://github.com/spotify/scio
> > > > >>>>
> > > > >>>> What do you think of asking if they want to donate this as Beam
> > > Scala
> > > > >>>> SDK ?
> > > > >>>> I planned to work on a Scala SDK, but as it seems there's
> already
> > > > >>>> something, it makes sense to leverage it.
> > > > >>>>
> > > > >>>> Thoughts ?
> > > > >>>>
> > > > >>>> Regards
> > > > >>>> JB
> > > > >>>> --
> > > > >>>> Jean-Baptiste Onofré
> > > > >>>> jbonofre@apache.org
> > > > >>>> http://blog.nanthrax.net
> > > > >>>> Talend - http://www.talend.com
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >> --
> > > > >> Jean-Baptiste Onofré
> > > > >> jbonofre@apache.org
> > > > >> http://blog.nanthrax.net
> > > > >> Talend - http://www.talend.com
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello Neville,

First, congratulations guys, excellent job on the API; the Scalding touches
are pretty neat (as well as the Tap abstraction). I am also new to Beam, so
believe me, you guys already know more than I do.

In my comment I mentioned sessions referring to session windows, but that
was my mistake: I had only taken a quick look at your code and initially
didn't see them. Anyway, if you are interested in the model, there is a good
description of the current capabilities of the runners on the website,

https://beam.incubator.apache.org/capability-matrix/

And the new additions to the model are openly discussed in the mailing list
and in the technical docs (e.g. lateness):

https://goo.gl/ps8twC

-Ismaël
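
[Editor's note: for readers unfamiliar with the term above, session windows
group elements whose timestamps fall within a gap duration of each other; a
new session starts whenever the gap to the previous element is exceeded. The
toy sketch below illustrates only that grouping rule in plain Scala; it is
not Beam's implementation, and the name `sessions` is hypothetical.]

```scala
object SessionWindowsSketch {
  // Groups sorted timestamps (in seconds) into sessions: a new session
  // starts whenever the gap to the previous element exceeds `gap`.
  def sessions(timestamps: List[Long], gap: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      case (Nil, t) => List(List(t))
      case (acc @ (current :: rest), t) =>
        if (t - current.head <= gap) (t :: current) :: rest // extend session
        else List(t) :: acc                                 // start new one
    }.map(_.reverse).reverse

  def main(args: Array[String]): Unit =
    // Gaps of 8 and 19 seconds exceed the 5-second gap, splitting sessions.
    println(sessions(List(1L, 2L, 10L, 11L, 30L), gap = 5L))
    // List(List(1, 2), List(10, 11), List(30))
}
```

In Beam proper, this behavior is what a runner provides when a pipeline
windows its input into sessions; the capability matrix linked above tracks
which runners support it.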

On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <ne...@gmail.com> wrote:

> Thanks guys for the interest. I'm really excited about all the feedbacks
> from the community.
>
> A little background: we developed Scio to bring Google Cloud Dataflow
> closer to the Scalding/Spark ecosystem that our developers are familiar
> with while bringing some missing pieces to the table (type safe BigQuery,
> HDFS, REPL to name a few).
>
> I have to admit that I'm pretty new to the BEAM development but would love
> to get feedbacks and advices on how to bring Scio closer to BEAM feature
> set and semantics. Scio doesn't have to live with the BEAM code base just
> yet (we're still under heavy development) but I'd like to see it as a de
> facto Scala API endorsed by the BEAM community.
>
> @Ismaël: I'm curious what's this session thing you're referring to?
>
> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fj...@google.com.invalid>
> wrote:
>
> > +Neville and Rafal for their take ;-)
> >
> > Excited to see this out. Multiple community driven SDKs are right in line
> > with our goals for Beam.
> >
> >
> > On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com> wrote:
> >
> > > Addendum: actually the semantic model support is not so far away as I
> > said
> > > before (I havent finished reading and I thought they didn't support
> > > sessions), and looking at the git history the project is not so young
> > > either and it is quite active.
> > >
> > > On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
> > wrote:
> > >
> > > > Hello,
> > > >
> > > > I just checked a bit the code and what they have done is interesting,
> > the
> > > > SCollection wrapper is worth a look, as well as the examples to get
> an
> > > idea
> > > > of their intentions, the fact that the code looks so spark-lish
> > > > (distributed collections like) is something that is quite interesting
> > > too:
> > > >
> > > >     val (sc, args) = ContextAndArgs(cmdlineArgs)
> > > >     sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > > >       .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > > >       .countByValue()
> > > >       .map(t => t._1 + ": " + t._2)
> > > >       .saveAsTextFile(args("output"))
> > > >     sc.close()
> > > >
> > > > They have a repl, and since the project is a bit young they don't
> > support
> > > > all the advanced semantics of Beam, They also have a Hadoop File
> > > > Sink/Source. I think it would be nice to work with them, but if it is
> > not
> > > > possible, at least I think it is worth to coordinate some sharing
> e.g.
> > in
> > > > the Sink/Source area + other extensions.
> > > >
> > > > Aditionally their code is also under the Apache license.
> > > >
> > > >
> > > > On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <
> jb@nanthrax.net
> > >
> > > > wrote:
> > > >
> > > >> Hi Raghu,
> > > >>
> > > >> I agree: we should provide SDK in different languages, and DSLs for
> > > >> specific use cases.
> > > >>
> > > >> You got why I sent my proposal  ;)
> > > >>
> > > >> Regards
> > > >> JB
> > > >>
> > > >>
> > > >> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > > >>
> > > >>> I would love to see Scala API properly supported. I didn't know
> about
> > > >>> scio.
> > > >>> Scala is such a natural fit for Dataflow API.
> > > >>>
> > > >>> I am not sure of the policy w.r.t where such packages would live in
> > > Beam
> > > >>> repo, but I personally would write my Dataflow applications in
> Scala.
> > > It
> > > >>> is
> > > >>> probably already the case but my request would be : it should be as
> > > thin
> > > >>> as
> > > >>> reasonably possible (that might make it a bit less like
> > scalding/spark
> > > >>> API
> > > >>> in some cases, which I think is a good compromise).
> > > >>>
> > > >>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
> > jb@nanthrax.net
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> Hi beamers,
> > > >>>>
> > > >>>> right now, Beam provides Java SDK.
> > > >>>>
> > > >>>> AFAIK, very soon, you should have the Python SDK ;)
> > > >>>>
> > > >>>> Spotify created a Scala API on top of Google Dataflow SDK:
> > > >>>>
> > > >>>> https://github.com/spotify/scio
> > > >>>>
> > > >>>> What do you think of asking if they want to donate this as Beam
> > Scala
> > > >>>> SDK ?
> > > >>>> I planned to work on a Scala SDK, but as it seems there's already
> > > >>>> something, it makes sense to leverage it.
> > > >>>>
> > > >>>> Thoughts ?
> > > >>>>
> > > >>>> Regards
> > > >>>> JB
> > > >>>> --
> > > >>>> Jean-Baptiste Onofré
> > > >>>> jbonofre@apache.org
> > > >>>> http://blog.nanthrax.net
> > > >>>> Talend - http://www.talend.com
> > > >>>>
> > > >>>>
> > > >>>
> > > >> --
> > > >> Jean-Baptiste Onofré
> > > >> jbonofre@apache.org
> > > >> http://blog.nanthrax.net
> > > >> Talend - http://www.talend.com
> > > >>
> > > >
> > > >
> > >
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Neville,

Actually, we already planned to provide SDKs in new languages, including 
Scala. So, a great move would be for you to "donate" your Scala API 
directly to Beam. I'm not saying it has to happen now, but I would love 
to help you go in this direction. IMHO, if you don't want to bring it 
into the Beam project, Beam itself will probably provide its own Scala SDK.

It's the same for the IO: we already plan to extend the coverage a lot 
(HDFS, JMS, MQTT, ...). You can contribute in this area too!

Basically, I really encourage you to join the Beam project: we love 
contributions ;)

I would be happy to discuss and help you to get you involved in the 
community.

Thanks !
Regards
JB

On 03/25/2016 08:36 AM, Neville Li wrote:
> Thanks guys for the interest. I'm really excited about all the feedbacks
> from the community.
>
> A little background: we developed Scio to bring Google Cloud Dataflow
> closer to the Scalding/Spark ecosystem that our developers are familiar
> with while bringing some missing pieces to the table (type safe BigQuery,
> HDFS, REPL to name a few).
>
> I have to admit that I'm pretty new to the BEAM development but would love
> to get feedbacks and advices on how to bring Scio closer to BEAM feature
> set and semantics. Scio doesn't have to live with the BEAM code base just
> yet (we're still under heavy development) but I'd like to see it as a de
> facto Scala API endorsed by the BEAM community.
>
> @Ismaël: I'm curious what's this session thing you're referring to?
>
> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fj...@google.com.invalid>
> wrote:
>
>> +Neville and Rafal for their take ;-)
>>
>> Excited to see this out. Multiple community driven SDKs are right in line
>> with our goals for Beam.
>>
>>
>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com> wrote:
>>
>>> Addendum: actually the semantic model support is not so far away as I
>> said
>>> before (I havent finished reading and I thought they didn't support
>>> sessions), and looking at the git history the project is not so young
>>> either and it is quite active.
>>>
>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I just checked a bit the code and what they have done is interesting,
>> the
>>>> SCollection wrapper is worth a look, as well as the examples to get an
>>> idea
>>>> of their intentions, the fact that the code looks so spark-lish
>>>> (distributed collections like) is something that is quite interesting
>>> too:
>>>>
>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
>>>>      sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>>>>        .countByValue()
>>>>        .map(t => t._1 + ": " + t._2)
>>>>        .saveAsTextFile(args("output"))
>>>>      sc.close()
>>>>
>>>> They have a repl, and since the project is a bit young they don't
>> support
>>>> all the advanced semantics of Beam, They also have a Hadoop File
>>>> Sink/Source. I think it would be nice to work with them, but if it is
>> not
>>>> possible, at least I think it is worth to coordinate some sharing e.g.
>> in
>>>> the Sink/Source area + other extensions.
>>>>
>>>> Aditionally their code is also under the Apache license.
>>>>
>>>>
>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb@nanthrax.net
>>>
>>>> wrote:
>>>>
>>>>> Hi Raghu,
>>>>>
>>>>> I agree: we should provide SDK in different languages, and DSLs for
>>>>> specific use cases.
>>>>>
>>>>> You got why I sent my proposal  ;)
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>>>>
>>>>>> I would love to see Scala API properly supported. I didn't know about
>>>>>> scio.
>>>>>> Scala is such a natural fit for Dataflow API.
>>>>>>
>>>>>> I am not sure of the policy w.r.t where such packages would live in
>>> Beam
>>>>>> repo, but I personally would write my Dataflow applications in Scala.
>>> It
>>>>>> is
>>>>>> probably already the case but my request would be : it should be as
>>> thin
>>>>>> as
>>>>>> reasonably possible (that might make it a bit less like
>> scalding/spark
>>>>>> API
>>>>>> in some cases, which I think is a good compromise).
>>>>>>
>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
>> jb@nanthrax.net
>>>>
>>>>>> wrote:
>>>>>>
>>>>>> Hi beamers,
>>>>>>>
>>>>>>> right now, Beam provides Java SDK.
>>>>>>>
>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
>>>>>>>
>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
>>>>>>>
>>>>>>> https://github.com/spotify/scio
>>>>>>>
>>>>>>> What do you think of asking if they want to donate this as Beam
>> Scala
>>>>>>> SDK ?
>>>>>>> I planned to work on a Scala SDK, but as it seems there's already
>>>>>>> something, it makes sense to leverage it.
>>>>>>>
>>>>>>> Thoughts ?
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbonofre@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] New sdk languages

Posted by Neville Li <ne...@gmail.com>.
Thanks guys for the interest. I'm really excited about all the feedback
from the community.

A little background: we developed Scio to bring Google Cloud Dataflow
closer to the Scalding/Spark ecosystem that our developers are familiar
with while bringing some missing pieces to the table (type safe BigQuery,
HDFS, REPL to name a few).

I have to admit that I'm pretty new to BEAM development but would love
to get feedback and advice on how to bring Scio closer to the BEAM feature
set and semantics. Scio doesn't have to live in the BEAM code base just
yet (we're still under heavy development) but I'd like to see it as the de
facto Scala API endorsed by the BEAM community.

@Ismaël: I'm curious what's this session thing you're referring to?

On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fj...@google.com.invalid>
wrote:

> +Neville and Rafal for their take ;-)
>
> Excited to see this out. Multiple community driven SDKs are right in line
> with our goals for Beam.
>
>
> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com> wrote:
>
> > Addendum: actually the semantic model support is not so far away as I
> said
> > before (I havent finished reading and I thought they didn't support
> > sessions), and looking at the git history the project is not so young
> > either and it is quite active.
> >
> > On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com>
> wrote:
> >
> > > Hello,
> > >
> > > I just checked a bit the code and what they have done is interesting,
> the
> > > SCollection wrapper is worth a look, as well as the examples to get an
> > idea
> > > of their intentions, the fact that the code looks so spark-lish
> > > (distributed collections like) is something that is quite interesting
> > too:
> > >
> > >     val (sc, args) = ContextAndArgs(cmdlineArgs)
> > >     sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > >       .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > >       .countByValue()
> > >       .map(t => t._1 + ": " + t._2)
> > >       .saveAsTextFile(args("output"))
> > >     sc.close()
> > >
> > > They have a repl, and since the project is a bit young they don't
> support
> > > all the advanced semantics of Beam, They also have a Hadoop File
> > > Sink/Source. I think it would be nice to work with them, but if it is
> not
> > > possible, at least I think it is worth to coordinate some sharing e.g.
> in
> > > the Sink/Source area + other extensions.
> > >
> > > Aditionally their code is also under the Apache license.
> > >
> > >
> > > On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb@nanthrax.net
> >
> > > wrote:
> > >
> > >> Hi Raghu,
> > >>
> > >> I agree: we should provide SDK in different languages, and DSLs for
> > >> specific use cases.
> > >>
> > >> You got why I sent my proposal  ;)
> > >>
> > >> Regards
> > >> JB
> > >>
> > >>
> > >> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > >>
> > >>> I would love to see Scala API properly supported. I didn't know about
> > >>> scio.
> > >>> Scala is such a natural fit for Dataflow API.
> > >>>
> > >>> I am not sure of the policy w.r.t where such packages would live in
> > Beam
> > >>> repo, but I personally would write my Dataflow applications in Scala.
> > It
> > >>> is
> > >>> probably already the case but my request would be : it should be as
> > thin
> > >>> as
> > >>> reasonably possible (that might make it a bit less like
> scalding/spark
> > >>> API
> > >>> in some cases, which I think is a good compromise).
> > >>>
> > >>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <
> jb@nanthrax.net
> > >
> > >>> wrote:
> > >>>
> > >>> Hi beamers,
> > >>>>
> > >>>> right now, Beam provides Java SDK.
> > >>>>
> > >>>> AFAIK, very soon, you should have the Python SDK ;)
> > >>>>
> > >>>> Spotify created a Scala API on top of Google Dataflow SDK:
> > >>>>
> > >>>> https://github.com/spotify/scio
> > >>>>
> > >>>> What do you think of asking if they want to donate this as Beam
> Scala
> > >>>> SDK ?
> > >>>> I planned to work on a Scala SDK, but as it seems there's already
> > >>>> something, it makes sense to leverage it.
> > >>>>
> > >>>> Thoughts ?
> > >>>>
> > >>>> Regards
> > >>>> JB
> > >>>> --
> > >>>> Jean-Baptiste Onofré
> > >>>> jbonofre@apache.org
> > >>>> http://blog.nanthrax.net
> > >>>> Talend - http://www.talend.com
> > >>>>
> > >>>>
> > >>>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> jbonofre@apache.org
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >>
> > >
> > >
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Frances Perry <fj...@google.com.INVALID>.
+Neville and Rafal for their take ;-)

Excited to see this out. Multiple community-driven SDKs are right in line
with our goals for Beam.


On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <ie...@gmail.com> wrote:

> Addendum: actually the semantic model support is not so far away as I said
> before (I havent finished reading and I thought they didn't support
> sessions), and looking at the git history the project is not so young
> either and it is quite active.
>
> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com> wrote:
>
> > Hello,
> >
> > I just checked a bit the code and what they have done is interesting, the
> > SCollection wrapper is worth a look, as well as the examples to get an
> idea
> > of their intentions, the fact that the code looks so spark-lish
> > (distributed collections like) is something that is quite interesting
> too:
> >
> >     val (sc, args) = ContextAndArgs(cmdlineArgs)
> >     sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> >       .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> >       .countByValue()
> >       .map(t => t._1 + ": " + t._2)
> >       .saveAsTextFile(args("output"))
> >     sc.close()
> >
> > They have a repl, and since the project is a bit young they don't support
> > all the advanced semantics of Beam, They also have a Hadoop File
> > Sink/Source. I think it would be nice to work with them, but if it is not
> > possible, at least I think it is worth to coordinate some sharing e.g. in
> > the Sink/Source area + other extensions.
> >
> > Aditionally their code is also under the Apache license.
> >
> >
> > On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> >> Hi Raghu,
> >>
> >> I agree: we should provide SDK in different languages, and DSLs for
> >> specific use cases.
> >>
> >> You got why I sent my proposal  ;)
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> >>
> >>> I would love to see Scala API properly supported. I didn't know about
> >>> scio.
> >>> Scala is such a natural fit for Dataflow API.
> >>>
> >>> I am not sure of the policy w.r.t where such packages would live in
> Beam
> >>> repo, but I personally would write my Dataflow applications in Scala.
> It
> >>> is
> >>> probably already the case but my request would be : it should be as
> thin
> >>> as
> >>> reasonably possible (that might make it a bit less like scalding/spark
> >>> API
> >>> in some cases, which I think is a good compromise).
> >>>
> >>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <jb@nanthrax.net
> >
> >>> wrote:
> >>>
> >>> Hi beamers,
> >>>>
> >>>> right now, Beam provides Java SDK.
> >>>>
> >>>> AFAIK, very soon, you should have the Python SDK ;)
> >>>>
> >>>> Spotify created a Scala API on top of Google Dataflow SDK:
> >>>>
> >>>> https://github.com/spotify/scio
> >>>>
> >>>> What do you think of asking if they want to donate this as Beam Scala
> >>>> SDK ?
> >>>> I planned to work on a Scala SDK, but as it seems there's already
> >>>> something, it makes sense to leverage it.
> >>>>
> >>>> Thoughts ?
> >>>>
> >>>> Regards
> >>>> JB
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> jbonofre@apache.org
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>>
> >>>
> >> --
> >> Jean-Baptiste Onofré
> >> jbonofre@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
> >
>

Re: [PROPOSAL] New sdk languages

Posted by Ismaël Mejía <ie...@gmail.com>.
Addendum: actually, the semantic model support is not as far away as I said
before (I hadn't finished reading and thought they didn't support
sessions), and looking at the git history, the project is not so young
either, and it is quite active.

On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <ie...@gmail.com> wrote:

> Hello,
>
> I just checked a bit the code and what they have done is interesting, the
> SCollection wrapper is worth a look, as well as the examples to get an idea
> of their intentions, the fact that the code looks so spark-lish
> (distributed collections like) is something that is quite interesting too:
>
>     val (sc, args) = ContextAndArgs(cmdlineArgs)
>     sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
>       .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>       .countByValue()
>       .map(t => t._1 + ": " + t._2)
>       .saveAsTextFile(args("output"))
>     sc.close()
>
> They have a repl, and since the project is a bit young they don't support
> all the advanced semantics of Beam, They also have a Hadoop File
> Sink/Source. I think it would be nice to work with them, but if it is not
> possible, at least I think it is worth to coordinate some sharing e.g. in
> the Sink/Source area + other extensions.
>
> Aditionally their code is also under the Apache license.
>
>
> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Hi Raghu,
>>
>> I agree: we should provide SDK in different languages, and DSLs for
>> specific use cases.
>>
>> You got why I sent my proposal  ;)
>>
>> Regards
>> JB
>>
>>
>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>
>>> I would love to see Scala API properly supported. I didn't know about
>>> scio.
>>> Scala is such a natural fit for Dataflow API.
>>>
>>> I am not sure of the policy w.r.t where such packages would live in Beam
>>> repo, but I personally would write my Dataflow applications in Scala. It
>>> is
>>> probably already the case but my request would be : it should be as thin
>>> as
>>> reasonably possible (that might make it a bit less like scalding/spark
>>> API
>>> in some cases, which I think is a good compromise).
>>>
>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>> wrote:
>>>
>>> Hi beamers,
>>>>
>>>> right now, Beam provides Java SDK.
>>>>
>>>> AFAIK, very soon, you should have the Python SDK ;)
>>>>
>>>> Spotify created a Scala API on top of Google Dataflow SDK:
>>>>
>>>> https://github.com/spotify/scio
>>>>
>>>> What do you think of asking if they want to donate this as Beam Scala
>>>> SDK ?
>>>> I planned to work on a Scala SDK, but as it seems there's already
>>>> something, it makes sense to leverage it.
>>>>
>>>> Thoughts ?
>>>>
>>>> Regards
>>>> JB
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>

Re: [PROPOSAL] New sdk languages

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello,

I just took a quick look at the code, and what they have done is interesting. The
SCollection wrapper is worth a look, as well as the examples, to get an idea
of their intentions. The fact that the code looks so Spark-ish
(distributed-collections style) is quite interesting too:

    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
      .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
      .countByValue()
      .map(t => t._1 + ": " + t._2)
      .saveAsTextFile(args("output"))
    sc.close()

They have a REPL, and since the project is a bit young they don't support
all the advanced semantics of Beam yet. They also have a Hadoop File
Sink/Source. I think it would be nice to work with them, but if that is not
possible, at least it is worth coordinating some sharing, e.g. in the
Sink/Source area and other extensions.

Additionally, their code is also under the Apache license.
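
The "thin wrapper" idea that keeps coming up in this thread can be
illustrated with a small, self-contained sketch: Scala extension methods
layering an FP-style API (map/flatMap/countByValue, as in the snippet
above) over a Java-style apply/transform core. Note that JCollection and
JTransform below are hypothetical stand-ins for illustration only, not
the real Beam/Dataflow SDK classes.

```scala
// Java-style core: a collection that only knows how to apply transforms.
// (Hypothetical stand-in for a PCollection-like class; elems is eager
// here only to keep the sketch runnable.)
final class JCollection[T](val elems: List[T]) {
  def applyTransform[U](t: JTransform[T, U]): JCollection[U] =
    new JCollection(t.expand(elems))
}

// A transform from T to U, in the spirit of a PTransform.
trait JTransform[T, U] {
  def expand(in: List[T]): List[U]
}

// Scala-side sugar: each FP-style method just builds and applies a
// transform, so the wrapper adds syntax but no new semantics.
object Sugar {
  implicit class RichCollection[T](private val self: JCollection[T]) extends AnyVal {
    def map[U](f: T => U): JCollection[U] =
      self.applyTransform(new JTransform[T, U] {
        def expand(in: List[T]): List[U] = in.map(f)
      })

    def flatMap[U](f: T => Iterable[U]): JCollection[U] =
      self.applyTransform(new JTransform[T, U] {
        def expand(in: List[T]): List[U] = in.flatMap(f)
      })

    def countByValue(): JCollection[(T, Long)] =
      self.applyTransform(new JTransform[T, (T, Long)] {
        def expand(in: List[T]): List[(T, Long)] =
          in.groupBy(identity).map { case (k, v) => (k, v.size.toLong) }.toList
      })
  }
}
```

With these definitions, a Scio-style chain such as
`coll.flatMap(...).countByValue()` reads naturally while every step still
goes through applyTransform, which is roughly how such a wrapper can stay
thin and faithful to the underlying Java SDK's semantics.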


On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Raghu,
>
> I agree: we should provide SDK in different languages, and DSLs for
> specific use cases.
>
> You got why I sent my proposal  ;)
>
> Regards
> JB
>
>
> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>
>> I would love to see Scala API properly supported. I didn't know about
>> scio.
>> Scala is such a natural fit for Dataflow API.
>>
>> I am not sure of the policy w.r.t where such packages would live in Beam
>> repo, but I personally would write my Dataflow applications in Scala. It
>> is
>> probably already the case but my request would be : it should be as thin
>> as
>> reasonably possible (that might make it a bit less like scalding/spark API
>> in some cases, which I think is a good compromise).
>>
>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>> Hi beamers,
>>>
>>> right now, Beam provides Java SDK.
>>>
>>> AFAIK, very soon, you should have the Python SDK ;)
>>>
>>> Spotify created a Scala API on top of Google Dataflow SDK:
>>>
>>> https://github.com/spotify/scio
>>>
>>> What do you think of asking if they want to donate this as Beam Scala
>>> SDK ?
>>> I planned to work on a Scala SDK, but as it seems there's already
>>> something, it makes sense to leverage it.
>>>
>>> Thoughts ?
>>>
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: [PROPOSAL] New sdk languages

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Raghu,

I agree: we should provide SDKs in different languages, and DSLs for 
specific use cases.

You got why I sent my proposal  ;)

Regards
JB

On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> I would love to see Scala API properly supported. I didn't know about scio.
> Scala is such a natural fit for Dataflow API.
>
> I am not sure of the policy w.r.t where such packages would live in Beam
> repo, but I personally would write my Dataflow applications in Scala. It is
> probably already the case but my request would be : it should be as thin as
> reasonably possible (that might make it a bit less like scalding/spark API
> in some cases, which I think is a good compromise).
>
> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Hi beamers,
>>
>> right now, Beam provides Java SDK.
>>
>> AFAIK, very soon, you should have the Python SDK ;)
>>
>> Spotify created a Scala API on top of Google Dataflow SDK:
>>
>> https://github.com/spotify/scio
>>
>> What do you think of asking if they want to donate this as Beam Scala SDK ?
>> I planned to work on a Scala SDK, but as it seems there's already
>> something, it makes sense to leverage it.
>>
>> Thoughts ?
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [PROPOSAL] New sdk languages

Posted by Raghu Angadi <ra...@google.com.INVALID>.
I would love to see a Scala API properly supported. I didn't know about Scio.
Scala is such a natural fit for the Dataflow API.

I am not sure of the policy w.r.t. where such packages would live in the Beam
repo, but I personally would write my Dataflow applications in Scala. It is
probably already the case, but my request would be: it should be as thin as
reasonably possible (that might make it a bit less like the Scalding/Spark API
in some cases, which I think is a good compromise).

On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi beamers,
>
> right now, Beam provides Java SDK.
>
> AFAIK, very soon, you should have the Python SDK ;)
>
> Spotify created a Scala API on top of Google Dataflow SDK:
>
> https://github.com/spotify/scio
>
> What do you think of asking if they want to donate this as Beam Scala SDK ?
> I planned to work on a Scala SDK, but as it seems there's already
> something, it makes sense to leverage it.
>
> Thoughts ?
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>