Posted to user@beam.apache.org by Kenneth Knowles <ke...@apache.org> on 2019/01/04 21:22:22 UTC

Re: introducing streamy-db

Scio is a Scala wrapper on top of Beam's Java SDK, so it still benefits
from the maturity of Beam Java in terms of performance and reliability.
Using Scala will definitely be less verbose than Java. You can either use
Scio, or use the Java SDK directly through Scala's support for calling
Java libraries.
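
As a minimal sketch of calling Java libraries from Scala, the snippet below
uses plain JDK classes without any wrapper. It relies only on `java.util`
rather than Beam itself, and the names `JavaInterop` and `sortedNames` are
made up for illustration; the same mechanism is what would let Scala code
call Beam's Java SDK directly.

```scala
import java.util.{ArrayList => JArrayList, Collections, List => JList}

// Hypothetical example object; only JDK classes are used, not Beam.
object JavaInterop {
  // Builds a java.util.ArrayList and sorts it with java.util.Collections,
  // the same calls Java code would make, but written in Scala.
  def sortedNames(): JList[String] = {
    val names = new JArrayList[String]()
    names.add("scio")
    names.add("beam")
    names.add("flink")
    Collections.sort(names)
    names
  }

  def main(args: Array[String]): Unit =
    println(sortedNames())
}
```

No adapter layer is involved: the Java types are used as ordinary Scala
values, which is why a Scala project can choose between Scio and the Java SDK.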

Kenn

On Sun, Dec 30, 2018 at 12:38 PM <ja...@gmail.com> wrote:

> Hi Chak,
>
> I'm not sure if it's the correct decision...
> To be completely honest, the first iterations (which I haven't made public
> so far) were actually in Java.
> However, I find Java a bit verbose for my taste.
> For the past 5 years I've worked in OCaml, and despite the lacking
> tooling/ecosystem I really liked the language. It's really expressive.
>
> Scala has a similar feel to OCaml, which is why I want to pick it up, and
> thus why I experimented with it on this project.
>
> But if most people who want to contribute would only do so in a Java
> codebase, then I don't mind continuing this in Java.
>
> Kind regards,
> Jan
>
>
> On Sun, 30 Dec 2018 at 21:14, Chak-Pong Chung <cc...@gatech.edu> wrote:
>
>> Hi Jan,
>>
>> This is quite interesting. As far as I know, Beam and Flink have a more
>> mature and stable API in Java. What is the motivation for using
>> Scala/Scio in your project?
>>
>> Kind regards,
>> Chak
>>
>> On Sun, Dec 30, 2018 at 1:02 PM <ja...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I figured out how to build deterministic transaction processing on top
>>> of Apache Beam/Flink.
>>>
>>> https://domsj.info/2018/12/30/introducing-streamy-db.html
>>> https://github.com/domsj/streamy-db
>>>
>>> I could use some help, please join me!
>>>
>>> Kind regards,
>>> Jan
>>>
>>

Re: introducing streamy-db

Posted by Gleb Kanterov <gl...@spotify.com>.
Max is right: it generally isn't possible to upgrade the Beam version under
Scio. Scio doesn't use many Beam internals, but the Beam API isn't guaranteed
to be binary compatible across minor releases. There is a small chance that
it might work.

I would recommend using the latest version, 0.7.0-beta3, which is built for
Beam 2.9.0.
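
In an sbt build, that pairing might look roughly like the sketch below. The
project name and `scalaVersion` are assumptions, and the direct runner is
included only as an example of a Beam artifact added by hand; the point is
that the Scio version determines the Beam version, not the other way around.

```scala
// build.sbt (sketch). Project name and scalaVersion are assumptions.
name := "streamy-db-example"
scalaVersion := "2.12.8"

// Scio release mentioned in this message; it is built against Beam 2.9.0,
// so Beam's core SDK is pulled in transitively at that version.
val scioVersion = "0.7.0-beta3"
val beamVersion = "2.9.0"

libraryDependencies ++= Seq(
  "com.spotify" %% "scio-core" % scioVersion,
  // Any Beam artifact declared explicitly should stay on the same Beam version.
  "org.apache.beam" % "beam-runners-direct-java" % beamVersion
)
```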

On Wed, Jan 9, 2019 at 4:49 PM Maximilian Michels <mx...@apache.org> wrote:

> @Gleb Thanks for pointing out some of the features unique to Scio! I have
> yet to try it out, but I see many Beam folks liking it a lot.
>
> @Jan I think the Beam version is hardcoded in Scio because Scio is
> essentially a layer on top of Beam and some of its internals. While the
> public API does not change much, some of the internals do, and thus you
> can't freely choose the Beam version. Perhaps Gleb can give more insight?
>
> I think it would be fine to add your project to the page. We don't have
> strict requirements for the level of maturity of the project. Feel free to
> open a PR.
>
> Let's have a chat at FOSDEM! :)
>
> Cheers,
> Max


-- 
Cheers,
Gleb

Re: introducing streamy-db

Posted by Maximilian Michels <mx...@apache.org>.
@Gleb Thanks for pointing out some of the features unique to Scio! I have yet to
try it out, but I see many Beam folks liking it a lot.

@Jan I think the Beam version is hardcoded in Scio because Scio is essentially a
layer on top of Beam and some of its internals. While the public API does not
change much, some of the internals do, and thus you can't freely choose the Beam
version. Perhaps Gleb can give more insight?

I think it would be fine to add your project to the page. We don't have strict
requirements for the level of maturity of the project. Feel free to open a PR.

Let's have a chat at FOSDEM! :)

Cheers,
Max

On 07.01.19 16:43, jan.doms@gmail.com wrote:
> Hi all,
> 
> I think I'm going to stick with Scala and Scio :-).
>
> I'm curious though: why is there a hard coupling between Scio and Beam
> versions? I was hoping to use the latest Scio, 0.7.0-beta2, with Beam 2.9.0,
> but that appears to get blocked, which was unexpected to me.
>
> Regarding the suggestion to add my project to this page: I'm flattered, but
> it's all still very early and prototype-like...
>
> Btw, Max, I was wondering where I had heard your name before. Apparently it's
> because I was planning to go to your FOSDEM talk.
> So if you or anyone else wants to have a chat, I should probably be there on
> Sunday :-).
> 
> Kind regards,
> Jan

Re: introducing streamy-db

Posted by ja...@gmail.com.
Hi all,

I think I'm going to stick with Scala and Scio :-).

I'm curious though: why is there a hard coupling between Scio and Beam
versions? I was hoping to use the latest Scio, 0.7.0-beta2, with Beam 2.9.0,
but that appears to get blocked, which was unexpected to me.

Regarding the suggestion to add my project to this page: I'm flattered, but
it's all still very early and prototype-like...

Btw, Max, I was wondering where I had heard your name before. Apparently it's
because I was planning to go to your FOSDEM talk.
So if you or anyone else wants to have a chat, I should probably be there on
Sunday :-).

Kind regards,
Jan


On Mon, 7 Jan 2019 at 17:13, Gleb Kanterov <gl...@spotify.com> wrote:

> Agree with Max that Scio is lagging behind. However, it also has features
> that significantly reduce boilerplate and even improve performance. For
> instance, the latest version (0.7.0) automatically derives binary coders
> for case classes using a macro at compile time, which is way better
> than Java/Kryo serialization.
>
> Jan, if you find that you are missing features, or have general feedback,
> you are always welcome to create issues or pull requests in the spotify/scio
> repository <https://github.com/spotify/scio> :).
>
> Gleb

Re: introducing streamy-db

Posted by Gleb Kanterov <gl...@spotify.com>.
Agree with Max that Scio is lagging behind. However, it also has features
that significantly reduce boilerplate and even improve performance. For
instance, the latest version (0.7.0) automatically derives binary coders
for case classes using a macro at compile time, which is way better
than Java/Kryo serialization.
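
To make the idea of a statically resolved coder concrete, here is a rough
sketch that does not use Scio at all. `BinaryCoder`, `roundTrip` and `Account`
are hypothetical names, and the case-class coder is written by hand; the
assumption is that a compile-time macro would generate something equivalent
from the case class fields, so no reflection is needed at runtime.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical type class standing in for a coder; this is not Scio's API.
trait BinaryCoder[T] {
  def encode(value: T, out: DataOutputStream): Unit
  def decode(in: DataInputStream): T
}

object BinaryCoder {
  implicit val intCoder: BinaryCoder[Int] = new BinaryCoder[Int] {
    def encode(value: Int, out: DataOutputStream): Unit = out.writeInt(value)
    def decode(in: DataInputStream): Int = in.readInt()
  }

  implicit val stringCoder: BinaryCoder[String] = new BinaryCoder[String] {
    def encode(value: String, out: DataOutputStream): Unit = out.writeUTF(value)
    def decode(in: DataInputStream): String = in.readUTF()
  }

  // Encodes a value to bytes and decodes it again. The coder is chosen by the
  // compiler from the static type T, not looked up at runtime.
  def roundTrip[T](value: T)(implicit coder: BinaryCoder[T]): T = {
    val bytes = new ByteArrayOutputStream()
    coder.encode(value, new DataOutputStream(bytes))
    coder.decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
  }
}

final case class Account(id: Int, owner: String)

object Account {
  // Hand-written here, field by field; a derivation macro would produce the
  // equivalent instance automatically for any case class.
  implicit val accountCoder: BinaryCoder[Account] = new BinaryCoder[Account] {
    def encode(a: Account, out: DataOutputStream): Unit = {
      BinaryCoder.intCoder.encode(a.id, out)
      BinaryCoder.stringCoder.encode(a.owner, out)
    }
    def decode(in: DataInputStream): Account =
      Account(BinaryCoder.intCoder.decode(in), BinaryCoder.stringCoder.decode(in))
  }
}
```

Calling `BinaryCoder.roundTrip(Account(42, "jan"))` resolves `accountCoder` at
compile time, and a type without a coder would fail to compile rather than
fall back to a generic serializer.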

Jan, if you find that you are missing features, or have general feedback,
you are always welcome to create issues or pull requests in the spotify/scio
repository <https://github.com/spotify/scio> :).

Gleb

On Mon, Jan 7, 2019 at 4:57 PM Maximilian Michels <mx...@apache.org> wrote:

> Interesting project, Jan! I think we could add your project to this page:
> https://beam.apache.org/community/integrations/
>
> The benefit of using the Java DSL would be that you can track Beam directly.
> The Scio Scala DSL usually lags a bit behind. But since you probably don't
> require the latest features and Scala is more enjoyable for you, I think
> your current design choice is sensible.
>
> Best,
> Max


-- 
Cheers,
Gleb

Re: introducing streamy-db

Posted by Maximilian Michels <mx...@apache.org>.
Interesting project, Jan! I think we could add your project to this page:
https://beam.apache.org/community/integrations/

The benefit of using the Java DSL would be that you can track Beam directly.
The Scio Scala DSL usually lags a bit behind. But since you probably don't
require the latest features and Scala is more enjoyable for you, I think your
current design choice is sensible.

Best,
Max

On 04.01.19 16:22, Kenneth Knowles wrote:
> Scio is a Scala wrapper on top of Beam's Java SDK, so it still benefits from the
> maturity of Beam Java in terms of performance and reliability. Using Scala will
> definitely be less verbose than Java. You can either use Scio, or use the Java
> SDK directly through Scala's support for calling Java libraries.
> 
> Kenn