Posted to dev@beam.apache.org by Pei HE <pe...@gmail.com> on 2017/04/01 01:24:33 UTC

Update of Pei in Alibaba

Hi all,
In February, I moved from Seattle to Hangzhou, China, and joined Alibaba.
I want to give an update on things here.

A colleague and I have been working on a JStorm
<https://github.com/alibaba/jstorm> runner. We have a prototype that works
with WordCount and PAssert. (I am going to start a separate email thread
about how to get it reviewed and merged into Apache Beam.)
We also have Spark clusters, and are very interested in using the Spark runner.

Last Saturday, I went to the China Hadoop Summit and gave a talk about the
Apache Beam model. While many companies gave talks on their in-house solutions
for unified batch and streaming and unified SQL, there was also a lot of
interest in and enthusiasm for Beam.

Looking forward to chatting more.
--
Pei

Re: Update of Pei in Alibaba

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello Basti,

Thanks a lot for answering. I imagined that the improvements
in JStorm and Heron wouldn’t all translate perfectly, but it is still a
worthwhile goal to try to isolate the ‘common’ Storm parts so they
can be shared with the other runners.

Really interesting. I wish you guys the best with this work, and
welcome to the community.

Ismaël

On Thu, Apr 6, 2017 at 11:51 AM, 刘键(Basti Liu) <ba...@alibaba-inc.com> wrote:
> Hi Ismaël,
>
> Sorry for the late response. I am the developer of JStorm, and I currently work with Pei on the JStorm runner.
>
> We have gone through the current Storm runner (https://github.com/apache/storm/commits/beam-runner). But it is a very early draft: several PTransforms are not supported or not fully supported, especially for window, trigger and state.
>
> Generally, JStorm is compatible with the basic API of Storm, while providing improvements or new features on topology master, state manager, exactly once, message transfer mechanism, stage-by-stage backpressure flow control, metrics, etc.
>
> For the basic “at most once” job, the JStorm runner can be reused on Storm. But for “window”, “state” and “exactly once” jobs, unfortunately, the JStorm runner can’t be reused. Anyway, we will look into whether it can be propagated to Storm in the future.
>
> Regards
>
> Jian Liu(Basti)

RE: Update of Pei in Alibaba

Posted by "刘键(Basti Liu)" <ba...@alibaba-inc.com>.
Hi Ismaël,

Sorry for the late response. I am the developer of JStorm, and I currently work with Pei on the JStorm runner.

We have gone through the current Storm runner (https://github.com/apache/storm/commits/beam-runner). But it is a very early draft: several PTransforms are not supported or not fully supported, especially for window, trigger and state.

Generally, JStorm is compatible with the basic API of Storm, while providing improvements or new features on topology master, state manager, exactly once, message transfer mechanism, stage-by-stage backpressure flow control, metrics, etc.

For the basic “at most once” job, the JStorm runner can be reused on Storm. But for “window”, “state” and “exactly once” jobs, unfortunately, the JStorm runner can’t be reused. Anyway, we will look into whether it can be propagated to Storm in the future.

Regards

Jian Liu(Basti)

 

-----Original Message-----
From: Ismaël Mejía [mailto:iemejia@gmail.com]
Sent: Sunday, April 02, 2017 3:18 AM
To: dev@beam.apache.org
Subject: Re: Update of Pei in Alibaba

Excellent news,

Pei, it would be great to have a new runner. I am curious about how different the implementations of Storm are, considering that there are already three 'versions': Storm, JStorm and Heron. I wonder if one runner could translate to an API that would cover all of them (of course, maybe I am super naive; I really don't know much about JStorm or Heron and how much they differ from the original Storm).

Jingsong, I am super curious about this Galaxy project. Is there any public information about it? Is it related to the previous Blink Alibaba project? I already looked a bit, but searching "Ali baba galaxy" is a recipe for a myriad of telephone sellers :)

Nice to see that you are going to keep contributing to the project, Pei.

Regards,

Ismaël


Re: Update of Pei in Alibaba

Posted by Ismaël Mejía <ie...@gmail.com>.
Thanks, Jingsong, for answering and for the StreamScope reference. I am going
to check the paper; the concept of non-global checkpointing sounds super
interesting.

It is nice that you guys are also trying to promote the move to a unified model.

Regards,
Ismaël


On Sun, Apr 2, 2017 at 3:40 PM, JingsongLee <lz...@aliyun.com> wrote:
> Hi Ismaël,
> We have a streaming computing platform in Alibaba.
> Galaxy is an internal system, so you can't find any information about it on Google.
> It is becoming more like StreamScope (you can search for the paper).
> Non-global checkpointing makes failure recovery quick and makes streaming
> applications easier to develop and debug.
>
>
> But as far as I know, each engine has its own tradeoffs and its own good use cases.
> So we also developed an open source platform, which has Spark, Flink and so on.
> We hope we can use Apache Beam to unify the user programming model. This will keep
> user learning costs and application migration costs low
> (not only for migration from batch to streaming, but also for migration from one
> streaming engine to another).

Re: Update of Pei in Alibaba

Posted by JingsongLee <lz...@aliyun.com>.
Hi Ismaël,
We have a streaming computing platform in Alibaba.
Galaxy is an internal system, so you can't find any information about it on Google.
It is becoming more like StreamScope (you can search for the paper).
Non-global checkpointing makes failure recovery quick and makes streaming
applications easier to develop and debug.


But as far as I know, each engine has its own tradeoffs and its own good use cases.
So we also developed an open source platform, which has Spark, Flink and so on.
We hope we can use Apache Beam to unify the user programming model. This will keep
user learning costs and application migration costs low
(not only for migration from batch to streaming, but also for migration from one
streaming engine to another).


------------------------------------------------------------------
From: Ismaël Mejía <ie...@gmail.com>
Time: 2017 Apr 2 (Sun) 03:18
To: dev <de...@beam.apache.org>
Subject: Re: Update of Pei in Alibaba

Excellent news,

Pei, it would be great to have a new runner. I am curious about how
different the implementations of Storm are, considering that
there are already three 'versions': Storm, JStorm and Heron. I wonder
if one runner could translate to an API that would cover all of them (of
course, maybe I am super naive; I really don't know much about JStorm or
Heron and how much they differ from the original Storm).

Jingsong, I am super curious about this Galaxy project. Is there any
public information about it? Is it related to the previous Blink
Alibaba project? I already looked a bit, but searching "Ali baba galaxy"
is a recipe for a myriad of telephone sellers :)

Nice to see that you are going to keep contributing to the project, Pei.

Regards,
Ismaël




Re: Update of Pei in Alibaba

Posted by Ismaël Mejía <ie...@gmail.com>.
Excellent news,

Pei, it would be great to have a new runner. I am curious about how
different the implementations of Storm are, considering that
there are already three 'versions': Storm, JStorm and Heron. I wonder
if one runner could translate to an API that would cover all of them (of
course, maybe I am super naive; I really don't know much about JStorm or
Heron and how much they differ from the original Storm).

Jingsong, I am super curious about this Galaxy project. Is there any
public information about it? Is it related to the previous Blink
Alibaba project? I already looked a bit, but searching "Ali baba galaxy"
is a recipe for a myriad of telephone sellers :)

Nice to see that you are going to keep contributing to the project, Pei.

Regards,
Ismaël



On Sat, Apr 1, 2017 at 4:59 PM, Tibor Kiss <ti...@gmail.com> wrote:
> Exciting times, looking forward to trying it out!
>
> I should mention that Taylor Goetz also started creating a Beam runner using
> Storm.
> His work is available in the Storm repo:
> https://github.com/apache/storm/commits/beam-runner
> Maybe it's worthwhile to take a peek and see if something is reusable from
> there.
>
> - Tibor

Re: Update of Pei in Alibaba

Posted by Tibor Kiss <ti...@gmail.com>.
Exciting times, looking forward to trying it out!

I should mention that Taylor Goetz also started creating a Beam runner using
Storm.
His work is available in the Storm repo:
https://github.com/apache/storm/commits/beam-runner
Maybe it's worthwhile to take a peek and see if something is reusable from
there.

- Tibor

On Sat, Apr 1, 2017 at 4:37 AM, JingsongLee <lz...@aliyun.com> wrote:

> Wow, very glad to see that JStorm has also started building a Beam runner.
> I am working on Galaxy (another stream processing engine at Alibaba).
> I hope that we can work together to promote the use of Apache Beam
> in Alibaba and China.
>
> best,
> JingsongLee


-- 
Kiss Tibor

+36 70 275 9863
tibor.kiss@gmail.com

Re: Update of Pei in Alibaba

Posted by JingsongLee <lz...@aliyun.com>.
Wow, very glad to see that JStorm has also started building a Beam runner.
I am working on Galaxy (another stream processing engine at Alibaba).
I hope that we can work together to promote the use of Apache Beam in Alibaba and China.

best,
JingsongLee
------------------------------------------------------------------
From: Pei HE <pe...@gmail.com>
Time: 2017 Apr 1 (Sat) 09:24
To: dev <de...@beam.apache.org>
Subject: Update of Pei in Alibaba

Hi all,
In February, I moved from Seattle to Hangzhou, China, and joined Alibaba.
I want to give an update on things here.

A colleague and I have been working on a JStorm
<https://github.com/alibaba/jstorm> runner. We have a prototype that works
with WordCount and PAssert. (I am going to start a separate email thread
about how to get it reviewed and merged into Apache Beam.)
We also have Spark clusters, and are very interested in using the Spark runner.

Last Saturday, I went to the China Hadoop Summit and gave a talk about the
Apache Beam model. While many companies gave talks on their in-house solutions
for unified batch and streaming and unified SQL, there was also a lot of
interest in and enthusiasm for Beam.

Looking forward to chatting more.
--
Pei


Re: Update of Pei in Alibaba

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
Nice to hear from you again, Pei!

This is awesome news. I'd love to help when you are ready to get it in the
repo and hooked up to our testing infrastructure.

Kenn

On Fri, Mar 31, 2017 at 6:24 PM, Pei HE <pe...@gmail.com> wrote:

> Hi all,
> In February, I moved from Seattle to Hangzhou, China, and joined Alibaba.
> I want to give an update on things here.
>
> A colleague and I have been working on a JStorm
> <https://github.com/alibaba/jstorm> runner. We have a prototype that works
> with WordCount and PAssert. (I am going to start a separate email thread
> about how to get it reviewed and merged into Apache Beam.)
> We also have Spark clusters, and are very interested in using the Spark runner.
>
> Last Saturday, I went to the China Hadoop Summit and gave a talk about the
> Apache Beam model. While many companies gave talks on their in-house solutions
> for unified batch and streaming and unified SQL, there was also a lot of
> interest in and enthusiasm for Beam.
>
> Looking forward to chatting more.
> --
> Pei
>