Posted to dev@beam.apache.org by "P. Taylor Goetz" <pt...@apache.org> on 2017/04/10 17:47:41 UTC

Apache Storm/JStorm Runner(s) for Apache Beam

Note: cross-posting to dev@beam and dev@storm

I’ve seen at least two threads on the dev@ list discussing the JStorm runner and my hope is we can expand on that discussion and cross-pollinate with the Storm/JStorm/Beam communities as well.

A while back I created a very preliminary proof of concept of getting a Storm Beam runner working [1]. That was mainly an exercise for me to familiarize myself with the Beam API and discover what it would take to develop a Beam runner on top of Storm. That code is way out of date (I was targeting Beam’s HEAD before the 0.2.0 release, and a lot of changes have since taken place) and didn’t really work, as Jian Liu pointed out. It was a start that perhaps could be further built upon, or parts harvested, etc. I don’t have any particular attachment to that code and wouldn’t be upset if it were completely discarded in favor of a better or more extensible implementation.

What I would like to see, and I think this is a great opportunity to do so, is a closer collaboration between the Apache Storm and JStorm communities. For those who aren’t familiar with those projects’ relationship, I’ll start with a little history…

JStorm began at Alibaba as a fork of Storm (pre-Apache?) with Storm’s Clojure code reimplemented in Java. The rationale behind that move was that Alibaba had a large number of Java developers but very few who were proficient with Clojure. Moving to pure Java made sense as it would expand the base of potential contributors.

In late 2015 Alibaba donated the JStorm codebase to the Apache Storm project, and the Apache Storm PMC committed to converting its Clojure code to Java in order to incorporate the code donation. At the time there was one catch — Apache Storm had implemented comprehensive security features such as Kerberos authentication/authorization and multi-tenancy in its Clojure code, which greatly complicated the move to Java and incorporation of the JStorm code. JStorm did not have the same security features. A number of JStorm developers have also become Storm PMC members.

Fast forward to today. The Storm community has completed the bulk of the move to Java and the next major release (presumably 2.0, which is currently under discussion) will be largely Java-based. We are now in a much better position to begin incorporating JStorm’s features, as well as implementing new features necessary to support the Beam API (such as support for bounded pipelines, among other features).

Having separate Apache Storm and JStorm Beam runner implementations doesn’t feel appropriate in my personal opinion, especially since both projects have expressed an ongoing commitment to bringing JStorm’s additional features and, just as important, community to Apache Storm.

One final note: when the Storm community initially discussed developing a Beam runner, the general consensus was to do so within the Storm repository. My current thinking is that such an effort should take place within the Beam community, not only since that is the development pattern followed by other runner implementations (Flink, Apex, etc.), but also because it would serve to increase collaboration between Apache projects (always a good thing!).

I would love to hear opinions from others in the Storm/JStorm/Beam communities.

-Taylor

Re: Apache Storm/JStorm Runner(s) for Apache Beam

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
Hi Taylor,

Thanks immensely for taking the time to write such rich detail. I have a
lot to learn about the relationship between Storm and JStorm as software
and as communities.

Your final note I can immediately agree with and reinforce. The fruits of
this endeavor should reside in the Beam repository. It is good for all the
reasons you mention. Even more specifically: any runner gains tremendous
benefit from our automated testing, both for achieving maturity and for not
getting broken as the project evolves.

Kenn


Re: Apache Storm/JStorm Runner(s) for Apache Beam

Posted by Pei HE <pe...@gmail.com>.
Hi Taylor,
I am very glad to see the interest in pushing forward a Beam Storm runner.

However, I cannot convince myself of the benefits of having one runner to
support all.

Beam has three types of users: pipeline writers, library writers, and
runner implementers.

I see the pros and cons as follows:
Pros:
1. For pipeline writers and library writers, I don't see any benefits
because they are using the Beam API directly.
2. For runner implementers: (I am not that familiar with the current
similarities and differences of Storm and JStorm; maybe you can help me
fill this in.)

Cons:
For pipeline writers and library writers:
1. It means a delay in delivery. We already have a working prototype, and
there are lots of JStorm users who eagerly want a JStorm API.
2. "One runner to support all" may increase the complexity and
compromise the quality of the runner.

From my point of view, the cons clearly outweigh the pros unless I am
missing something.

Let me know what you think.
Thanks
--
Pei



Re: Apache Storm/JStorm Runner(s) for Apache Beam

Posted by Davor Bonaci <da...@apache.org>.
This is a great discussion; thanks everyone.

From my perspective, the functionality to execute pipelines on both Storm
and JStorm is very welcome and a big step forward for Beam.

I'm not an expert on the Storm/JStorm differences, but the one vs. two
runners discussion sounds more like a packaging / experience question than
a deep technical problem. I'm guessing that a fair amount of functionality
can be shared between both runners -- it would be a shame not to share
these parts. Even further, with a possible merger on the horizon, it would
make sense to plan and design for an (eventual) unified solution.

In the meantime, whether the functionality is released in one runner (and,
say, controlled with a flag) or two runners (without a flag) sounds minor.
But it would be great if the code that can be shared were written in a
way that is shareable.

RE: Apache Storm/JStorm Runner(s) for Apache Beam

Posted by "刘键(Basti Liu)" <ba...@alibaba-inc.com>.
Hi Taylor,

I am glad to see your opinion.
Since Beam was open-sourced, there has been a lot of interest in Beam from our internal users at Alibaba and from other companies in China, which prompted us to provide a JStorm runner. But since the implementation of the Storm runner is out of date, and over the past year many new features and different solutions (especially for exactly-once and state) were introduced in JStorm, we had to start separate development of a JStorm runner.
Currently, we have finished a prototype (supporting most PTransforms, windows and triggers of Beam), as Pei mentioned in another email, and full testing is still ongoing. Some users at Alibaba have built trial topologies on it. But for further improvement, we still need review help from the Beam community to ensure correctness, and to be notified of any breaking or incompatible changes as Beam evolves. That is why we decided to commit the JStorm runner into the Beam repository.

In my personal understanding, the JStorm runner is not a duplicated effort. The major part of the JStorm runner can probably be reused for Storm. Some other parts, like exactly-once and state, need to be propagated. When the Storm community plans to restart development of the Storm runner, we'd like to help with this, as part of the merging of JStorm features planned before. At that time, we can discuss whether merging JStorm features or propagation is required.
Looking forward to better collaboration between Beam, Storm and JStorm.

Regards
Jian Liu(Basti)


Re: Apache Storm/JStorm Runner(s) for Apache Beam

Posted by Pei HE <pe...@gmail.com>.
Hi Taylor,
I am very glad to see the interest in pushing the Beam Storm runner forward.

However, I cannot convince myself of the benefits of having one runner to
support all.

Beam has three types of users: pipeline writers, library writers, and
runner implementers.

I can see the pros and cons as follows:
Pros:
1. For pipeline writers and library writers, I don't see any benefits
because they are using the Beam API directly.
2. For runner implementers: (I am not that familiar with the current
similarities and differences of Storm and JStorm, maybe you can help me
fill it in.)

Cons:
For pipeline writers and library writers:
1. It means a delay in delivery. We already have a working prototype, and
there are lots of JStorm users who eagerly want a JStorm API.
2. "One runner to support all" may increase the complexity and
compromise the quality of the runner.

From my point of view, the cons clearly outweigh the pros unless I am
missing something.

Let me know what you think.
Thanks
--
Pei

