Posted to dev@heron.apache.org by Ning Wang <wa...@gmail.com> on 2018/09/19 17:21:50 UTC

Discussion of the support of Bolt/Spout in Streamlet API

Hi, all,

We had a discussion in this PR, but I feel it would be good to
gather more thoughts from other devs/users as well.

https://github.com/apache/incubator-heron/pull/3029#pullrequestreview-156614156


During the internal onboarding of the Streamlet API at Twitter, I started to
consider supporting low-level Bolts and Spouts in the Streamlet API. I totally
understand the concerns that Neng and Jerry raised in the PR: the Streamlet API
is no longer pure with Bolt/Spout support because it would expose low-level
details. However, I still feel that the advantages of this support far outweigh
the disadvantages. The following are my comments from the PR:

========

Here are my thoughts:

Streamlet is not really the abstraction. My feeling is that Streamlet is
good at the DAG layer but not flexible enough at the low level (operators).
I think of it as Scala vs Java (not the same, just a rough analogy). Scala
has a nice functional API, but it would be pretty useless in real life if
procedural code were not allowed/supported.

Two reasons:

   1. Migration is one major reason. There are quite a few existing
   topologies written in the low-level API (for Heron and Storm). Streamlet is
   only friendly to new users; existing code such as KafkaSpout (it is a spout,
   but the same issue applies) in Storm and some ML bolts has to be rewritten
   to gain the readability/maintainability advantages.
   2. Bolts/Spouts are more flexible. They can do a lot more than a function
   provided by the Streamlet API (initialization, config, checkpoint, etc.).
   For example, stateful processing and component configs are not currently
   supported by Streamlet, and if we add these features, it is likely that
   users will have to provide extra functions as parameters, and the Streamlet
   API would become more and more complicated. The Streamlet API will evolve,
   but supporting Bolt/Spout could give us a lot of room to design a clean API.
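
To make the second point concrete, here is a minimal, self-contained sketch of
the difference between a plain function operator and a bolt-like component
with lifecycle hooks. Everything in it (OperatorSketch, LifecycleOperator,
LongWordCounter, the "min.length" config key) is a hypothetical stand-in
written for illustration. It is not Heron's actual Bolt or Streamlet API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; NOT Heron's real interfaces.
final class OperatorSketch {

  /** A bolt-like component: lifecycle hooks in addition to per-tuple logic. */
  interface LifecycleOperator<T, R> {
    void prepare(Map<String, Object> config); // one-time initialization with config
    R execute(T tuple);                       // per-tuple processing
    Map<String, Object> checkpoint();         // snapshot of internal state
  }

  /** Counts words whose length is at least a configured minimum. */
  static final class LongWordCounter implements LifecycleOperator<String, Integer> {
    private int minLength;
    private int seen;

    @Override
    public void prepare(Map<String, Object> config) {
      // "min.length" is an assumed config key, defaulting to 1.
      minLength = (Integer) config.getOrDefault("min.length", 1);
      seen = 0;
    }

    @Override
    public Integer execute(String word) {
      if (word.length() >= minLength) {
        seen++;
      }
      return seen; // running count so far
    }

    @Override
    public Map<String, Object> checkpoint() {
      Map<String, Object> state = new HashMap<>();
      state.put("seen", seen);
      return state;
    }
  }

  /** Function-style operator: only per-tuple logic, no config and no state hooks. */
  static <T, R> List<R> runFunction(List<T> input, Function<T, R> fn) {
    List<R> out = new ArrayList<>();
    for (T item : input) {
      out.add(fn.apply(item));
    }
    return out;
  }

  /** Lifecycle-style operator: prepared once with config, then fed tuples. */
  static <T, R> List<R> runLifecycle(
      List<T> input, LifecycleOperator<T, R> op, Map<String, Object> config) {
    op.prepare(config);
    List<R> out = new ArrayList<>();
    for (T item : input) {
      out.add(op.execute(item));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> words = List.of("a", "heron", "bolt", "io");
    System.out.println(runFunction(words, String::length));

    Map<String, Object> config = new HashMap<>();
    config.put("min.length", 4);
    LongWordCounter counter = new LongWordCounter();
    System.out.println(runLifecycle(words, counter, config));
    System.out.println(counter.checkpoint());
  }
}
```

With the function form, config and state would each need an extra parameter or
callback, which is the API growth I am worried about. The component form keeps
them in one object.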

========

Re: Discussion of the support of Bolt/Spout in Streamlet API

Posted by Ning Wang <wa...@gmail.com>.
Thanks for your input, Josh!

Sanjeev has a comment in the PR to improve it. I am going to try it out. At
the same time, please feel free to reply with your concerns or suggestions.
Thanks in advance.


Re: Discussion of the support of Bolt/Spout in Streamlet API

Posted by Josh Fischer <jo...@joshfischer.io>.
I can understand why some would not want to mix the two APIs, as they each
stand for a different concept. I have also found, in my own experience, the
Streamlet API to be limiting in some cases. For example, I couldn't find a
way to implement a specific grouping between Streamlets in a case where I
wanted fine-grained control over what data was sent to different instances
of a Streamlet (of course, this is probably part of the abstraction). I
like the low-level control you have with the spout and bolt implementations
and think it would be nice to have the flexibility to choose when you want
to take fine-grained control while using the Streamlet API.
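
As a rough illustration of the kind of grouping control I mean, here is a
small self-contained sketch. The names (GroupingSketch, Router, PriorityRouter,
the "priority:" prefix) are hypothetical and made up for this example. They are
not part of the Streamlet API or of the low-level Heron API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of custom routing between operator instances.
final class GroupingSketch {

  /** Hypothetical routing hook: picks the target instance index for a tuple. */
  interface Router<T> {
    int chooseInstance(T tuple, int numInstances);
  }

  /**
   * Sends tuples starting with "priority:" to instance 0 and spreads all
   * other tuples over the remaining instances by hash.
   */
  static final class PriorityRouter implements Router<String> {
    @Override
    public int chooseInstance(String tuple, int numInstances) {
      if (numInstances == 1 || tuple.startsWith("priority:")) {
        return 0;
      }
      // floorMod keeps the result non-negative even for negative hash codes.
      return 1 + Math.floorMod(tuple.hashCode(), numInstances - 1);
    }
  }

  /** Distributes tuples into one bucket per instance using the given router. */
  static <T> List<List<T>> route(List<T> tuples, Router<T> router, int numInstances) {
    if (numInstances < 1) {
      throw new IllegalArgumentException("numInstances must be at least 1");
    }
    List<List<T>> buckets = new ArrayList<>();
    for (int i = 0; i < numInstances; i++) {
      buckets.add(new ArrayList<>());
    }
    for (T tuple : tuples) {
      buckets.get(router.chooseInstance(tuple, numInstances)).add(tuple);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<String> tuples = List.of("priority:a", "x", "priority:b", "y");
    System.out.println(route(tuples, new PriorityRouter(), 3));
  }
}
```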


