Posted to dev@heron.apache.org by Ning Wang <wa...@gmail.com> on 2019/01/16 07:35:39 UTC

Heron Spouts Code

Hi, all,

A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed today
in our general Slack channel that we should host spout code somewhere so
that people can reuse it (spouts are highly reusable in general) and
contribute improvements. This is a recap of the idea, plus some updates.

We have two options:
1. Add a spouts/ directory to the Heron project.
2. Create a new project on GitHub.

Option 1 is easy to start with, but iteration and releases would be
coupled to the Heron project itself. There is likely to be a fair amount
of activity around spouts from time to time as new ones are added. Also,
Heron is basically the engine plus APIs and tooling, while there could be
quite a few spouts in the future, bringing many new dependencies such as
Kafka, pubsub, Neo4j, and Neptune. Whether spout implementations belong in
the Heron project is debatable, and these extra dependencies could add
unnecessary complexity.

Option 2 requires some work up front, but it will be much easier to manage
and evolve. There will also be fewer concerns about new spouts (in
different languages) and their dependencies, because spouts are relatively
independent of each other and we can generate artifacts per spout.

Overall, most people prefer option 2 because it is cleaner.

I talked with the Twitter OSS team. They are happy to support the initiative
and suggested that we check with the Apache team to find out the best process.
The first question is whether this new side project should be under Apache.
This might be a question for the mentors. What do you think/suggest?

Another topic under discussion is the build tool, in case we decide to create
a new side project. Maven is certainly more mature, but we will likely need
multi-language support, so Bazel currently seems to be the winner (I
personally vote for Bazel 1.0 because backward compatibility has been
bad so far).

If you have any ideas or suggestions, please feel free to reply.

Regards,
--ning

Re: Heron Spouts Code

Posted by Dave Fisher <da...@comcast.net>.

Hi -

> On Jan 15, 2019, at 11:35 PM, Ning Wang <wa...@gmail.com> wrote:
> 
> Hi, all,
> 
> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed today
> in our general slack channel that we should have spouts code somewhere so
> that people can reuse them (spouts are highly reusable in general) and
> contribute improvements. This is just a recap of the idea and some updates.
> 
> We have two options:
> 1. add a spouts/ dir in heron project.
> 2. create a new project in github.
> 
> For option 1, it is easy to start. But the iteration and release will be
> coupled with Heron project itself. It is likely there will be quite some
> activities around spouts time by time when new spouts are added. Also,
> Heron itself is basically the engine itself plus APIs and tooling, while
> there could be quite some spouts in future with many new dependencies like
> Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
> implementations in Heron project, and these extra dependencies could add
> some unnecessary complexity.
> 
> For option 2, there will be some work up front. but it will be much easier
> to manage and evolve. And here will be less concerns about new spouts (in
> different languages) and dependencies because spouts are relatively
> independent to each other and we may generate artifacts per spout.
> 
> Overall most people prefer option 2 for its cleanness.
> 
> I talked with Twitter OSS team. They are happy to support the initiative
> and suggest us to check with Apache team and see what is the best process.
> First question is that should this new side project be under Apache or not?
> This might be a question to mentors. What do you think/suggest?

The Heron PPMC is an Apache team. The choice should be this community's.

(1) It is ok for an Apache Project to have multiple products and repos. Examples that quickly come to mind include:
- Apache Lucene with both Lucene and Solr.
- Apache POI, which recently took control of the XMLBeans product from the XMLBeans project (retired to the Apache Attic).
- Apache Sling has tens of repositories, since it releases its features as a large set of OSGi bundles. Have a look at Sling, which seems to have a model similar to the one contemplated for “Spouts”.

(2) It is ok to host new repos in Twitter’s or Ning's GitHub accounts. If you then wanted to bring these back to Heron (or another project), there would need to be an IP Clearance process. If this is a new project that you wanted to incubate, then the group could reenter the Incubator. That said, if you aspire to be an Apache project with “Spouts”, then you are already here.

(3) It is ok for the project to change direction and decide to be about “Spouts”, “Bolts” and “IO Connectors”.

This seems like an important discussion for the future of Heron and I am glad that you brought it to the list. I’d like to read what the community thinks about the choices.

Regards,
Dave

> 
> Another topic being discussed is the build tool in case we decide to create
> a new side project. Maven is more mature for sure, but we will likely need
> multi language support so currently Bazel seems to be the winner (I
> personally vote for Bazel 1.0 because the backward compatibility has been
> bad so far).
> 
> Any ideas or suggestions, please feel free to reply.
> 
> Regards,
> --ning

Re: Heron Spouts Code

Posted by Karthik Ramasamy <ka...@streaml.io>.

+1 for a separate repo

On Wed, Jan 16, 2019 at 2:34 PM Ning Wang <wa...@gmail.com> wrote:

> +Siming
>
> On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com> wrote:
>
> > Hi, all,
> >
> > A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed
> today
> > in our general slack channel that we should have spouts code somewhere so
> > that people can reuse them (spouts are highly reusable in general) and
> > contribute improvements. This is just a recap of the idea and some
> updates.
> >
> > We have two options:
> > 1. add a spouts/ dir in heron project.
> > 2. create a new project in github.
> >
> > For option 1, it is easy to start. But the iteration and release will be
> > coupled with Heron project itself. It is likely there will be quite some
> > activities around spouts time by time when new spouts are added. Also,
> > Heron itself is basically the engine itself plus APIs and tooling, while
> > there could be quite some spouts in future with many new dependencies
> like
> > Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
> > implementations in Heron project, and these extra dependencies could add
> > some unnecessary complexity.
> >
> > For option 2, there will be some work up front. but it will be much
> easier
> > to manage and evolve. And here will be less concerns about new spouts (in
> > different languages) and dependencies because spouts are relatively
> > independent to each other and we may generate artifacts per spout.
> >
> > Overall most people prefer option 2 for its cleanness.
> >
> > I talked with Twitter OSS team. They are happy to support the initiative
> > and suggest us to check with Apache team and see what is the best
> process.
> > First question is that should this new side project be under Apache or
> not?
> > This might be a question to mentors. What do you think/suggest?
> >
> > Another topic being discussed is the build tool in case we decide to
> > create a new side project. Maven is more mature for sure, but we will
> > likely need multi language support so currently Bazel seems to be the
> > winner (I personally vote for Bazel 1.0 because the backward
> compatibility
> > has been bad so far).
> >
> > Any ideas or suggestions, please feel free to reply.
> >
> > Regards,
> > --ning
> >
>

Re: Heron Spouts Code

Posted by Ning Wang <wa...@gmail.com>.

It does make sense for each spout to have its own version number. We should
define a guide for versioning and a good way to track the versions.

The Heron version is a good question. Spouts are not that dependent on
the Heron version, so I feel the library shouldn't need this dependency.
However, we may specify a Heron version in something like an integration
test framework, to test the logic and to show users how it works.



On Thu, Jan 17, 2019 at 7:33 AM Saikat Kanjilal <sx...@hotmail.com> wrote:

> As an analogy I was looking at Hadoop, yarn and spark as a comparison
> related to get some ideas and it seems that these components work together
> pretty seamlessly and have independent versioning.   I really feel like
> it’s up to the main engineers of each spout project on how to version
> things, as far as how to tell what version of heron to use that’s typically
> specified on the readme or the main site page for the spout.
>
> My 2 cents.
>
> Sent from my iPhone
>
> > On Jan 17, 2019, at 5:29 AM, Simon Weng <si...@gmail.com> wrote:
> >
> > This is a good question. Each spout version must maintain a compatibility
> > matrix {Spout Version, external SDK version, Heron API version}. It’s more
> > of a documentation effort so that users have enough information to
> > determine which one to pick, isn’t it?
> >
> >> On Thu, Jan 17, 2019 at 7:48 AM Josh Fischer <jo...@joshfischer.io>
> wrote:
> >>
> >> If we were to go with a separate repo for the spouts how would we
> version
> >> it?  Would it be consistent with the Heron repo?  How would people know
> >> what version spout to use with the Heron version they are running?
> >>
> >>
> >>> On Thu, Jan 17, 2019 at 1:26 AM Ning Wang <wa...@gmail.com>
> wrote:
> >>>
> >>> This is an option. I have a few concerns about it:
> >>> - There will be a lot of repos and it will be messy to manage and it
> might
> >>> be harder for users to find it. I am expecting at least more than ten
> >>> (different services times different languages).
> >>> - There will be some duplicated code such as build/release configs,
> >>> scripts. etc.
> >>>
> >>> I think we should be able to achieve the first reason with a single
> repo.
> >>> Different spouts should likely be in different folders and they can
> evolve
> >>> separately.
> >>> The second reason is valid, but duplicated code is a side effect.
> >>> The third reason depends on building tool I feel. Bazel is powerful,
> but
> >>> it
> >>> is just changing time by time. :(
> >>>
> >>> Just my two cents.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> On Wed, Jan 16, 2019 at 8:09 PM Simon Weng <si...@gmail.com>
> wrote:
> >>>>
> >>>> Hi, all:
> >>>>
> >>>> Can it also be one of the options to even have separate repo for each
> >>> type
> >>>> of spouts? The reasons it is worth considering are:
> >>>>
> >>>> 1. Allow each spout to evolve and release in different pace because
> each
> >>>> is technically driven by external source software. For example, the
> >>>> community may need different versions of the Kafka Spout to be
> >>> compatible
> >>>> with their deployed Kafka cluster in production
> >>>> 2. Allow each spout project to use the de facto build tool that suits
> >>> the
> >>>> external SDK best. This will help to minimize the learning curve for
> >>>> contributors who specialize in different source software stacks
> >>>> 3. Simplify the maintenance of the build and CI
> >>>>
> >>>> I’m not familiar with the capability of Bazel, so certainly I’m not
> >>>> against it. If it can help to achieve some of the above, I guess one
> >>> single
> >>>> repo will also work then.
> >>>>
> >>>> SiMing
> >>>>
> >>>>> On Wed, Jan 16, 2019 at 5:34 PM Ning Wang <wa...@gmail.com>
> wrote:
> >>>>>
> >>>>> +Siming
> >>>>>
> >>>>> On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> Hi, all,
> >>>>>>
> >>>>>> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed
> >>>>>> today in our general slack channel that we should have spouts code
> >>>>>> somewhere so that people can reuse them (spouts are highly reusable
> in
> >>>>>> general) and contribute improvements. This is just a recap of the
> >>> idea and
> >>>>>> some updates.
> >>>>>>
> >>>>>> We have two options:
> >>>>>> 1. add a spouts/ dir in heron project.
> >>>>>> 2. create a new project in github.
> >>>>>>
> >>>>>> For option 1, it is easy to start. But the iteration and release
> will
> >>> be
> >>>>>> coupled with Heron project itself. It is likely there will be quite
> >>> some
> >>>>>> activities around spouts time by time when new spouts are added.
> Also,
> >>>>>> Heron itself is basically the engine itself plus APIs and tooling,
> >>> while
> >>>>>> there could be quite some spouts in future with many new
> dependencies
> >>> like
> >>>>>> Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
> >>>>>> implementations in Heron project, and these extra dependencies could
> >>> add
> >>>>>> some unnecessary complexity.
> >>>>>>
> >>>>>> For option 2, there will be some work up front. but it will be much
> >>>>>> easier to manage and evolve. And here will be less concerns about
> new
> >>>>>> spouts (in different languages) and dependencies because spouts are
> >>>>>> relatively independent to each other and we may generate artifacts
> per
> >>>>>> spout.
> >>>>>>
> >>>>>> Overall most people prefer option 2 for its cleanness.
> >>>>>>
> >>>>>> I talked with Twitter OSS team. They are happy to support the
> >>> initiative
> >>>>>> and suggest us to check with Apache team and see what is the best
> >>> process.
> >>>>>> First question is that should this new side project be under Apache
> >>> or not?
> >>>>>> This might be a question to mentors. What do you think/suggest?
> >>>>>>
> >>>>>> Another topic being discussed is the build tool in case we decide to
> >>>>>> create a new side project. Maven is more mature for sure, but we
> will
> >>>>>> likely need multi language support so currently Bazel seems to be
> the
> >>>>>> winner (I personally vote for Bazel 1.0 because the backward
> >>> compatibility
> >>>>>> has been bad so far).
> >>>>>>
> >>>>>> Any ideas or suggestions, please feel free to reply.
> >>>>>>
> >>>>>> Regards,
> >>>>>> --ning
> >>>>>>
> >>>>> --
> >>>> Sent from Gmail Mobile
> >>>>
> >>>
> >> --
> >> Sent from A Mobile Device
> >>
> > --
> > Sent from Gmail Mobile
>

Re: Heron Spouts Code

Posted by Saikat Kanjilal <sx...@hotmail.com>.

As an analogy, I was looking at Hadoop, YARN, and Spark for comparison to get some ideas, and it seems that these components work together pretty seamlessly while being versioned independently. I really feel it’s up to the main engineers of each spout project to decide how to version things. As for how to tell which version of Heron to use, that’s typically specified in the README or on the main site page for the spout.

My 2 cents.

> On Jan 17, 2019, at 5:29 AM, Simon Weng <si...@gmail.com> wrote:
> 
> This is a good question. Each spout version must maintain a compatibility
> matrix {Spout Version, external SDK version, Heron API version}. It’s more
> of a documentation effort so that users have enough information to
> determine which one to pick, isn’t it?
> 
>> On Thu, Jan 17, 2019 at 7:48 AM Josh Fischer <jo...@joshfischer.io> wrote:
>> 
>> If we were to go with a separate repo for the spouts how would we version
>> it?  Would it be consistent with the Heron repo?  How would people know
>> what version spout to use with the Heron version they are running?
>> 
>> 
>>> On Thu, Jan 17, 2019 at 1:26 AM Ning Wang <wa...@gmail.com> wrote:
>>> 
>>> This is an option. I have a few concerns about it:
>>> - There will be a lot of repos and it will be messy to manage and it might
>>> be harder for users to find it. I am expecting at least more than ten
>>> (different services times different languages).
>>> - There will be some duplicated code such as build/release configs,
>>> scripts. etc.
>>> 
>>> I think we should be able to achieve the first reason with a single repo.
>>> Different spouts should likely be in different folders and they can evolve
>>> separately.
>>> The second reason is valid, but duplicated code is a side effect.
>>> The third reason depends on building tool I feel. Bazel is powerful, but
>>> it
>>> is just changing time by time. :(
>>> 
>>> Just my two cents.
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Wed, Jan 16, 2019 at 8:09 PM Simon Weng <si...@gmail.com> wrote:
>>>> 
>>>> Hi, all:
>>>> 
>>>> Can it also be one of the options to even have separate repo for each
>>> type
>>>> of spouts? The reasons it is worth considering are:
>>>> 
>>>> 1. Allow each spout to evolve and release in different pace because each
>>>> is technically driven by external source software. For example, the
>>>> community may need different versions of the Kafka Spout to be
>>> compatible
>>>> with their deployed Kafka cluster in production
>>>> 2. Allow each spout project to use the de facto build tool that suits
>>> the
>>>> external SDK best. This will help to minimize the learning curve for
>>>> contributors who specialize in different source software stacks
>>>> 3. Simplify the maintenance of the build and CI
>>>> 
>>>> I’m not familiar with the capability of Bazel, so certainly I’m not
>>>> against it. If it can help to achieve some of the above, I guess one
>>> single
>>>> repo will also work then.
>>>> 
>>>> SiMing
>>>> 
>>>>> On Wed, Jan 16, 2019 at 5:34 PM Ning Wang <wa...@gmail.com> wrote:
>>>>> 
>>>>> +Siming
>>>>> 
>>>>> On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> Hi, all,
>>>>>> 
>>>>>> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed
>>>>>> today in our general slack channel that we should have spouts code
>>>>>> somewhere so that people can reuse them (spouts are highly reusable in
>>>>>> general) and contribute improvements. This is just a recap of the
>>> idea and
>>>>>> some updates.
>>>>>> 
>>>>>> We have two options:
>>>>>> 1. add a spouts/ dir in heron project.
>>>>>> 2. create a new project in github.
>>>>>> 
>>>>>> For option 1, it is easy to start. But the iteration and release will
>>> be
>>>>>> coupled with Heron project itself. It is likely there will be quite
>>> some
>>>>>> activities around spouts time by time when new spouts are added. Also,
>>>>>> Heron itself is basically the engine itself plus APIs and tooling,
>>> while
>>>>>> there could be quite some spouts in future with many new dependencies
>>> like
>>>>>> Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
>>>>>> implementations in Heron project, and these extra dependencies could
>>> add
>>>>>> some unnecessary complexity.
>>>>>> 
>>>>>> For option 2, there will be some work up front. but it will be much
>>>>>> easier to manage and evolve. And here will be less concerns about new
>>>>>> spouts (in different languages) and dependencies because spouts are
>>>>>> relatively independent to each other and we may generate artifacts per
>>>>>> spout.
>>>>>> 
>>>>>> Overall most people prefer option 2 for its cleanness.
>>>>>> 
>>>>>> I talked with Twitter OSS team. They are happy to support the
>>> initiative
>>>>>> and suggest us to check with Apache team and see what is the best
>>> process.
>>>>>> First question is that should this new side project be under Apache
>>> or not?
>>>>>> This might be a question to mentors. What do you think/suggest?
>>>>>> 
>>>>>> Another topic being discussed is the build tool in case we decide to
>>>>>> create a new side project. Maven is more mature for sure, but we will
>>>>>> likely need multi language support so currently Bazel seems to be the
>>>>>> winner (I personally vote for Bazel 1.0 because the backward
>>> compatibility
>>>>>> has been bad so far).
>>>>>> 
>>>>>> Any ideas or suggestions, please feel free to reply.
>>>>>> 
>>>>>> Regards,
>>>>>> --ning
>>>>>> 
>>>>> --
>>>> Sent from Gmail Mobile
>>>> 
>>> 
>> --
>> Sent from A Mobile Device
>> 
> -- 
> Sent from Gmail Mobile

Re: Heron Spouts Code

Posted by Simon Weng <si...@gmail.com>.

This is a good question. Each spout version must maintain a compatibility
matrix {Spout Version, external SDK version, Heron API version}. It’s more
of a documentation effort, so that users have enough information to
determine which one to pick, isn’t it?

On Thu, Jan 17, 2019 at 7:48 AM Josh Fischer <jo...@joshfischer.io> wrote:

> If we were to go with a separate repo for the spouts how would we version
> it?  Would it be consistent with the Heron repo?  How would people know
> what version spout to use with the Heron version they are running?
>
>
> On Thu, Jan 17, 2019 at 1:26 AM Ning Wang <wa...@gmail.com> wrote:
>
>> This is an option. I have a few concerns about it:
>> - There will be a lot of repos and it will be messy to manage and it might
>> be harder for users to find it. I am expecting at least more than ten
>> (different services times different languages).
>> - There will be some duplicated code such as build/release configs,
>> scripts. etc.
>>
>> I think we should be able to achieve the first reason with a single repo.
>> Different spouts should likely be in different folders and they can evolve
>> separately.
>> The second reason is valid, but duplicated code is a side effect.
>> The third reason depends on building tool I feel. Bazel is powerful, but
>> it
>> is just changing time by time. :(
>>
>> Just my two cents.
>>
>>
>>
>>
>>
>> On Wed, Jan 16, 2019 at 8:09 PM Simon Weng <si...@gmail.com> wrote:
>>
>> > Hi, all:
>> >
>> > Can it also be one of the options to even have separate repo for each
>> type
>> > of spouts? The reasons it is worth considering are:
>> >
>> > 1. Allow each spout to evolve and release in different pace because each
>> > is technically driven by external source software. For example, the
>> > community may need different versions of the Kafka Spout to be
>> compatible
>> > with their deployed Kafka cluster in production
>> > 2. Allow each spout project to use the de facto build tool that suits
>> the
>> > external SDK best. This will help to minimize the learning curve for
>> > contributors who specialize in different source software stacks
>> > 3. Simplify the maintenance of the build and CI
>> >
>> > I’m not familiar with the capability of Bazel, so certainly I’m not
>> > against it. If it can help to achieve some of the above, I guess one
>> single
>> > repo will also work then.
>> >
>> > SiMing
>> >
>> > On Wed, Jan 16, 2019 at 5:34 PM Ning Wang <wa...@gmail.com> wrote:
>> >
>> >> +Siming
>> >>
>> >> On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com>
>> wrote:
>> >>
>> >>> Hi, all,
>> >>>
>> >>> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed
>> >>> today in our general slack channel that we should have spouts code
>> >>> somewhere so that people can reuse them (spouts are highly reusable in
>> >>> general) and contribute improvements. This is just a recap of the
>> idea and
>> >>> some updates.
>> >>>
>> >>> We have two options:
>> >>> 1. add a spouts/ dir in heron project.
>> >>> 2. create a new project in github.
>> >>>
>> >>> For option 1, it is easy to start. But the iteration and release will
>> be
>> >>> coupled with Heron project itself. It is likely there will be quite
>> some
>> >>> activities around spouts time by time when new spouts are added. Also,
>> >>> Heron itself is basically the engine itself plus APIs and tooling,
>> while
>> >>> there could be quite some spouts in future with many new dependencies
>> like
>> >>> Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
>> >>> implementations in Heron project, and these extra dependencies could
>> add
>> >>> some unnecessary complexity.
>> >>>
>> >>> For option 2, there will be some work up front. but it will be much
>> >>> easier to manage and evolve. And here will be less concerns about new
>> >>> spouts (in different languages) and dependencies because spouts are
>> >>> relatively independent to each other and we may generate artifacts per
>> >>> spout.
>> >>>
>> >>> Overall most people prefer option 2 for its cleanness.
>> >>>
>> >>> I talked with Twitter OSS team. They are happy to support the
>> initiative
>> >>> and suggest us to check with Apache team and see what is the best
>> process.
>> >>> First question is that should this new side project be under Apache
>> or not?
>> >>> This might be a question to mentors. What do you think/suggest?
>> >>>
>> >>> Another topic being discussed is the build tool in case we decide to
>> >>> create a new side project. Maven is more mature for sure, but we will
>> >>> likely need multi language support so currently Bazel seems to be the
>> >>> winner (I personally vote for Bazel 1.0 because the backward
>> compatibility
>> >>> has been bad so far).
>> >>>
>> >>> Any ideas or suggestions, please feel free to reply.
>> >>>
>> >>> Regards,
>> >>> --ning
>> >>>
>> >> --
>> > Sent from Gmail Mobile
>> >
>>
> --
> Sent from A Mobile Device
>

Re: Heron Spouts Code

Posted by Josh Fischer <jo...@joshfischer.io>.

If we were to go with a separate repo for the spouts, how would we version
it?  Would it be consistent with the Heron repo?  How would people know
which spout version to use with the Heron version they are running?


On Thu, Jan 17, 2019 at 1:26 AM Ning Wang <wa...@gmail.com> wrote:

> This is an option. I have a few concerns about it:
> - There will be a lot of repos and it will be messy to manage and it might
> be harder for users to find it. I am expecting at least more than ten
> (different services times different languages).
> - There will be some duplicated code such as build/release configs,
> scripts. etc.
>
> I think we should be able to achieve the first reason with a single repo.
> Different spouts should likely be in different folders and they can evolve
> separately.
> The second reason is valid, but duplicated code is a side effect.
> The third reason depends on building tool I feel. Bazel is powerful, but it
> is just changing time by time. :(
>
> Just my two cents.
>
>
>
>
>
> On Wed, Jan 16, 2019 at 8:09 PM Simon Weng <si...@gmail.com> wrote:
>
> > Hi, all:
> >
> > Can it also be one of the options to even have separate repo for each
> type
> > of spouts? The reasons it is worth considering are:
> >
> > 1. Allow each spout to evolve and release in different pace because each
> > is technically driven by external source software. For example, the
> > community may need different versions of the Kafka Spout to be compatible
> > with their deployed Kafka cluster in production
> > 2. Allow each spout project to use the de facto build tool that suits the
> > external SDK best. This will help to minimize the learning curve for
> > contributors who specialize in different source software stacks
> > 3. Simplify the maintenance of the build and CI
> >
> > I’m not familiar with the capability of Bazel, so certainly I’m not
> > against it. If it can help to achieve some of the above, I guess one
> single
> > repo will also work then.
> >
> > SiMing
> >
> > On Wed, Jan 16, 2019 at 5:34 PM Ning Wang <wa...@gmail.com> wrote:
> >
> >> +Siming
> >>
> >> On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com>
> wrote:
> >>
> >>> Hi, all,
> >>>
> >>> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed
> >>> today in our general slack channel that we should have spouts code
> >>> somewhere so that people can reuse them (spouts are highly reusable in
> >>> general) and contribute improvements. This is just a recap of the idea
> and
> >>> some updates.
> >>>
> >>> We have two options:
> >>> 1. add a spouts/ dir in heron project.
> >>> 2. create a new project in github.
> >>>
> >>> For option 1, it is easy to start. But the iteration and release will
> be
> >>> coupled with Heron project itself. It is likely there will be quite
> some
> >>> activities around spouts time by time when new spouts are added. Also,
> >>> Heron itself is basically the engine itself plus APIs and tooling,
> while
> >>> there could be quite some spouts in future with many new dependencies
> like
> >>> Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
> >>> implementations in Heron project, and these extra dependencies could
> add
> >>> some unnecessary complexity.
> >>>
> >>> For option 2, there will be some work up front. but it will be much
> >>> easier to manage and evolve. And here will be less concerns about new
> >>> spouts (in different languages) and dependencies because spouts are
> >>> relatively independent to each other and we may generate artifacts per
> >>> spout.
> >>>
> >>> Overall most people prefer option 2 for its cleanness.
> >>>
> >>> I talked with Twitter OSS team. They are happy to support the
> initiative
> >>> and suggest us to check with Apache team and see what is the best
> process.
> >>> First question is that should this new side project be under Apache or
> not?
> >>> This might be a question to mentors. What do you think/suggest?
> >>>
> >>> Another topic being discussed is the build tool in case we decide to
> >>> create a new side project. Maven is more mature for sure, but we will
> >>> likely need multi language support so currently Bazel seems to be the
> >>> winner (I personally vote for Bazel 1.0 because the backward
> compatibility
> >>> has been bad so far).
> >>>
> >>> Any ideas or suggestions, please feel free to reply.
> >>>
> >>> Regards,
> >>> --ning
> >>>
> >> --
> > Sent from Gmail Mobile
> >
>

Re: Heron Spouts Code

Posted by Simon Weng <si...@gmail.com>.

As long as there does not have to be a single release train for all of the
spouts, I’m good with a single separate repo hosting all of them. We just
need a bit of upfront effort to set up Bazel for the project.

However, I expect that once one spout project folder is set up, it can serve
as a sample for other spouts in the same language. For example, once the
Kafka Spout is migrated from Maven to Bazel, other Java-based spouts should
be easy to set up.

Seems it’s time to pick up some Bazel knowledge.

On Thu, Jan 17, 2019 at 2:25 AM Ning Wang <wa...@gmail.com> wrote:

> This is an option. I have a few concerns about it:
> - There will be a lot of repos and it will be messy to manage and it might
> be harder for users to find it. I am expecting at least more than ten
> (different services times different languages).
> - There will be some duplicated code such as build/release configs,
> scripts. etc.
>
> I think we should be able to achieve the first reason with a single repo.
> Different spouts should likely be in different folders and they can evolve
> separately.
> The second reason is valid, but duplicated code is a side effect.
> The third reason depends on building tool I feel. Bazel is powerful, but
> it is just changing time by time. :(
>
> Just my two cents.
>
>
>
>
>
> On Wed, Jan 16, 2019 at 8:09 PM Simon Weng <si...@gmail.com> wrote:
>
>> Hi, all:
>>
>> Can it also be one of the options to even have separate repo for each
>> type of spouts? The reasons it is worth considering are:
>>
>> 1. Allow each spout to evolve and release in different pace because each
>> is technically driven by external source software. For example, the
>> community may need different versions of the Kafka Spout to be compatible
>> with their deployed Kafka cluster in production
>> 2. Allow each spout project to use the de facto build tool that suits the
>> external SDK best. This will help to minimize the learning curve for
>> contributors who specialize in different source software stacks
>> 3. Simplify the maintenance of the build and CI
>>
>> I’m not familiar with the capability of Bazel, so certainly I’m not
>> against it. If it can help to achieve some of the above, I guess one single
>> repo will also work then.
>>
>> SiMing
>>
>> On Wed, Jan 16, 2019 at 5:34 PM Ning Wang <wa...@gmail.com> wrote:
>>
>>> +Siming
>>>
>>> On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com> wrote:
>>>
>>>> Hi, all,
>>>>
>>>> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed
>>>> today in our general slack channel that we should have spouts code
>>>> somewhere so that people can reuse them (spouts are highly reusable in
>>>> general) and contribute improvements. This is just a recap of the idea and
>>>> some updates.
>>>>
>>>> We have two options:
>>>> 1. add a spouts/ dir in heron project.
>>>> 2. create a new project in github.
>>>>
>>>> For option 1, it is easy to start. But the iteration and release will
>>>> be coupled with Heron project itself. It is likely there will be quite some
>>>> activities around spouts time by time when new spouts are added. Also,
>>>> Heron itself is basically the engine itself plus APIs and tooling, while
>>>> there could be quite some spouts in future with many new dependencies like
>>>> Kafka, pubsub, neo4j and neptune, etc. It is debatable to have spout
>>>> implementations in Heron project, and these extra dependencies could add
>>>> some unnecessary complexity.
>>>>
>>>> For option 2, there will be some work up front. but it will be much
>>>> easier to manage and evolve. And here will be less concerns about new
>>>> spouts (in different languages) and dependencies because spouts are
>>>> relatively independent to each other and we may generate artifacts per
>>>> spout.
>>>>
>>>> Overall most people prefer option 2 for its cleanness.
>>>>
>>>> I talked with Twitter OSS team. They are happy to support the
>>>> initiative and suggest us to check with Apache team and see what is the
>>>> best process. First question is that should this new side project be under
>>>> Apache or not? This might be a question to mentors. What do you
>>>> think/suggest?
>>>>
>>>> Another topic being discussed is the build tool in case we decide to
>>>> create a new side project. Maven is more mature for sure, but we will
>>>> likely need multi language support so currently Bazel seems to be the
>>>> winner (I personally vote for Bazel 1.0 because the backward compatibility
>>>> has been bad so far).
>>>>
>>>> Any ideas or suggestions, please feel free to reply.
>>>>
>>>> Regards,
>>>> --ning
>>>>
>>> --
>> Sent from Gmail Mobile
>>
> --
Sent from Gmail Mobile

Re: Heron Spouts Code

Posted by Ning Wang <wa...@gmail.com>.

This is an option. I have a few concerns about it:
- There will be a lot of repos, which will be messy to manage and might
make it harder for users to find them. I am expecting more than ten
(different services times different languages).
- There will be some duplicated code such as build/release configs,
scripts, etc.

I think we should be able to address the first reason with a single repo.
Different spouts would likely live in different folders and can evolve
separately.
The second reason is valid, but duplicated code is a side effect.
The third reason depends on the build tool, I feel. Bazel is powerful, but it
keeps changing over time. :(

Just my two cents.





On Wed, Jan 16, 2019 at 8:09 PM Simon Weng <si...@gmail.com> wrote:

> Hi, all:
>
> Could another option be to have a separate repo for each type of spout?
> The reasons it is worth considering are:
>
> 1. Allow each spout to evolve and release at a different pace, because each
> is technically driven by external source software. For example, the
> community may need different versions of the Kafka Spout to be compatible
> with their deployed Kafka cluster in production.
> 2. Allow each spout project to use the de facto build tool that suits the
> external SDK best. This will help to minimize the learning curve for
> contributors who specialize in different source software stacks.
> 3. Simplify the maintenance of the build and CI.
>
> I’m not familiar with the capability of Bazel, so certainly I’m not
> against it. If it can help to achieve some of the above, I guess one single
> repo will also work then.
>
> SiMing

Re: Heron Spouts Code

Posted by Simon Weng <si...@gmail.com>.

Hi, all:

Could another option be to have a separate repo for each type of spout?
The reasons it is worth considering are:

1. Allow each spout to evolve and release at a different pace, because each is
technically driven by external source software. For example, the community
may need different versions of the Kafka Spout to be compatible with their
deployed Kafka cluster in production.
2. Allow each spout project to use the de facto build tool that suits the
external SDK best. This will help to minimize the learning curve for
contributors who specialize in different source software stacks.
3. Simplify the maintenance of the build and CI.

I’m not familiar with the capability of Bazel, so certainly I’m not against
it. If it can help to achieve some of the above, I guess one single repo
will also work then.

SiMing

On Wed, Jan 16, 2019 at 5:34 PM Ning Wang <wa...@gmail.com> wrote:

> +Siming

Re: Heron Spouts Code

Posted by Ning Wang <wa...@gmail.com>.

+Siming

On Tue, Jan 15, 2019 at 11:35 PM Ning Wang <wa...@gmail.com> wrote:

> Hi, all,
>
> A few of us (Spencer, Saikat, Siming, Karthik, Josh, Sree) discussed today
> in our general Slack channel that we should have spouts code somewhere so
> that people can reuse them (spouts are highly reusable in general) and
> contribute improvements. This is just a recap of the idea and some updates.
>
> We have two options:
> 1. Add a spouts/ dir in the Heron project.
> 2. Create a new project on GitHub.
>
> Option 1 is easy to start. But iteration and releases will be coupled
> with the Heron project itself. There will likely be quite a lot of
> activity around spouts from time to time as new spouts are added. Also,
> Heron is basically the engine plus APIs and tooling, while there could be
> quite a few spouts in the future with many new dependencies like
> Kafka, pubsub, neo4j and neptune, etc. It is debatable whether spout
> implementations belong in the Heron project, and these extra dependencies
> could add some unnecessary complexity.
>
> Option 2 requires some work up front, but it will be much easier to
> manage and evolve. There will also be fewer concerns about new spouts (in
> different languages) and dependencies, because spouts are relatively
> independent of each other and we may generate artifacts per spout.
>
> Overall, most people prefer option 2 for its cleanness.
>
> I talked with the Twitter OSS team. They are happy to support the
> initiative and suggested that we check with the Apache team to see what
> the best process is. The first question is: should this new side project
> be under Apache or not? This might be a question for the mentors. What do
> you think/suggest?
>
> Another topic being discussed is the build tool, in case we decide to
> create a new side project. Maven is more mature for sure, but we will
> likely need multi-language support, so currently Bazel seems to be the
> winner (I personally vote for Bazel 1.0 because the backward compatibility
> has been bad so far).
>
> Any ideas or suggestions, please feel free to reply.
>
> Regards,
> --ning
>